OpenAI's o3 Models Set New Benchmark in AI Performance, Outshining Human Expertise

Word on the Street · Sunday, Dec 22, 2024, 1:00 pm ET

OpenAI has unveiled o3 and o3-mini, its newest reasoning models, designed to push the boundaries of artificial intelligence performance. The company skipped the name o2 to avoid potential trademark conflicts with the UK telecom operator O2.

For now, the models are not widely available to the public. OpenAI is first granting testing access to security researchers. A broader release is expected in January 2025, starting with o3-mini and followed shortly by o3.

The o3 models show substantially stronger capabilities than earlier versions. On SWE-Bench Verified, a benchmark of real-world software-engineering tasks, o3 reached 71.7% accuracy, 22.8 percentage points above o1's 48.9%. Its competitive-programming (Codeforces) rating of 2727 surpasses both o1's score and that of OpenAI's Chief Scientist.

Math and science results are also strong. o3 scored 96.7% on the 2024 American Invitational Mathematics Examination (AIME) and 87.7% on GPQA Diamond, a set of graduate-level science questions, outperforming human experts.

On FrontierMath, a newer benchmark of very difficult mathematical problems from Epoch AI, o3 solved 25.2% of the problems; no other model had exceeded a 2% solve rate.

On ARC-AGI, which tests reasoning on novel problems, o3 scored 87.5% in a high-compute configuration, exceeding the 85% human benchmark. In a lower-compute configuration it scored 75.7%, roughly three times o1's performance.

o3-mini introduces "adaptive thinking time," which lets users choose among several levels of reasoning effort to balance performance against cost and speed.

OpenAI's latest models mark a significant step toward more capable AI, but they are not yet broadly available. Although expectations for their impact are high, how and when OpenAI deploys them will likely determine how quickly they are adopted across industries.

Disclaimer: The news articles available on this platform are generated in whole or in part by artificial intelligence and may not have been reviewed or fact checked by human editors. While we make reasonable efforts to ensure the quality and accuracy of the content, we make no representations or warranties, express or implied, as to the truthfulness, reliability, completeness, or timeliness of any information provided. It is your sole responsibility to independently verify any facts, statements, or claims prior to acting upon them. Ainvest Fintech Inc expressly disclaims all liability for any loss, damage, or harm arising from the use of or reliance on AI-generated content, including but not limited to direct, indirect, incidental, or consequential damages.