OpenAI's o3 Models Set New Benchmark in AI Performance, Outshining Human Expertise

Word on the Street
Sunday, Dec 22, 2024 1:00 pm ET

OpenAI recently unveiled its advanced models, o3 and o3-mini, designed to push the boundaries of artificial intelligence performance. The naming skips "o2" to avoid a potential trademark conflict with the UK telecom operator O2.

The models are not yet available to the general public; OpenAI is initially granting testing access to safety and security researchers. A broader release is expected in January 2025, starting with o3-mini and followed shortly by o3.

The o3 series demonstrates significantly stronger capabilities than previous iterations. On SWE-Bench Verified, a coding-accuracy benchmark, o3 scored 71.7%, a 22.8 percentage point increase over o1's 48.9%. Its Codeforces competitive-programming rating of 2727 surpasses both o1's rating and that of OpenAI's Chief Scientist.

Math and science results are also strong: o3 scored 96.7% on the 2024 American Invitational Mathematics Examination (AIME) and 87.7% on a set of graduate-level science problems, outperforming human experts.

FrontierMath, a new benchmark from Epoch AI, highlighted o3's ability to solve complex mathematical challenges. The model solved 25.2% of the problems, while no other model exceeded a 2% solve rate.

In terms of applied reasoning, the ARC-AGI benchmark tested o3 at two levels of compute. In the high-compute setting, o3 scored 87.5%, exceeding the human average of 85%; in the more constrained setting, it scored 75.7%, three times the performance level of o1.

The newly announced o3-mini introduces "adaptive thinking time," which lets users choose among several levels of compute to match the model's performance to the task at hand.

OpenAI's latest models mark a significant stride toward more advanced AI capabilities, but they are not yet broadly available. While there is optimism about their potential impact, deployment strategy and accessibility will likely shape how they are eventually integrated across industries.
