OpenAI's o3 Models Set New Benchmark in AI Performance, Outshining Human Expertise

Sunday, Dec 22, 2024 1:00 pm ET

OpenAI recently unveiled its advanced models, o3 and o3-mini, designed to push the boundaries of artificial intelligence performance. The naming skips "o2" to avoid potential trademark issues with the UK telecom operator O2.

The models are not yet widely available to the public; OpenAI is initially granting testing access to security researchers. A broader release is expected in January 2025, starting with o3-mini and followed shortly by o3.

The o3 series demonstrates significantly stronger capabilities than previous iterations. On the SWE-Bench Verified coding accuracy test, o3 achieved 71.7%, a 22.8 percentage point increase over o1's 48.9%. Its score of 2727 on the Competition Code test surpasses both o1's score and that of OpenAI's Chief Scientist.
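As a quick check on the SWE-Bench Verified figures above, the short Python sketch below separates the absolute gain (in percentage points) from the relative gain over o1. The variable names are illustrative only, and the relative figure is derived here rather than quoted from OpenAI.

```python
# Scores quoted in the article (SWE-Bench Verified, in percent).
o3_score = 71.7
o1_score = 48.9

# Absolute gain, measured in percentage points.
point_gain = o3_score - o1_score

# Gain relative to o1's own score, in percent (derived, not from the article).
relative_gain = point_gain / o1_score * 100

print(f"Percentage-point gain: {point_gain:.1f}")
print(f"Relative gain over o1: {relative_gain:.1f}%")
```

The two numbers answer different questions: the first is the gap between the scores, the second is how large that gap is compared with o1's starting point.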

Math and science results are also strong: o3 scored 96.7% on the 2024 American Invitational Mathematics Examination (AIME) and 87.7% on graduate-level science problem sets, outperforming human experts.

A new benchmark, the FrontierMath test from Epoch AI, highlighted o3's ability to solve complex mathematical challenges. It solved 25.2% of these problems, while other models were unable to surpass a 2% solve rate.

On the ARC-AGI test of applied reasoning, o3 scored 87.5% under high-compute settings, exceeding the human average of 85%. Under more limited compute, it achieved 75.7%, three times the performance level of o1.

The newly launched o3-mini introduces "adaptive thinking time," which lets users choose among several computation levels to match their performance needs.

OpenAI's latest offerings mark a significant stride towards advanced AI capabilities but are not yet fully available for widespread use. While optimism surrounds their potential impact, strategic deployment and accessibility will likely shape their eventual integration across industries.


Editorial Disclosure & AI Transparency: Ainvest News utilizes advanced Large Language Model (LLM) technology to synthesize and analyze real-time market data. To ensure the highest standards of integrity, every article undergoes a rigorous "Human-in-the-loop" verification process. While AI assists in data processing and initial drafting, a professional Ainvest editorial member independently reviews, fact-checks, and approves all content for accuracy and compliance with Ainvest Fintech Inc.’s editorial standards. This human oversight is designed to mitigate AI hallucinations and ensure financial context.

Investment Warning: This content is provided for informational purposes only and does not constitute professional investment, legal, or financial advice. Markets involve inherent risks. Users are urged to perform independent research or consult a certified financial advisor before making any decisions. Ainvest Fintech Inc. disclaims all liability for actions taken based on this information.