OpenAI's New o1 Model Revolutionizes AI with Enhanced Reasoning, Outshining GPT-4o in Math and Coding Competitions

Thursday, Sep 12, 2024 6:00 pm ET

OpenAI has officially launched its first large language model with reasoning capabilities, codenamed "Strawberry," under the name "OpenAI o1." The model had been expected within two weeks but became available on September 12. OpenAI unveiled two initial versions, o1-preview and o1-mini, which are being rolled out in phases to paying users, free users, and developers, with developers facing higher usage costs.

The new o1 model is billed as revolutionary because a novel training approach enables it to tackle more complex programming, mathematical, and scientific problems. Whereas previous GPT models were trained to mimic patterns in their training data, o1 is trained with reinforcement learning to work through a problem in a step-by-step "chain of thought" that simulates human problem-solving, and it then returns a summarized answer.

OpenAI's researchers, including Jerry Tworek, point out that the o1 model has been trained using a new optimization algorithm and custom datasets rich in "reasoning data" and scientific literature. The model's reinforcement learning framework teaches it to solve problems by rewarding correct answers and penalizing incorrect ones.
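The reward-and-penalty idea can be illustrated with a toy sketch. This is not OpenAI's training method, whose details are not public; it is a textbook "gradient bandit" update, chosen only because it is small enough to read in full. The three strategies and their success rates are hypothetical, as are the function names `softmax` and `train`.

```python
import math
import random


def softmax(prefs):
    """Turn raw preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(success_rates, steps=5000, lr=0.1, seed=0):
    """Learn which strategy to prefer from +1 / -1 feedback.

    success_rates[i] is the (hypothetical) chance that strategy i
    yields a correct answer. The learner never sees these numbers,
    only the reward after each attempt.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(success_rates)
    for _ in range(steps):
        probs = softmax(prefs)
        # Sample a strategy according to the current policy.
        choice = rng.choices(range(len(prefs)), weights=probs)[0]
        correct = rng.random() < success_rates[choice]
        # Reward a correct answer, penalize an incorrect one.
        reward = 1.0 if correct else -1.0
        # Gradient bandit update: shift probability toward the chosen
        # strategy when rewarded, away from it when penalized.
        for i in range(len(prefs)):
            indicator = 1.0 if i == choice else 0.0
            prefs[i] += lr * reward * (indicator - probs[i])
    return softmax(prefs)


if __name__ == "__main__":
    final_probs = train([0.2, 0.5, 0.9])
    print([round(p, 3) for p in final_probs])
```

Over many attempts the policy should shift most of its probability to the strategy that is correct most often, which is the sense in which rewards and penalties "teach" a preference. A real system applies the same principle to a far larger model and a far richer notion of a correct answer.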

Performance-wise, o1 shows significant improvements over its predecessor, GPT-4o. On a qualifying exam for the International Mathematics Olympiad, o1 scored 83% compared with GPT-4o's 13%. In Codeforces programming competitions, o1 reached the 89th percentile of human competitors.

Despite these advancements, the initial versions of the o1 model come with some limitations. They currently lack features such as web browsing, file uploads, and comprehensive world knowledge. There are also usage caps: o1-preview is limited to 30 messages per week and o1-mini to 50 messages per week.

OpenAI also reports stronger safety behavior. In one of its hardest "jailbreak" tests, which measure how well a model resists attempts to make it generate harmful content, o1-preview scored 84 on a 0-100 scale compared with 22 for GPT-4o, suggesting better adherence to safety guidelines.

OpenAI anticipates that the enhanced reasoning capabilities of the o1 model will be particularly beneficial for tackling complex problems in fields like science and programming. For instance, it can annotate cell sequencing data for medical researchers or generate advanced quantum optics formulas for physicists.

In terms of accessibility, the o1-preview and o1-mini models are now available to ChatGPT Plus and Team users, with Enterprise and Education users expected to gain access next week. OpenAI plans to extend o1-mini to all free users in the future and aims to introduce automatic model selection based on user prompts.

Editorial Disclosure & AI Transparency: Ainvest News utilizes advanced Large Language Model (LLM) technology to synthesize and analyze real-time market data. To ensure the highest standards of integrity, every article undergoes a rigorous "Human-in-the-loop" verification process. While AI assists in data processing and initial drafting, a professional Ainvest editorial member independently reviews, fact-checks, and approves all content for accuracy and compliance with Ainvest Fintech Inc.’s editorial standards. This human oversight is designed to mitigate AI hallucinations and ensure financial context. Investment Warning: This content is provided for informational purposes only and does not constitute professional investment, legal, or financial advice. Markets involve inherent risks. Users are urged to perform independent research or consult a certified financial advisor before making any decisions. Ainvest Fintech Inc. disclaims all liability for actions taken based on this information.