In the early hours of Friday, Beijing time, OpenAI opened a new chapter in the AI era, unveiling a large model capable of complex general reasoning.
On its official website, OpenAI announced the release of the OpenAI o1-preview model, previously known by the codename "Strawberry." The company described the model as a significant advance in AI's capability for complex reasoning, important enough to warrant a new name distinct from the "GPT-4" series.
The distinguishing feature of the o1 models lies in how they solve problems. Unlike previous models, which matched patterns in their training data to generate responses without genuine comprehension, the o1 series spends more time thinking before it delivers an answer, much as a person would.
Initially, OpenAI has rolled out two versions: o1-preview and o1-mini. Access is being phased in, starting with paid subscribers, followed by free users and developers, though developers face steep usage costs.
According to OpenAI, the new training methodology behind o1 enables it to work through more complex programming, mathematical, and scientific problems by reflecting before it answers, and, the company says, to do so faster than humans. The smaller, cheaper o1-mini is tailored to programming applications.
ChatGPT Plus and Team subscribers can access both models immediately through the model selector in ChatGPT, while Enterprise and Edu users will gain access next week. Eventually, o1-mini will be made available to all free users, though the timeline remains unspecified.
Developer access to o1, however, is notably expensive. API pricing is set at $15 per million input tokens and $60 per million output tokens, roughly three times GPT-4o's price for input and four times its price for output.
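As a rough illustration, a single call's cost under the listed prices can be estimated as follows; the token counts in the example are invented, and note that o1's hidden "reasoning" tokens are also billed as output tokens:

```python
# Back-of-the-envelope cost estimate under the listed o1-preview API prices.
# The example token counts are made up for illustration.
INPUT_PRICE = 15 / 1_000_000   # $15 per million input tokens
OUTPUT_PRICE = 60 / 1_000_000  # $60 per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# A 2,000-token prompt that produces 10,000 tokens of reasoning plus answer:
print(f"${estimate_cost(2_000, 10_000):.2f}")  # $0.63
```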
Jerry Tworek, OpenAI's research lead, noted that o1 was trained in a fundamentally different way, with a novel optimization algorithm and a dataset tailored to the model that includes reasoning data and scientific literature. Whereas earlier GPT models imitated patterns in their training data, o1 is trained with reinforcement learning, receiving rewards and penalties as it solves problems, and uses a "chain of thought" to work through queries step by step.
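To make the reward-and-penalty idea concrete, here is a minimal toy sketch, in no way OpenAI's actual training code: a "policy" (a weighted choice among hypothetical problem-solving strategies) is nudged toward whichever strategy earns rewards. The strategy names and the update rule are invented for illustration.

```python
import random

# Toy reinforcement loop: reward the policy when its answer checks out,
# penalize it when it does not. Purely illustrative.
STRATEGIES = ["direct", "decompose", "work_backwards"]
weights = {s: 1.0 for s in STRATEGIES}

def solve(problem: int, strategy: str) -> int:
    # Stand-in task: the target is problem * 2; only "decompose" gets it right.
    if strategy == "decompose":
        return problem + problem
    return problem * random.choice([1, 3])  # always wrong for problem >= 1

def reward(answer: int, target: int) -> float:
    return 1.0 if answer == target else -0.5

for _ in range(200):
    problem = random.randint(1, 100)
    # Sample a strategy in proportion to its learned weight (the "policy").
    strategy = random.choices(STRATEGIES, [weights[s] for s in STRATEGIES])[0]
    r = reward(solve(problem, strategy), problem * 2)
    weights[strategy] = max(0.1, weights[strategy] + 0.1 * r)  # reinforce or penalize

print(weights)  # "decompose" ends up with by far the largest weight
```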
OpenAI asserts that this new methodology makes o1 more accurate, although it does not entirely eliminate hallucinations, that is, erroneous or fabricated responses. What chiefly distinguishes o1 from GPT-4o is its improved ability to tackle complex problems in programming and mathematics: it refines its reasoning process, tries different strategies, and recognizes and corrects its own mistakes.
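That "try, verify, revise" behavior can be sketched as a simple loop; again, this is an invented illustration of the pattern, not how o1 itself is implemented:

```python
# "Generate, verify, revise": try an approach, check the answer, and fall
# back to another approach on failure, recording mistakes along the way.
def solve_with_checks(strategies, verify, max_attempts=3):
    mistakes = []
    for strategy in strategies[:max_attempts]:
        answer = strategy()
        if verify(answer):
            return answer, mistakes
        mistakes.append(answer)  # note the error before trying again
    return None, mistakes

# Toy usage: find the integer square root of 144; the first two candidate
# strategies are deliberately wrong.
candidates = [lambda: 11, lambda: 13, lambda: 12]
result, mistakes = solve_with_checks(candidates, lambda x: x * x == 144)
print(result, mistakes)  # 12 [11, 13]
```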
Early evaluations illustrate the o1 model's potential: it scored 83% on a qualifying exam for the International Mathematical Olympiad, versus 13% for GPT-4o, and reached the 89th percentile in Codeforces programming competitions, versus GPT-4o's 11th percentile. OpenAI expects future updates will let the model perform on par with doctoral students on challenging benchmarks in physics, chemistry, and biology.
Despite these advances, the o1-preview model has clear limitations. It is currently text-only, cannot browse the internet, and cannot process uploaded files or images. GPT-4o also remains superior for many common use cases, and rate limits are tight: 30 messages per week for o1-preview and 50 for o1-mini.
Nonetheless, OpenAI envisions the o1 series significantly enhancing work in specialized fields. Healthcare researchers could use o1 to annotate cell sequencing data, physicists to generate the complex equations needed for quantum optics, and developers across disciplines to build and execute multi-step workflows.