GPT-5 Launch Sparks Backlash as Prediction Markets Shift to Google

Generated by AI Agent · Coin World
Monday, Aug 11, 2025, 4:57 pm ET · 2 min read

Aime Summary

- OpenAI launched GPT-5 as its most advanced model, excelling in technical tasks with 94.6% math accuracy but facing criticism for lacking emotional depth in creative writing.

- Mixed user reactions led OpenAI to reintroduce GPT-4o after 3,000+ petition signatures, with Reddit users labeling GPT-5 as "horrible" and "a disaster."

- Prediction markets shifted rapidly, giving Google an 80% odds lead over OpenAI by August 2025 as GPT-5's performance in sensitive topics and information retrieval drew scrutiny.

- Despite strong logical reasoning and coding capabilities, GPT-5 showed inconsistencies in math and struggled with nuanced creative tasks compared to competitors like Claude 4.1 Opus.

OpenAI recently launched GPT-5, positioning it as the "smartest, fastest, and most useful" model yet. The release followed months of speculation and an ill-received promotional video from Sam Altman. The model demonstrated strong performance in technical and logical tasks, scoring 94.6% on math tests and 74.9% on real-world coding tasks [1]. However, it has faced mixed reactions, particularly from creative users who found the model lacking in emotional depth and narrative complexity [1].

Early feedback on GPT-5 was sharply divided. While OpenAI emphasized a "unified architecture" combining fast responses with deep reasoning, many users expressed disappointment, labeling the model "horrible," "awful," and "a disaster." The backlash was significant enough to prompt OpenAI to reintroduce GPT-4o after more than 3,000 people signed a petition. Altman announced the change via his personal account on August 10, 2025 [1].

Prediction markets reflected this dissatisfaction. On Polymarket, OpenAI’s odds of having the best AI model by the end of August plummeted from 75% to 12% within hours of GPT-5's launch, with Google overtaking it at 80% [1]. This suggests that market sentiment quickly shifted in favor of competitors, at least in the eyes of early bettors.

In creative writing tests, GPT-5 fell short compared to models like Claude 4.1 Opus. Its output was technically sound but lacked emotional nuance and stylistic richness. For example, a time-travel paradox story generated by GPT-5 was criticized for being emotionally flat and structurally predictable, failing to fully address the prompt. In contrast, Claude provided a more layered narrative with detailed world-building and character development [1].

GPT-5 also demonstrated limitations in handling sensitive or controversial topics. While it refused to generate content deemed inappropriate, users found it easy to bypass these restrictions using jailbreaking techniques. This raised concerns about the model’s safety mechanisms, as it could be manipulated to produce content it otherwise would avoid [1].

In information retrieval tasks, GPT-5 performed poorly. It struggled to accurately extract specific details from long documents, even when they were provided as attachments. For instance, it failed to correctly identify key pieces of information in a 300K-token document and could not find specific needles in an 85K-token test. While some researchers have praised GPT-5's information retrieval, these findings suggest its performance is inconsistent [1].
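The "needle in a haystack" evaluation mentioned above can be illustrated with a minimal harness: bury one distinctive sentence in a long filler document and check whether the system recovers it. Everything here is an illustrative assumption, not the reviewers' methodology; `ask_model` is a stand-in stub, since the real test would call the model under review.

```python
# Minimal sketch of a needle-in-a-haystack retrieval test.
# The filler text, needle phrasing, and ask_model stub are
# illustrative assumptions, not the actual review methodology.

import random

def build_haystack(needle: str, n_sentences: int, seed: int = 0) -> str:
    """Bury a needle sentence at a random position inside filler text."""
    rng = random.Random(seed)
    filler = ["The quick brown fox jumps over the lazy dog."] * n_sentences
    filler.insert(rng.randrange(len(filler)), needle)
    return " ".join(filler)

def ask_model(document: str, question: str) -> str:
    # Stand-in for a real LLM call; here, a trivial substring "retriever".
    for sentence in document.split("."):
        if "magic number" in sentence:
            return sentence.strip()
    return "not found"

needle = "The magic number for the audit is 7421."
haystack = build_haystack(needle, n_sentences=9_000)  # roughly 85K tokens of filler
answer = ask_model(haystack, "What is the magic number for the audit?")
print("7421" in answer)  # True for this trivial retriever
```

A real evaluation would repeat this over many needle positions and document lengths, scoring the fraction of exact recoveries.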

Non-mathematical reasoning was one of GPT-5’s strong suits. It excelled at logical deduction, solving a complex murder mystery by mapping out clues and relationships systematically. This was a marked improvement over previous models like GPT-4o, which refused to engage with the same scenario due to its violent nature [1].

However, the model’s math capabilities were inconsistent. While it aced complex problems from the FrontierMath benchmark, it made a basic arithmetic error on a simple equation, solving 5.9 = X + 5.11 as X = -0.21 instead of 0.79. This discrepancy highlights the model’s uneven skill levels and raises the possibility of dataset contamination in its training [1].
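The arithmetic itself is straightforward: subtracting 5.11 from both sides of 5.9 = X + 5.11 gives X = 0.79. A two-line check using Python's exact decimal arithmetic confirms it:

```python
from decimal import Decimal

# Solve 5.9 = X + 5.11 by subtracting 5.11 from both sides.
# Decimal avoids binary floating-point rounding noise.
x = Decimal("5.9") - Decimal("5.11")
print(x)  # 0.79, not the -0.21 GPT-5 reportedly produced
```

The reported error (-0.21) looks like the digits of 5.11 and 5.9 being misaligned, a failure mode consistent with token-level rather than numeric reasoning.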

In coding tasks, GPT-5 performed well, producing clean, functional code that was visually appealing and structurally sound. It was particularly effective for straightforward coding projects and outperformed other models like Claude in cost efficiency. However, it struggled with iterative bug fixes, where more nuanced understanding of the user’s intent was required [1].

Overall, GPT-5 is a powerful tool for developers, researchers, and professionals requiring precise logical reasoning and technical capabilities. It is particularly well-suited for coding, mathematical analysis, and structured problem-solving. However, it falls short in areas requiring creativity, emotional intelligence, and nuanced language use. Its limitations in creative writing and information retrieval suggest that users should match their expectations to the task at hand.

OpenAI has indicated that the model will continue to evolve, with updates likely to improve its performance over time. Given its current capabilities and pricing, GPT-5 is a strong option for specific professional applications but may not be the ideal choice for casual or creative users who preferred the more human-like responses of previous models [1].

Source:

[1] OpenAI GPT-5 Review: Built to Win Benchmarks, Not Hearts (https://decrypt.co/334544/openai-gpt-5-review-built-win-benchmarks-not-hearts)
