OpenAI has published new research that diagnoses why large language models hallucinate, a persistent issue in the development of AI systems, and proposes a way to curb the problem. Hallucinations, plausible yet false statements generated by models, remain a significant challenge for developers. In the paper, OpenAI argues that the root of the problem lies in training and evaluation methods that reward guessing over acknowledging uncertainty [1]. The study highlights that current evaluation frameworks prioritize accuracy alone, inadvertently encouraging models to produce confident but incorrect responses instead of admitting uncertainty [1].
The problem is particularly evident when models are asked questions that require precise information, such as dates, percentages, or specific factual claims. For instance, a widely used chatbot gave three different incorrect answers when asked for the title of a PhD dissertation by one of the paper’s authors, and three wrong dates for the same author’s birthday [1]. The study illustrates that when a model is graded on accuracy alone, guessing becomes the rational strategy: a confident but incorrect answer such as “September 10” for a person’s birthday still carries a small chance of being right, while declining to answer guarantees zero points [1].
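As a rough illustration of that incentive (a sketch, not an excerpt from the paper), the expected-value arithmetic fits in a few lines of Python. The one-in-365 chance that a blind birthday guess is right and the one-point-per-correct-answer grading are illustrative assumptions.

```python
# Hypothetical expected-value comparison under accuracy-only grading.
P_CORRECT_GUESS = 1 / 365          # assumed chance a blind birthday guess is right
REWARD_CORRECT, REWARD_WRONG, REWARD_ABSTAIN = 1.0, 0.0, 0.0  # binary grading

# Expected score if the model guesses anyway:
expected_guess = P_CORRECT_GUESS * REWARD_CORRECT + (1 - P_CORRECT_GUESS) * REWARD_WRONG

# Expected score if it says "I don't know":
expected_abstain = REWARD_ABSTAIN

print(f"guess:   {expected_guess:.4f}")    # 0.0027
print(f"abstain: {expected_abstain:.4f}")  # 0.0000 -> guessing always scores higher
```

Under a rule like this, a model that never abstains will on average outscore an otherwise identical model that admits uncertainty, which is exactly the behavior the paper says current evaluations reward.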
To address this, OpenAI proposes a revised approach to model evaluation: scoring systems should penalize confident errors more heavily than expressions of uncertainty and offer partial credit for acknowledging what the model does not know. This is akin to negative marking in standardized tests, where incorrect answers are penalized to discourage blind guessing. OpenAI argues that current accuracy-based scoreboards are not sufficient to mitigate hallucinations and need to be restructured to promote models that can recognize their limitations [1]. The point is supported by data from the SimpleQA evaluation comparing two models, GPT-5-thinking-mini and OpenAI o4-mini. The older o4-mini scored higher on accuracy (24% versus 22%) but produced far more errors (75% versus 26%), because it almost never abstained, whereas GPT-5-thinking-mini declined to answer the questions it was unsure about [1].
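To make the scoring argument concrete, the short sketch below re-scores the two models using the rates quoted above under an accuracy-only rule and under a simple penalty rule of the kind OpenAI describes. The abstention fractions are inferred as whatever remains after correct and incorrect answers, and the one-point penalty per confident error is an illustrative assumption rather than the paper's exact grading.

```python
# Hypothetical re-scoring of the SimpleQA comparison quoted above.
MODELS = {
    # (correct, wrong, abstain) fractions; abstain inferred as the remainder
    "gpt-5-thinking-mini": (0.22, 0.26, 0.52),
    "o4-mini":             (0.24, 0.75, 0.01),
}

def accuracy_only(correct, wrong, abstain):
    # wrong answers and abstentions both score zero
    return correct

def penalized(correct, wrong, abstain, penalty=1.0):
    # confident errors cost `penalty` points; abstaining is neutral
    return correct - penalty * wrong

for name, rates in MODELS.items():
    print(f"{name}: accuracy-only={accuracy_only(*rates):.2f}, "
          f"penalized={penalized(*rates):.2f}")
# accuracy-only ranks o4-mini slightly higher (0.24 vs 0.22);
# the penalized rule flips the ranking (-0.04 vs -0.51).
```

The inversion of the ranking under the penalized rule is the kind of scoreboard restructuring the paper argues for: the model that guesses less aggressively stops being punished for admitting uncertainty.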
The origin of hallucinations is also tied to the pretraining process of language models, where the models learn to predict the next word in a sequence without labeled data to distinguish true from false statements. This leads to a high likelihood of generating factual inaccuracies, particularly for low-frequency or arbitrary facts that cannot be derived from patterns alone [1]. OpenAI emphasizes that while larger models may have improved reasoning capabilities, they are not immune to hallucinations and may even be more prone to them due to the complexity of their internal representations. In contrast, smaller models may have an easier time acknowledging uncertainty when faced with topics outside their knowledge domain [1].
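A minimal toy simulation can show why next-word training alone produces this behavior: when each person's birthday appears only once in the training text, a likelihood-maximizing model has nothing to learn beyond the overall distribution of dates, so its answer for an unseen person looks plausible but is right only at roughly the base rate. The corpus, names, and dates below are all invented for illustration.

```python
import random
from collections import Counter

random.seed(0)

# Invented toy corpus: each (person, birthday) fact is seen exactly once.
# Next-word training provides no label saying which continuations are true;
# the model only learns the empirical distribution over date tokens.
people = [f"person_{i}" for i in range(1000)]
birthdays = {p: random.randrange(365) for p in people}

# The maximum-likelihood next-token distribution after "X's birthday is"
# collapses to the overall frequency of each date in the corpus.
date_counts = Counter(birthdays.values())
dates, weights = zip(*date_counts.items())

def predict_birthday(person: str) -> int:
    """Sample a date from the learned distribution; an unseen name carries no
    signal the model can exploit, so the answer is effectively arbitrary."""
    return random.choices(dates, weights=weights)[0]

# Evaluate on held-out people with their own (unseen) birthdays.
held_out = {f"new_person_{i}": random.randrange(365) for i in range(1000)}
hits = sum(predict_birthday(p) == true_date for p, true_date in held_out.items())
print(f"accuracy on unseen people: {hits / len(held_out):.3f}")  # ~1/365, i.e. ~0.003
```

Every answer the toy model produces is a well-formed, confident-looking date, which is the sense in which pretraining alone yields plausible but unreliable statements about low-frequency facts.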
Despite these challenges, OpenAI notes that hallucinations are not inherently inevitable and can be reduced with better training and evaluation methods. The company has already implemented changes in its latest models to lower hallucination rates and is continuing to refine its approaches. The study underscores the importance of shifting the focus from accuracy-centric metrics to ones that reward uncertainty and calibration, a shift that could broadly influence the development of future language models [1].
Source:
[1] Why language models hallucinate (https://openai.com/index/why-language-models-hallucinate/)