Apple's AI Models Lag Behind Rivals in Key Benchmarks

Coin World · Wednesday, Jun 11, 2025 10:05 am ET
2 min read

In the rapidly evolving tech landscape, recent updates from Apple regarding its AI models have drawn significant attention. The company's latest AI models, which power Apple Intelligence features, have been benchmarked internally, revealing underperformance compared to established rivals. This has raised questions about Apple's standing in the competitive Generative AI race.

Apple's blog post detailing the performance of its updated AI models provided insights based on human evaluations. The Apple On-Device Model, which runs locally on devices like the iPhone, was rated comparable to, but not better than, models from Google and Alibaba. At roughly 3 billion parameters, the model reflects Apple's push toward on-device AI capabilities. The Apple Server Model, however, which runs in Apple's data centers, was rated behind OpenAI's GPT-4o, a model that has been available for over a year. In image analysis tests, human raters preferred Meta's Llama 4 Scout model over the Apple Server model, even though Llama 4 Scout performs worse than leading models from companies like Google, Anthropic, and OpenAI in other tests.

These benchmarks suggest that while Apple is making strides, it may still be catching up in certain areas of Generative AI development. The results align with earlier reports indicating challenges in Apple's AI research efforts. Apple's AI capabilities have sometimes been perceived as lagging behind rivals', and a significant planned upgrade to its voice assistant, Siri, has reportedly been delayed indefinitely. The company has also faced lawsuits from customers alleging that it marketed AI features that were not yet delivered.

Despite these setbacks, Apple's new AI models offer several capabilities. The Apple On-Device model powers features like summarization and text analysis. Both models boast improved tool use and efficiency compared to their predecessors. They support around 15 languages and have benefited from an expanded training dataset including images, PDFs, documents, manuscripts, infographics, tables, and charts. Developers also gain access to these capabilities: the Apple On-Device model is exposed through Apple's Foundation Models framework, allowing third-party applications to integrate features like summarization and text analysis.
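As a rough illustration of what that developer access looks like, here is a minimal Swift sketch. The session type and `respond(to:)` call follow the Foundation Models framework as Apple has publicly described it, but exact type names and signatures are assumptions here and should be checked against Apple's current documentation.

```swift
import FoundationModels

// Minimal sketch: ask the on-device model to summarize a piece of text.
// Assumes the LanguageModelSession API as publicly described by Apple;
// verify exact types and signatures against the official documentation.
func summarize(_ text: String) async throws -> String {
    // Creates a session backed by the default on-device model.
    let session = LanguageModelSession()
    // Sends the prompt and awaits the model's generated reply.
    let response = try await session.respond(
        to: "Summarize the following in two sentences: \(text)"
    )
    return response.content
}
```

Because the model runs entirely on-device, a call like this would work offline and without sending user text to a server, which is the main selling point of the framework for app developers.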

The field of Generative AI is intensely competitive, with rapid advancements happening across the industry. Companies are investing heavily in research, development, and infrastructure to build ever more capable AI models. Apple's benchmark results underscore the challenge involved not just in developing advanced AI but in surpassing the performance of models already on the market from dedicated AI labs and tech rivals. While Apple has a strong ecosystem and user base, the success of Apple Intelligence will ultimately depend on the perceived quality and utility of its AI features, which are directly tied to the performance of its underlying models. For Apple, the path to leading the Generative AI race appears to require substantial continued effort and innovation.

In summary, Apple’s latest AI benchmarks present a candid view of its progress in artificial intelligence. While the company is integrating AI deeply into its operating systems and offering developer access, its new AI models are currently performing comparably to or behind older models from competitors like OpenAI, Google, Alibaba, and Meta in key tests. This highlights the significant hurdles Apple faces in catching up in the competitive Generative AI landscape and delivering on the full promise of Apple Intelligence features.