xAI hired contractors to train Grok on coding tasks with the explicit goal of topping a popular AI leaderboard by beating Anthropic's Claude model. Documents obtained by Business Insider show xAI wanted Grok to outperform Claude on coding tasks. AI leaderboards have become a key battleground for labs chasing clout and investment.
xAI, Elon Musk's artificial intelligence company, has been engaged in a competitive push to improve its Grok model's standing on AI leaderboards. According to documents obtained by Business Insider, xAI hired contractors through Scale AI's Outlier platform to train Grok on coding tasks, aiming to outperform Anthropic's Claude model [1]. The effort highlights the growing importance of AI leaderboards as a battleground for attracting investment and media attention.
The focus on leaderboards is not new, but the intensity and tactics employed have evolved. For instance, OpenAI and Anthropic have been known to leverage leaderboard rankings to secure funding and lucrative contracts. The industry's reliance on these unofficial scoreboards has led to intense competition, with some companies employing questionable tactics to boost their rankings [1].
xAI's Grok 4 model, launched on July 9, 2025, was specifically trained to "beat Sonnet 3.7 Extended," a reference to Anthropic's Claude model. The project involved contractors refining front-end code for user interface prompts to improve Grok's performance on WebDev Arena, an influential leaderboard from LMArena [1]. While Grok 4 ranked in the top three for LMArena's core categories of math, coding, and "Hard Prompts," its performance varied across different leaderboards, indicating the challenges of benchmarking AI models.
The race for leaderboard dominance is not without its controversies. Meta's Llama 4, for example, faced accusations of gaming the rankings after a specially tuned variant was submitted for benchmarking that differed from the model actually released to the public [1]. xAI's project with Scale AI, while not shown to involve gaming the leaderboard, underscores how aggressively companies now compete on these scoreboards.
Despite the hype surrounding Grok 4, some users have expressed disappointment with its performance in real-world applications, particularly in coding tasks. This highlights the potential disconnect between leaderboard rankings and actual model capabilities [2]. The focus on leaderboard dominance can sometimes lead to models that excel in trivial exercises but falter when facing real-world challenges [1].
In addition to its leaderboard ambitions, xAI has also secured a contract with the U.S. Department of Defense to launch "Grok for Government," a specialized version of its chatbot platform tailored for federal use. This move comes as the Trump administration continues to spur federal AI adoption [3]. SpaceX, another Musk-owned company, has committed $2 billion to xAI's expansion, further signaling Musk's investment in the AI business [3].
In conclusion, xAI's push to move Grok up the AI leaderboards is a strategic bid to strengthen its competitive standing in the industry. While the project has drawn criticism and controversy, it also underscores how central leaderboards have become as a metric for attracting investment and media attention. As the AI industry continues to evolve, the pursuit of leaderboard dominance is likely to remain a significant force shaping the competitive landscape.
References:
[1] https://www.businessinsider.com/grok-leaderboard-coding-anthropic-claude-scale-ai-2025-7
[2] https://www.reddit.com/r/singularity/comments/1lyzqzg/grok_4_disappointment_is_evidence_that_benchmarks/
[3] https://cryptorank.io/news/feed/4f030-elon-musks-xai-launches-grok-for-government