AI Benchmarking Transparency: Epoch AI's Delayed Disclosure Raises Concerns
Sunday, Jan 19, 2025 4:24 pm ET
Epoch AI, a nonprofit organization primarily funded by Open Philanthropy, has come under criticism for delaying the disclosure of its partnership with OpenAI. The organization, which develops math benchmarks for AI, revealed in December 2024 that OpenAI had supported the creation of FrontierMath, a test designed to measure an AI's mathematical skills. This revelation raised concerns about the integrity and objectivity of the benchmark, as well as the potential for conflicts of interest.

Epoch AI's associate director, Tamay Besiroglu, admitted that the organization made a mistake in not being more transparent about the partnership. In a post on the LessWrong forum, a contractor for Epoch AI going by the username "Meemi" expressed concerns about the lack of transparency, stating that many contributors to the FrontierMath benchmark were not informed of OpenAI's involvement until it was made public. Meemi argued that Epoch AI should have disclosed OpenAI's funding and told contractors plainly that their work might be used to advance AI capabilities.
The secrecy surrounding OpenAI's involvement in FrontierMath has led some users to raise concerns about the benchmark's reputation as an objective measure. In addition to backing FrontierMath, OpenAI had access to many of the problems and solutions in the benchmark, which was not disclosed prior to the announcement of o3. Epoch AI maintains that OpenAI has a verbal agreement not to use FrontierMath's problem set to train its AI, but this agreement is not legally binding.
Epoch AI's lead mathematician, Elliot Glazer, noted on Reddit that the organization has not been able to independently verify OpenAI's FrontierMath o3 results. While Glazer believes that OpenAI's score is legitimate, the lack of independent verification further erodes the credibility of the benchmark.

The saga of Epoch AI's delayed disclosure is yet another example of the challenges in developing empirical benchmarks to evaluate AI while securing necessary resources without creating the perception of conflicts of interest. As AI continues to evolve and become more integrated into society, it is crucial for benchmarking organizations to maintain transparency and independence to ensure the integrity and objectivity of their benchmarks.
In conclusion, Epoch AI's delayed disclosure of its partnership with OpenAI has raised concerns about the integrity and objectivity of the FrontierMath benchmark. To maintain trust with contributors and users, AI benchmarking organizations must prioritize transparency, disclose funding sources and partnerships, and establish clear guidelines for contributors. By doing so, they can help ensure the responsible development and evaluation of AI systems.