OpenAI recently clarified the capabilities of its GPT-5 large language model after initial claims regarding its ability to solve previously unsolved mathematical problems faced scrutiny from the scientific community. The clarification follows a public discussion initiated by a now-deleted post from an OpenAI executive and subsequent critiques from leading figures in the artificial intelligence sector.
Kevin Weil, OpenAI's Vice President of Product, had reportedly stated in a social media post that "GPT-5 found solutions to 10 (!) previously unsolved Erdős problems and made progress on 11 others." Erdős problems are open mathematical conjectures posed by the mathematician Paul Erdős. The assertion was quickly challenged by mathematician Thomas Bloom, who curates the Erdős Problems website. Bloom characterized Weil's post as "a dramatic misrepresentation," explaining that the problems were listed as "open" on his site only because he was personally unaware of published solutions, not because the problems were actually unsolved.
Bloom clarified that GPT-5 had, in fact, located existing solutions within published literature that he had not been aware of, rather than independently deriving new solutions to previously unsolved conjectures.
The initial claim and subsequent clarification drew sharp reactions from other prominent AI leaders. Yann LeCun, Chief AI Scientist at Meta, remarked, "Hoisted by their own GPTards," while Demis Hassabis, CEO of Google DeepMind, called the situation "embarrassing."
Following these exchanges, Sébastien Bubeck, an OpenAI researcher who had also previously highlighted GPT-5's mathematical accomplishments, acknowledged the distinction, stating that "only solutions in the literature were found." He suggested, however, that the model's ability to search out and retrieve complex solutions from the existing literature remained a significant achievement, given the difficulty of such tasks. The incident highlights ongoing debate within the AI community over how to define and communicate the problem-solving capabilities of large language models.