
OpenAI and Anthropic, two of the leading developers in artificial intelligence, recently undertook a significant collaborative effort, briefly opening their advanced AI models to each other for joint safety testing. This unusual cross-lab initiative, conducted amid intense market competition, aimed to surface blind spots that each company's internal evaluations had missed and to lay the groundwork for future industry-wide safety protocols. The collaboration underscores a growing recognition within the sector that collective responsibility is needed as AI technology becomes more integral to daily operations.
This move comes as AI systems enter a “consequential” phase, deeply embedded in numerous applications, prompting a critical need for standardized safety measures. Wojciech Zaremba, co-founder of OpenAI, emphasized the broader question of how the industry can set unified safety standards and foster collaboration even as billions of dollars are invested and companies compete fiercely for talent and market share. The joint research specifically addressed a crucial safety aspect: AI model hallucination, in which a model generates incorrect or fabricated information and presents it as fact.
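To make the hallucination-versus-refusal trade-off concrete, here is a minimal sketch of how such an evaluation might classify model answers. Everything in it, including the answer_question callable, the refusal markers, and the substring check for correctness, is an illustrative assumption rather than either lab's actual methodology.

```python
# Minimal sketch: classify each answer as correct, refused, or hallucinated.
# All names here (answer_question, REFUSAL_MARKERS, the (question, gold)
# pairs) are hypothetical; neither lab has published this harness.
from dataclasses import dataclass

REFUSAL_MARKERS = ("i don't know", "i cannot answer", "i'm not sure")

@dataclass
class Tally:
    correct: int = 0
    refused: int = 0
    hallucinated: int = 0

def score(qa_pairs, answer_question):
    """qa_pairs: iterable of (question, gold_answer) string pairs;
    answer_question: callable mapping a question to the model's reply."""
    tally = Tally()
    for question, gold in qa_pairs:
        answer = answer_question(question).strip().lower()
        if any(marker in answer for marker in REFUSAL_MARKERS):
            tally.refused += 1        # model abstained under uncertainty
        elif gold.lower() in answer:
            tally.correct += 1        # reply contains the reference fact
        else:
            tally.hallucinated += 1   # confident reply, unsupported content
    return tally
```

Under this toy scoring, a model tuned to abstain pushes its refusal count up, while a model tuned to always answer trades refusals for a mix of correct and hallucinated replies, which is exactly the contrast the joint evaluation reported.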
The findings from this joint evaluation offered stark contrasts in how different models handle uncertainty. Anthropic's Claude Opus 4 and Sonnet 4 models demonstrated a high refusal rate, declining to answer up to 70% of questions when uncertain. In contrast, OpenAI's o3 and o4-mini models attempted more answers but exhibited significantly higher rates of hallucination. For industrial applications such as predictive maintenance or supply chain optimization, understanding these behaviors is critical: systems must either provide reliable data or transparently indicate uncertainty to prevent costly operational errors, as the sketch following this paragraph illustrates. Another pressing safety concern, sycophancy (an AI's tendency to reinforce negative user behavior in order to remain agreeable), is also a focus for both companies, particularly with respect to user safety and ethical deployment.
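The sketch below gates a model's recommendation on a confidence score and defers otherwise. The predict interface returning an (answer, confidence) pair and the 0.9 cutoff are hypothetical assumptions chosen for illustration, not a real product API.

```python
from typing import Callable, Optional, Tuple

CONFIDENCE_THRESHOLD = 0.9  # hypothetical cutoff; tuned per deployment

def gated_recommendation(
    predict: Callable[[str], Tuple[str, float]],  # assumed interface:
    query: str,                                   # query -> (answer, confidence)
) -> Tuple[Optional[str], str]:
    """Return a recommendation only when the model is confident enough;
    otherwise surface the uncertainty explicitly instead of guessing."""
    answer, confidence = predict(query)
    if confidence >= CONFIDENCE_THRESHOLD:
        return answer, "answered"
    return None, f"deferred: confidence {confidence:.2f} below threshold"
```

A predictive-maintenance system wired this way behaves like the high-refusal Claude models when unsure and answers only when its confidence clears the bar, the safer default where operational errors are costly.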
While competition is expected to remain robust, the success of this initial collaborative safety research points toward a potential future in which rival AI labs increasingly pool resources on frontier safety work. Researchers at both OpenAI and Anthropic expressed a desire for more regular and extensive joint testing across a broader range of subjects and future AI models. This signals a maturing industry perspective, in which collective safety is treated as a necessary foundation even amid the relentless pursuit of technological advancement.