Saturday, September 20, 2025

Google Gemini 2.5 ‘Deep Think’ AI Achieves Record-Breaking 98.7% Accuracy in Math Olympiad Tests, Limited Public Release

Google Launches Gemini 2.5 Deep Think

Google has officially introduced Gemini 2.5 Deep Think, a new variant of its AI model designed for enhanced reasoning and complex problem-solving. A Deep Think model recently gained attention for winning a gold medal at the International Mathematical Olympiad (IMO), marking the first time an AI model achieved such a feat. The publicly released model, however, is not that gold medal-winning model; it is a less powerful “bronze” version, as clarified in Google’s blog post by Logan Kilpatrick, Product Lead for Google AI Studio.

Key Features of Gemini 2.5 Deep Think

Kilpatrick shared on the social network X that this variation of the IMO gold model is optimized for speed and daily use. Google is also providing the full IMO gold model to a select group of mathematicians for testing its complete capabilities. The bronze model is now available through the Gemini mobile app, exclusively for subscribers of Google’s premium AI plan, AI Ultra, which costs $249.99 per month. New subscribers can take advantage of a promotional rate of $124.99 per month for the first three months.

Advanced Capabilities of Deep Think

In its blog announcement, Google stated that Deep Think would soon be available to “trusted testers” through the Gemini application programming interface (API), both with and without tool integration. Gemini 2.5 Deep Think builds on the Gemini family of large language models (LLMs), introducing capabilities that enhance its ability to reason through complex problems. The model utilizes “parallel thinking” techniques, allowing it to explore multiple ideas at once, and incorporates reinforcement learning to improve its problem-solving skills over time.
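Google has not published how Deep Think implements parallel thinking, so the following is only a toy illustration of the general idea of exploring several candidate paths at once and keeping the best result. The objective `cost`, the search routine `explore`, and the starting guesses are all hypothetical and chosen purely for demonstration; none of this reflects Google's actual method.

```python
from concurrent.futures import ThreadPoolExecutor


def cost(x: int) -> int:
    # Hypothetical toy objective with two local minima (near x = -4 and x = 4).
    return (x * x - 16) ** 2 + 10 * x


def explore(start: int) -> tuple[int, int]:
    """One 'line of thought': greedy descent from a single starting guess."""
    x = start
    while True:
        # Look at the current point and its two neighbours; move to the cheapest.
        best = min((x - 1, x, x + 1), key=cost)
        if best == x:
            return x, cost(x)  # no neighbour improves: a local minimum
        x = best


# Assumed starting guesses; each one is explored independently.
starts = [-6, 0, 2, 6]

with ThreadPoolExecutor() as pool:
    results = list(pool.map(explore, starts))

# Keep the candidate with the lowest cost across all explored paths.
best = min(results, key=lambda r: r[1])
print(best)
```

In this sketch, a single path that starts at 2 settles in the poorer local minimum at x = 4, while running several paths and comparing their outcomes recovers the better one at x = -4. That is the intuition behind exploring multiple ideas before committing to an answer.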

This model is particularly well-suited for tasks that require extensive deliberation, such as testing mathematical conjectures, conducting scientific research, designing algorithms, and refining creative projects like code and design.

Early Testing and Impressive Results

Early testers, including mathematicians like Michel van Garrel, have used Deep Think to investigate unsolved problems and generate potential proofs. AI expert Ethan Mollick, a professor at the Wharton School of Business, noted on X that the model turned a challenging prompt, which asked it to create something for p5.js that would impress him, into a 3D graphic, a first for any AI model.

Application Areas for Deep Think

Google emphasizes several key application areas for Deep Think:

Mathematics and Science: The model can simulate reasoning for complex proofs, explore conjectures, and interpret intricate scientific literature.
Coding and Algorithm Design: It excels in tasks that involve performance trade-offs, time complexity, and multi-step logic.
Creative Development: In design scenarios such as voxel art or user interface creation, Deep Think shows significant improvements in iterative development and detail enhancement.

Performance Benchmarks

Deep Think has demonstrated superior performance in benchmark evaluations, such as LiveCodeBench V6 (for coding ability) and Humanity’s Last Exam (covering math, science, and reasoning). It outperformed both Gemini 2.5 Pro and competing models like OpenAI’s GPT-4 and xAI’s Grok 4 by substantial margins in categories such as Reasoning & Knowledge, Code Generation, and IMO 2025 Mathematics.

Conclusion

While both Deep Think and Gemini 2.5 Pro belong to the Gemini 2.5 model family, Google positions Deep Think as the more advanced and analytically proficient variant, especially for complex reasoning and multi-step problem-solving. Google attributes this advancement to parallel thinking and reinforcement learning techniques, which allow the model to deliberate more deeply before answering. Google describes Deep Think as more adept at managing nuanced prompts, exploring multiple hypotheses, and producing more refined outputs. As evidence, it points to comparisons in voxel art generation, where Deep Think delivers enhanced texture, structural fidelity, and compositional diversity compared to 2.5 Pro.
