Google has quietly released a major update to its popular artificial intelligence model, Gemini, which now explains its reasoning process, sets new performance records on mathematical and scientific tasks, and provides a free alternative to OpenAI’s premium services.
The new Gemini 2.0 Flash Thinking model, released on Tuesday in Google AI Studio under the experimental designation “Exp-01-21,” achieved a score of 73.3% on the American Invitational Mathematics Examination (AIME) and 74.2% on the GPQA Diamond science benchmark. These results show clear improvements over previous AI models and demonstrate Google’s increasing strength in advanced reasoning.
“We’ve been pioneering these types of planning systems for more than a decade, starting with programs like AlphaGo, and it’s exciting to see the powerful combination of these ideas with the most capable foundation models,” wrote Demis Hassabis, CEO of Google DeepMind, in a post on X.com (formerly Twitter).
Our latest update to our Gemini 2.0 Flash Thinking model (available here: https://t.co/Rr9DvqbUdO) scores 73.3% on AIME (math) and 74.2% on GPQA Diamond (science) benchmarks. Thanks for all your feedback, this represents super fast progress from our first release just now… pic.twitter.com/cM1gNwBoTO
— Demis Hassabis (@demishassabis) January 21, 2025
Gemini 2.0 Flash Thinking breaks records with million-token processing
The model’s most notable feature is its ability to process up to one million tokens of text, five times more than OpenAI’s o1 Pro model, while maintaining faster response times. This expanded context window allows the model to analyze multiple research papers or extensive data sets simultaneously, a capability that could change the way researchers and analysts work with large amounts of information.
“As a first experiment, I took several religious and philosophical texts and asked Gemini 2.0 Flash Thinking to weave them together, creating new and unique insights,” said Dan Mac, an AI researcher who tested the model, in a post on X.com. “It processed a total of 970,000 tokens. The output is pretty incredible.”
The release comes at a pivotal time in the evolution of the AI industry. OpenAI recently released its o3 model, which achieved a score of 87.7% on the GPQA Diamond benchmark. However, Google’s decision to offer its model for free during beta testing (with usage limits) could attract developers and companies looking for alternatives to OpenAI’s $200 monthly subscription.
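For readers who want a feel for what a one-million-token window means in practice, the following is a minimal sketch of a budget check for a batch of documents. The one-million-token limit comes from the figures above; the four-characters-per-token ratio is only a rough rule of thumb for English prose, not the model's actual tokenizer, and the function names are hypothetical.

```python
# Assumptions: the 1,000,000-token limit is the figure reported in the article;
# CHARS_PER_TOKEN is a rough heuristic, not the real Gemini tokenizer.
CONTEXT_WINDOW_TOKENS = 1_000_000
CHARS_PER_TOKEN = 4  # hypothetical average for English prose


def estimate_tokens(text: str) -> int:
    """Roughly estimate a token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: list[str],
                    limit: int = CONTEXT_WINDOW_TOKENS) -> tuple[bool, int]:
    """Return (fits, total_estimated_tokens) for a batch of documents."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total <= limit, total


if __name__ == "__main__":
    # Placeholder strings standing in for long source texts.
    papers = ["x" * 1_200_000, "y" * 2_000_000]
    ok, total = fits_in_context(papers)
    print(f"estimated tokens: {total}, fits in window: {ok}")
```

An estimate like this is only useful for a quick sanity check before sending a large batch; an exact count would require the provider's own token-counting tools.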
Google offers free Gemini 2.0 Flash Thinking with built-in code execution
Jeff Dean, Chief Scientist at Google DeepMind, highlighted improvements in the model’s reliability: “We keep iterating, with higher reliability and fewer contradictions between the model’s thoughts and the final answers,” he wrote.
The model also includes native code execution capabilities, allowing developers to run and test code directly within the system. This feature, combined with improved adversarial protection, positions Gemini 2.0 Flash Thinking as a serious contender for both research and commercial applications.
Industry analysts note that Google’s focus on explaining its reasoning process could help address growing concerns about AI’s transparency and reliability. Unlike traditional “black box” models, Gemini 2.0 Flash Thinking shows its work, making it easier for users to understand and verify its conclusions.
We keep iterating, with higher reliability and fewer contradictions between the model’s thoughts and the final answers.
View it as gemini-2.0-flash-thinking-exp-01-21 on https://t.co/sw0jY6k74m
— Jeff Dean (@JeffDean) January 21, 2025
AI transparency becomes the new battleground as Google challenges OpenAI
The model has already claimed the top spot on the Chatbot Arena Leaderboard, a prominent benchmark for AI performance, leading in categories such as hard prompts, coding, and creative writing.
However, questions remain about the model’s performance and limitations in the real world. While benchmark scores provide valuable metrics, they don’t always translate directly to practical applications. Google’s challenge is to convince business customers that its free offering can match or even exceed the capabilities of premium alternatives.
As the AI arms race intensifies, Google’s latest release suggests a change in strategy: combining advanced capabilities with accessibility. Whether this approach will help close the gap with OpenAI remains to be seen, but it certainly gives tech decision makers a compelling reason to reconsider their AI partnerships.
For now, one thing is clear: the era of AI that can show its work has arrived, and it is available to anyone with a Google account.