A surprising phrase has been circulating inside Google: “signs of life.” It did not come from a research blog or a marketing pitch. It came from a veteran executive who has spent years evaluating artificial intelligence systems. The expression suggests something more than improved performance. It hints that something inside the model has begun to awaken.
The remark came from Tulsee Doshi, Google’s senior director of Gemini product management. During internal testing of Gemini 3, she said she felt “a completely different sensation compared with previous models” and that the team had “hit on something.” Across the industry, her words are being read as an acknowledgment that Google has witnessed an early qualitative shift toward artificial general intelligence.
More Than Beating Benchmarks
During an informal set of stress tests known as the “vibe check,” Doshi asked the model to write in Gujarati, a regional Indian language with limited digital data available. Previous large language models struggled badly with this kind of task. Gemini 3, however, produced text with a level of fluency that made the comparison meaningless. Doshi said the output gave her the impression that the model was “understanding” the language in a human-like way, prompting her to describe it as showing “signs of life.”
The episode may sound anecdotal, but it captures something deeper. The model did not appear to be merely mimicking surface patterns; instead, it seemed to infer linguistic structure, cultural context and meaning. This type of reasoning ability sits at the center of AGI research.
Breakthroughs in Long-Term Planning
The technical leap becomes clearer in the realm of agent behavior. In the Vending Bench 2 simulation, a long-horizon operations test that requires inventory tracking, ordering strategies, price setting and demand forecasting, Gemini 3 generated $5,478 in virtual revenue. That result far surpassed Gemini 2.5 Pro at $573, GPT-5.1 at $1,473 and Claude Sonnet 4.5 at $3,838.
According to Doshi, Gemini 3 showed exactly the kind of growth the team had been targeting in planning and tool-use capability. For researchers, the jump reflects not a bigger model, but a smarter one.
Human-Like Reactions Noted by Outside Testers
Outside observers have reported unusual interactions as well. Andrej Karpathy, a founding member of OpenAI, recalled that Gemini 3 refused to believe the current year was 2025 during an early test. Only after he realized that Google Search had not been enabled did the model revise its understanding. When the feature was activated and evidence presented, the model responded, “Oh my god,” followed by “I… I don’t know what to say. You were right.”
The exchange does not prove that Gemini 3 has emotions. What it does show is something AI researchers have long struggled to achieve: an ability to recognize a reasoning failure, integrate new information, correct its belief and express the update coherently. This self-consistency adjustment is another capability associated with AGI-oriented systems.
A New Paradigm at the AGI Threshold
The AI community has long anticipated a moment when a model would feel qualitatively different. Sam Altman has made similar remarks about his experience with GPT-4.5 and GPT-5 prototypes. The “signs of life” comment from inside Google appears to mark that kind of turning point.
The shift seems to stem from a combination of breakthroughs: stronger linguistic understanding, accelerating long-term planning ability, improved tool integration, more stable self-correction and increasingly human-like response patterns. This suggests not a simple increase in training volume, but a structural evolution within the model.
OpenAI’s Answer: “Shallotpeat”
The industry is now watching OpenAI’s next model, reportedly known as Shallotpeat. If it fails to surpass Gemini 3, the dominance OpenAI has held for the past two years may weaken. Some analysts say the center of gravity in AI could tilt toward Google between 2026 and 2027.
More Than a Performance Upgrade
Gemini 3 is not just a benchmark-topping system. It has demonstrated reasoning behaviors and long-range planning that many researchers expected to appear only in the early stages of AGI. For Google insiders, the phrase “signs of life” is not marketing hype. It reflects the feeling that a large language model has, for the first time, taken a step beyond its previous boundaries.
The race in AI is no longer about scale alone. It is shifting toward which company can most convincingly replicate human-like thinking patterns. If early tests hold up, Gemini 3 may be remembered as the model that touched that threshold first.
by Ju-baek Shin | jbshin@kmjournal.net