Google’s Gemini 3 outperforms Claude in coding benchmarks; Sourcegraph adopts it for millions of developers
Google’s Gemini 3 has pushed itself into the center of the AI coding discussion, eclipsing Claude Sonnet 4.5 in key benchmark tests and prompting Sourcegraph to make a major shift in its default AI engine. What started as a round of performance comparisons has turned into a broader signal of where AI-assisted development may be heading.
Reports from developer communities and industry benchmarks point to a consistent pattern: Gemini 3 delivers stronger results on coding tasks, faster iteration, and more stability across longer, multi-step workflows. That momentum reached a new level after The Information reported that Gemini 3 had outperformed Claude Sonnet 4.5 in coding evaluations and become Sourcegraph's new default model.
Google’s Gemini 3: The Launch That Set the Stage
Google released Gemini 3 on November 18, 2025, calling it the company’s “most intelligent model yet.” The update brought major improvements in reasoning, multimodal capability, and long-horizon task execution. Built on lessons from previous versions, the model ties together text, code, images, audio, and video in a way that gives developers more room to move. Tasks that once required long prompts or multiple retries have become faster and less frustrating.
Adoption has been staggering. More than a million people experimented with it within the first day through Google’s coding tools, and the Gemini app now counts 650 million monthly active users. The surge isn’t just curiosity. Developers say they’re seeing clear improvements in code generation, debugging, and creative prototyping.
“Nearly two years ago we kicked off the Gemini era, one of our biggest scientific and product endeavors ever undertaken as a company. Since then, it’s been incredible to see how much people love it. AI Overviews now have 2 billion users every month. The Gemini app surpasses 650 million users per month, more than 70% of our Cloud customers use our AI, 13 million developers have built with our generative models, and that is just a snippet of the impact we’re seeing,” Google said in a blog post announcing the launch.
Alphabet CEO Sundar Pichai put the pitch simply: the model can “grasp depth and nuance,” offering more precise results with less back-and-forth.
“And now we’re introducing Gemini 3, our most intelligent model, that combines all of Gemini’s capabilities together so you can bring any idea to life. It’s state-of-the-art in reasoning, built to grasp depth and nuance — whether it’s perceiving the subtle clues in a creative idea, or peeling apart the overlapping layers of a difficult problem. Gemini 3 is also much better at figuring out the context and intent behind your request, so you get what you need with less prompting.”
Coding Head-to-Head: Gemini 3 vs. Claude Sonnet 4.5

The debate around performance sharpened as benchmark results trickled in. On SWE-Bench, Gemini 3 Pro posted 76.2% accuracy on single-attempt tests. Claude Sonnet 4.5 still edged it out with 77.2% (or 78.2% with extended context). But those numbers didn't tell the whole story. On LiveCodeBench Pro, which mirrors competitive coding challenges, Gemini 3 scored an Elo-style rating of 2,439 to Claude's 1,418. And in Vending-Bench 2, a long-term simulation that tests sustained reasoning over a year-long virtual company scenario, Gemini 3 kept its fictional business profitable while Claude stumbled.
Hands-on trials painted an even clearer picture. TechRadar asked the leading models to build “Thumb Wars,” a digital hand-wrestling game. Gemini 3 Pro delivered a functional PWA with lively visuals, quick responsiveness, and smooth animations. It adapted well to feedback, enhancing effects and movement without losing coherence. Claude produced a workable version but lacked the same sense of dimension. GPT-5.1 performed steadily but felt less dynamic.
Developers echoed those findings across Reddit threads and Cursor community posts, describing Gemini 3 as more inventive on the frontend and quicker at turning loose instructions into polished prototypes. Claude still gets credit for precise logic work, especially on backend tasks, but Gemini’s multimodal strengths give it broader versatility. Its ability to interpret UI images, produce visual assets, and map design concepts straight into code has become a highlight for teams working across disciplines.
Anthropic continues to push Claude Sonnet 4.5 as the leading coding model, and its HumanEval score of nearly 90% remains one of the strongest in the field. Its ability to sustain autonomous, multi-tool work for more than 30 hours is impressive as well. Yet the day-to-day experience for many developers leans toward Gemini's range and adaptability.
Sourcegraph’s Shift: A Vote of Confidence for Google Gemini 3
The biggest validation didn’t come from benchmarks — it came from Sourcegraph.
The code intelligence platform, relied on by companies like Uber and Netflix, quietly made Gemini 3 Pro its default model for Cody, its AI coding assistant. Internal testing showed a significant jump in performance compared with Gemini 2.5 Pro, including more solved tasks, cleaner reasoning, and better handling of massive codebases.
“Gemini 3 outperformed Claude Sonnet 4.5 in coding evaluations and became Sourcegraph’s new default,” The Information reported.
Sourcegraph’s CTO captured the outcome clearly: “Gemini 3 has solved problems that stumped other leading models,” crediting the model’s long-context strength for deeply informed code navigation and problem-solving.
This is more than a routine upgrade. Sourcegraph has traditionally blended multiple models depending on the task. Moving millions of developers to Gemini 3 as the default signals trust in its reliability for real production workflows — from reading sprawling repositories to rewriting large blocks of frontend code. One JetBrains engineer who tested the update described noticeable progress in “depth, reasoning, and reliability.”
With Sourcegraph integrations reaching millions of developers through VS Code, GitHub, JetBrains IDEs, and internal company tools, this decision gives Gemini 3 a powerful distribution channel, accelerating how quickly the model can shape real engineering environments.
A Broader Shift in AI-Assisted Development
These developments arrive at a moment of intense competition. GPT-5.1, Claude Sonnet 4.5, and a wave of specialized coding models have been battling for enterprise adoption. Gemini 3 has changed the conversation partly through performance, but even more through distribution. Google’s reach across Search, Workspace, Android, YouTube, and cloud infrastructure gives it an advantage most rivals simply don’t have.
Salesforce CEO Marc Benioff shared his own perspective on X, saying he’s “not going back to ChatGPT” after testing Gemini 3 and calling the improvement “insane” in terms of reasoning and speed.
“Holy sh*t. I’ve used ChatGPT every day for 3 years. Just spent 2 hours on Gemini 3. I’m not going back. The leap is insane — reasoning, speed, images, video… everything is sharper and faster. It feels like the world just changed, again. ❤️ 🤖,” Benioff said on X.
The shift isn’t without debate. Benchmark results vary depending on context size, tool access, and prompt structure. Developers who prefer Claude often cite its intuitive handling of intricate logic chains. Google has expanded safety testing through partnerships with firms like Apollo and Vaultis to address reliability concerns as usage scales.
Where the Field Goes Next
As 2025 winds down, Gemini 3’s momentum — both in benchmarks and in the hands of real developers — marks a noticeable turning point. The free tier inside the Gemini app gives beginners and hobbyists access to high-grade coding support, while enterprise teams using Vertex AI and Google Cloud are already weaving the model into continuous integration systems, refactoring workflows, and agentic frameworks.
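For teams curious what that kind of CI integration looks like in practice, here is a minimal sketch using Google's `google-genai` Python SDK to have a Gemini model review a pull-request diff. The model ID, prompt wording, and `review_diff` helper are illustrative assumptions, not details confirmed by this article; check your provider's current model list before relying on the identifier.

```python
# Minimal sketch: asking a Gemini model to review a diff as part of a CI step.
# Assumptions: the google-genai SDK is installed (`pip install google-genai`)
# and GEMINI_API_KEY is set in the environment. The model ID below is an
# assumption and may differ for your account, region, or tier.
import os

MODEL_ID = "gemini-3-pro-preview"  # assumed identifier; verify against your model list

def build_review_prompt(diff: str, max_chars: int = 20_000) -> str:
    """Wrap a unified diff in a short code-review instruction.

    Truncates oversized diffs so the request stays within a predictable size.
    """
    clipped = diff[:max_chars]
    return (
        "You are reviewing a pull request. Point out bugs, risky changes, "
        "and missing tests in the following unified diff:\n\n" + clipped
    )

def review_diff(diff: str) -> str:
    """Send the diff to the model and return its review as plain text."""
    from google import genai  # imported lazily so the pure helper stays testable offline
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=build_review_prompt(diff),
    )
    return response.text

if __name__ == "__main__":
    sample = "diff --git a/app.py b/app.py\n- retries = 1\n+ retries = 0\n"
    print(review_diff(sample))
```

In a real pipeline, the `review_diff` output would typically be posted back to the pull request as a comment rather than printed, with the truncation limit tuned to the model's context window.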
Deep Think mode, available to Google AI Ultra subscribers, continues to attract developers wrestling with harder logic puzzles and mathematical problems, drawing interest from research institutions and companies building complex internal tools.
A new wave of IDEs and agent frameworks, including Antigravity, is preparing to take advantage of the model’s long-context and multimodal features. The real test now is how engineering teams adapt their pipelines and whether Gemini’s arrival pushes competitors to respond with models that match both its performance and distribution footprint.
For now, Gemini 3 stands out not just for winning benchmarks but for earning a place inside one of the most widely used code intelligence systems in the field. Sourcegraph’s switch speaks loudly. As developers always say: the proof is in the code.