AI startup Baseten raises $75M to eliminate AI inference bottlenecks and accelerate adoption

AI startups are raising billions, but most of that money is pouring into model training. What happens after a model is trained? It needs to run in production—fast, reliably, and without costing a fortune. That’s where Baseten comes in. The San Francisco-based startup just closed a $75 million Series C round, co-led by IVP and Spark Capital, with backing from Greylock, Conviction, South Park Commons, 01 Advisors, and Lachy Groom.
With this fresh funding, Baseten has now raised $135 million to tackle one of AI’s biggest challenges: inference—the process of running AI models in real-world applications. The money will fuel product expansion, hiring, and global growth as the company scales to meet rising demand.
Founded in 2019 by Amir Haghighat, Tuhin Srivastava (CEO), Philip Howes, and Pankaj Gupta, Baseten has grown to a team of about 60 employees. The company has also built a strong customer base, serving more than 100 enterprises and hundreds of smaller businesses, including Descript, Patreon, and Writer.
As AI efficiency becomes a top priority following a major breakthrough from Chinese AI lab DeepSeek in January, Baseten moved quickly to support DeepSeek’s R1 reasoning model, which competes with OpenAI’s o1. The company promotes its ability to deliver top-tier performance at a significantly lower cost than OpenAI.
DeepSeek claims its models were trained at a fraction of the cost of their U.S. counterparts, sparking increased interest in alternatives to high-priced AI solutions.
According to CEO Tuhin Srivastava, Baseten has seen a surge in demand from organizations exploring a switch to DeepSeek, and the company has been racing to keep pace, ensuring its platform can serve businesses looking for more affordable AI options.
“There are a lot of people paying millions of dollars per quarter to OpenAI and Anthropic that are thinking, ‘How can I save money?’” he said. “And they’ve flocked.”
The Growing Demand for AI Inference
AI products are increasingly embedding models as core components rather than optional add-ons. This shift means inference—the process of querying a trained model and receiving results—has to be fast, scalable, and cost-efficient. But running inference at scale is expensive, often plagued by slow response times and GPU shortages.
“Anyone building an AI product that isn’t worried about inference hasn’t hit real scale yet,” said Will Reed, General Partner at Spark Capital. “Every successful AI product needs exceptional inference performance or nobody wants to use it. And when you’re betting the future of your product or your company on that performance, choosing the right partner is make-or-break.”
How Baseten Makes AI Models Work in Production
For many companies, moving from AI development to deployment is a painful process. Models that work well in a lab setting often struggle in production, leading to delays, downtime, and unexpected costs. Baseten’s platform is designed to handle the heavy lifting, allowing AI teams to focus on building their products instead of worrying about infrastructure.
Instead of operating its own data centers, Baseten deploys its software on infrastructure from leading cloud providers like Amazon and Google. Enterprise customers can integrate their own infrastructure through a dedicated tier, while Baseten’s multi-cloud approach ensures access to a larger pool of GPUs than any single provider can offer at a given time.
“In this market, your No. 1 differentiation is how fast you can move. That is the core benefit for our customers,” Srivastava told CNBC. “You can go to production without worrying about reliability, security and performance.”
Why Investors Are Betting on Baseten
Baseten’s approach has already won over customers like Abridge, Gamma, and Writer, who rely on AI-powered products and need their models to perform under real-world conditions.
“Our customers prioritize bringing high-quality products to market quickly, and they choose us to help make that happen,” said Srivastava. “Speed, reliability, and cost-efficiency are non-negotiables, and that’s where we devote 100 percent of our focus. It’s that dedication—and the trust we’ve built with an incredible group of customers who have collectively raised billions—that has allowed us to grow fivefold in the past year with basically zero churn.”
Baseten’s team has tripled in size over the past year, drawing talent from companies like GitHub, Google, Uber, Amazon, Palantir, Atlassian, and Airtable. The company has also rolled out multi-cloud support, hybrid cloud capabilities, and a TensorRT integration, making it easier for AI teams to run their models efficiently.
“Baseten has continually focused on making AI inference performant, reliable, scalable, and multicloud,” said Sarah Guo, General Partner and Founder at Conviction. “Their growth is being driven by their offering of a mature product at the right time, accelerated by strong tailwinds: advancing model capabilities, more open-source models, and increasing interest from companies in shipping production AI applications quickly.”
The Road Ahead
With AI adoption surging, the demand for fast, reliable inference is only going to increase. Baseten is positioning itself as the go-to provider for AI-native companies that need to deploy models without running into infrastructure bottlenecks. With fresh capital and growing industry momentum, the startup is gearing up for its next phase of growth.