Tensormesh raises $20M from NVIDIA, AMD, and CoreWeave to slash AI inference costs by up to 10x
Enterprises are pouring billions into AI infrastructure, yet a surprising amount of that money is being burned on the same computation over and over again.
That inefficiency sits at the center of Tensormesh’s pitch.
The AI infrastructure startup announced Wednesday it has raised $20 million in new funding from AMD Ventures, CoreWeave, NVentures, Valley Capital Partners, and Laude Ventures. The latest raise extends Tensormesh’s seed round, bringing total funding to $24.5 million.
At the same time, the company is rolling out general availability of Tensormesh Inference, a SaaS platform built to address one of enterprise AI’s growing problems: repeated inference computation that drives up GPU costs and slows applications.
NVIDIA and AMD Are Betting on a New Layer of AI Infrastructure
Every time an AI model receives a request, the system often reprocesses the same information from scratch. System prompts, chat history, tool definitions, and repeated context are recomputed repeatedly, consuming GPU cycles each time. That becomes expensive at scale, especially for agentic AI systems handling multi-step workflows.
Tensormesh says its platform fixes that by using KV caching, a method that stores previously computed results and reuses them rather than rerunning the same calculations. The company claims the approach can reduce latency and GPU spending by up to 10x.
The timing matters. AI companies are racing to secure more GPUs from NVIDIA and AMD, yet many enterprises are discovering that raw compute alone does not solve the economics problem tied to large-scale inference. That has created growing interest in software layers focused on efficiency instead of brute-force hardware expansion.
The investor lineup reflects that shift.
“As enterprises scale AI workloads, maximizing every GPU cycle is critical. Software innovations like KV caching are a powerful complement to raw accelerator performance. Paired with AMD Instinct™ GPUs, Tensormesh’s platform can help customers drive value from their infrastructure investments,” said Ramine Roane, corporate vice president, AI at AMD.
CoreWeave framed the opportunity in similar terms.
“Tensormesh is working to solve infrastructure challenges that will ultimately impact the economics and scalability of AI. Their work advancing KV caching can help make inference faster and more efficient at scale, and it reflects exactly the kind of foundational innovation CoreWeave Ventures is committed to backing,” said Brannin McBee, co-founder and chief development officer at CoreWeave.
With $20M in funding, AI startup Tensormesh wants to tackle AI’s biggest hidden cost: wasted GPU compute
Tensormesh emerged from the open-source AI infrastructure community. The startup was founded by researchers and alumni from the University of Chicago, UC Berkeley, and Carnegie Mellon. CEO Junchen Jiang is a faculty member at the University of Chicago and co-creator of LMCache, an open-source KV caching project that has gained traction across the AI developer ecosystem.
The company says LMCache now has more than 8,000 GitHub stars and integrations with platforms including vLLM, TensorRT, AWS SageMaker, NVIDIA Dynamo, Oracle OCI Data Science, and SGLang.
“What started as a research project around KV caching is becoming a critical part of the AI stack. Tensormesh understood early that enterprises were paying AI systems to recompute the same work again and again, and built foundational infrastructure to eliminate that inefficiency and dramatically improve price-performance. The team has paired deep systems expertise with real open-source credibility to build infrastructure enterprises can actually rely on,” said Pete Sonsini, co-founder and general partner at Laude Ventures.
Tensormesh is trying to separate itself from inference providers that quietly cache tokens behind the scenes without exposing how those savings are calculated. The startup says customers can track cache hit rates, GPU utilization, token-level costs, and savings in real time through its dashboard.
One of the company’s more aggressive moves is pricing. Tensormesh says that cached input tokens served from KV storage will incur a permanent $0 cost across its serverless deployments. The idea is straightforward: if the GPU already processed the work once, customers should not pay to process it again.
That message could resonate with enterprises struggling to control inference spending as AI applications move from pilot projects into production systems.
The platform launches with two deployment models. A serverless option gives developers OpenAI-compatible API access to frontier models without having to manage infrastructure. Reserved deployments target enterprises that need dedicated capacity and custom SLAs for larger workloads.
Samsung Electronics is already working with the company on storage optimization tied to next-generation AI infrastructure.
“As AI workloads grow, intelligent reuse of cached state has become one of the most powerful levers for performance and cost efficiency,” said Leno Park, vice president of Nand product planning at Samsung Electronics. “Tensormesh’s LMCache is built to take full advantage of next-generation storage, and we look forward to our continued collaboration to push the boundaries of what’s possible across the AI stack.”
The funding will go toward product development, deeper integrations with AMD, CoreWeave, and NVIDIA infrastructure, and continued investment in LMCache.
For Tensormesh, the bet is bigger than caching itself. The company is betting that inference efficiency becomes one of the defining battles in enterprise AI as organizations realize the true cost of running large models at scale.
