Spanish AI startup Multiverse raises $217M to shrink LLMs and cut inference costs by 80%

Multiverse Computing, an AI startup based in San Sebastian, Spain, has raised €189 million (about $217 million) to tackle one of AI's biggest problems: bloated large language models (LLMs), the company announced Thursday.
The funding round was led by Bullhound Capital and included participation from HP Inc., Forgepoint Capital, and Toshiba. The company says the new capital will help it scale its compression technology for shrinking LLMs.
Multiverse Computing recently introduced a new compression tool called CompactifAI, claiming it can shrink LLMs such as Meta's Llama by up to 95% without hurting performance. In practical terms, that means companies can cut AI-related costs by as much as 80%.
After a year of development and pilot deployments, the company is ready to scale, with help from a fresh round of international and strategic backers.
Multiverse is combining ideas from quantum physics and machine learning to achieve these results, though the tech doesn’t require a quantum computer. It’s built to mimic how quantum systems behave, but runs on classical hardware.
With this latest round, Multiverse becomes the largest AI startup in Spain and joins the ranks of European AI heavyweights like Mistral, Aleph Alpha, Synthesia, Poolside, and Owkin, Reuters reported.
The company has already released compressed versions of major open-source models, including Llama, DeepSeek, and Mistral, and plans to add more soon. Its CEO, Enrique Lizaso Olmos, says they’re focused on optimizing models that companies are already using.
“We are focused just on compressing the most used open-source LLMs, the ones that the companies are already using,” Lizaso Olmos said. “When you go to a corporation, most of them are using the Llama family of models.”
Multiverse's tool is already available on the Amazon Web Services AI marketplace, making it easier for businesses to test and deploy it without major changes to their existing stack.
How Multiverse Is Shrinking Bloated LLMs to Cut AI Costs
The core problem Multiverse is addressing: LLMs are expensive to run. They typically rely on heavy-duty cloud infrastructure, which drives up energy bills and limits adoption. While other compression methods like quantization and pruning try to ease the load, they often sacrifice performance in the process.
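For context, the conventional approach works roughly like the following sketch, which applies post-training dynamic quantization to a toy feed-forward block in PyTorch. The model, its layer sizes, and the temporary file name are illustrative stand-ins, not anything from Multiverse:

```python
import os
import torch
import torch.nn as nn

# Toy stand-in for a transformer feed-forward block (sizes are made up).
model = nn.Sequential(nn.Linear(4096, 11008), nn.GELU(), nn.Linear(11008, 4096))

# Dynamic quantization: Linear weights are stored as int8, and activations
# are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_mb(m: nn.Module) -> float:
    """Rough on-disk size of a model's weights, in megabytes."""
    torch.save(m.state_dict(), "tmp_weights.pt")
    size = os.path.getsize("tmp_weights.pt") / 1e6
    os.remove("tmp_weights.pt")
    return size

print(f"fp32: {size_mb(model):.1f} MB -> int8: {size_mb(quantized):.1f} MB")
```

The int8 copy is roughly a quarter of the original size, but as the article notes, pushing such methods further tends to degrade accuracy.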
CompactifAI takes a different route. Instead of just trimming models down, it uses a quantum-inspired technique called Tensor Networks to rethink how neural networks are structured. The result: smaller, faster, and cheaper models that perform about the same. According to Multiverse, its compressed models run 4 to 12 times faster and cut inference costs by 50% to 80%.
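Multiverse hasn't published CompactifAI's internals, but the intuition behind tensor-network compression can be sketched with its simplest special case: factoring a large weight matrix into a product of much smaller ones and discarding low-importance components. The matrix sizes and the `rank` parameter below are illustrative, not the company's actual settings:

```python
# Illustrative sketch only: the simplest tensor-network-style factorization
# is a truncated SVD, replacing W (m x n) with A (m x r) @ B (r x n), r << n.
# CompactifAI's real decomposition is not public; all numbers here are toys.
import numpy as np

rng = np.random.default_rng(0)
m, n, rank = 1024, 1024, 64            # toy layer size vs. retained rank

W = rng.standard_normal((m, n))        # stand-in for a trained weight matrix
U, s, Vt = np.linalg.svd(W, full_matrices=False)

A = U[:, :rank] * s[:rank]             # m x r, singular values folded in
B = Vt[:rank, :]                       # r x n

params_before = m * n
params_after = rank * (m + n)
print(f"parameters: {params_before:,} -> {params_after:,} "
      f"({100 * (1 - params_after / params_before):.0f}% fewer)")

# Inference now does two small matmuls instead of one large one.
x = rng.standard_normal(n)
y = A @ (B @ x)
```

Production tensor-network methods generalize this idea to higher-order factorizations (matrix product operators and the like), typically followed by a short retraining pass to recover accuracy.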
And it’s not just about cost. These smaller models are lightweight enough to run not just in the cloud or enterprise data centers, but also on local machines—laptops, smartphones, vehicles, drones, even Raspberry Pi boards.
“The prevailing wisdom is that shrinking LLMs comes at a cost. Multiverse is changing that,” said CEO Enrique Lizaso Olmos. “What started as a breakthrough in model compression quickly proved transformative, unlocking new efficiencies in AI deployment and earning rapid adoption for its ability to radically reduce the hardware requirements for running AI models.”
The science behind CompactifAI stems from co-founder Román Orús, who helped pioneer the Tensor Networks approach. “For the first time in history, we can profile the inner workings of a neural network to eliminate billions of spurious correlations to truly optimize all sorts of AI models,” Orús said.