Nvidia-backed SandboxAQ unveils AI molecule dataset to accelerate drug discovery

SandboxAQ, an AI startup spun out of Alphabet’s Google and backed by Nvidia, just released a massive dataset it hopes will speed up drug development. The goal? Help scientists figure out more quickly whether a drug will stick to the protein it’s targeting.
Drug-protein binding is a fundamental part of drug discovery. If a treatment doesn’t bind properly to its target in the body, it’s unlikely to work. But figuring this out usually takes months of lab testing or heavy computation.
That’s where SandboxAQ comes in. Instead of relying on physical experiments, the startup used Nvidia chips to generate data: roughly 5.2 million synthetic 3D molecules based on real-world scientific results. These molecules don’t exist in nature. They were created using equations that describe how atoms interact, and those equations are grounded in experimental data.
The company is releasing the dataset to the public, hoping others will use it to train AI models that predict drug-protein binding accurately and in a fraction of the time lab testing takes. SandboxAQ will charge for access to its own trained models, which it believes could perform on par with some lab-based methods.
“This is a long-standing problem in biology that we’ve all, as an industry, been trying to solve for,” Nadia Harhen, GM of AI simulation at SandboxAQ, told Reuters. “All of these computationally generated structures are tagged to a ground-truth experimental data, and so when you pick this data set and you train models, you can actually use the synthetic data in a way that’s never been done before.”
For researchers, the dataset could be a shortcut. If they’re testing whether a new drug might disrupt a biological process (say, slowing the progress of a disease), models trained on this data could help predict whether the drug actually binds to the right target before any experiments are run.
The effort is part of a growing trend of combining physics-based models with machine learning to tackle tough scientific problems. Traditional methods can model atomic behavior, but the number of possible combinations, even for small pharmaceutical compounds, is overwhelming. SandboxAQ’s AI models aim to cut through that complexity.
The startup has raised over $950 million since spinning out of Alphabet in 2022. Its latest $450 million Series E round drew backing from Ray Dalio, Horizon Kinetics, BNP Paribas, Google, Nvidia, T. Rowe Price, Breyer Capital, and others. The company says the funding will help it scale its work across multiple industries, including biotech, cybersecurity, financial services, and materials science.
This release marks a major push into biopharma. By opening the dataset, SandboxAQ is betting that the next big breakthroughs in drug discovery could come from a keyboard, not a test tube.
We featured SandboxAQ last week in our series on top AI companies worth watching.