OpenAI unveils Jalapeño, its first custom AI chip built with Broadcom to power ChatGPT and cut inference costs
OpenAI has built its own chip.
On Thursday, the company unveiled Jalapeño, its first custom AI accelerator, developed with Broadcom as part of a broader push to control more of the infrastructure behind ChatGPT, Codex, its API business, and the next wave of agentic AI products. The chip is built for inference, the expensive, compute-heavy process of generating responses after a model has already been trained. That matters. In AI, training gets the headlines, but inference is where the bills pile up and where users actually feel the product.
For OpenAI, Jalapeño is more than a hardware milestone. It is a signal that the company wants a bigger grip on the economics of AI itself.
The launch puts OpenAI deeper into the same custom-silicon race that is already reshaping the industry. Google has its TPUs. Amazon has Trainium and Inferentia. Meta has spent years building in-house AI hardware. Microsoft has Maia. Now OpenAI, long seen primarily as a model company, is making a public move into chip design as the fight for AI leadership shifts from model quality alone to the harder question of who controls the stack underneath it.
That stack is becoming the real battleground. Chips, power, data centers, networking, deployment systems, and the cost of serving billions of prompts are starting to matter just as much as the model itself. For startups watching the AI market evolve in real time, Jalapeño is another reminder that the business is moving past flashy demos and into infrastructure warfare.
OpenAI launches Jalapeño, its first AI chip, as it moves to control more of the AI compute stack
OpenAI framed the chip as the first accelerator in a multi-generation compute platform it is building with Broadcom. The company said Jalapeño was architected around its own view of how large language model inference will evolve, with a focus on making advanced AI faster, more reliable, and cheaper to serve at scale.
“Chips are foundational to the AI economy. Building our own expands our full-stack platform from products to models to infrastructure, and will help us scale intelligence, serve more people, and expand access to AI,” OpenAI said in its announcement.
OpenAI CEO Sam Altman and President Greg Brockman were presented with the chip by Broadcom CEO Hock Tan and President Charlie Kawwas, a symbolic handoff that underscored how serious OpenAI is about owning more of its infrastructure future.
The company had been signaling this direction for months. Reuters reported earlier that OpenAI was working with Broadcom on a custom inference chip as it looked to reduce its dependence on scarce, expensive GPUs. Thursday’s announcement confirms that effort and gives it a name, a roadmap, and a much clearer strategic purpose.
OpenAI says Jalapeño was built from scratch around the workloads that matter most to its business. That includes ChatGPT, Codex, its API platform, and future AI agents. The company said the design was shaped by its internal knowledge of model kernels, serving systems, memory movement, networking demands, and the way frontier LLMs behave under real production loads. Broadcom handled silicon implementation and contributed networking technologies, including its Tomahawk networking silicon, while Celestica helped with board, rack, and system-level integration.
Engineering samples are already running machine learning workloads, including GPT-5.3-Codex-Spark, in the lab at production target frequency and power, according to OpenAI. The company has not released benchmark numbers yet, but said early testing shows Jalapeño should deliver substantially better performance per watt than current state-of-the-art systems. A detailed technical report is expected in the coming months.
That performance-per-watt claim is a big one. It goes straight to the central problem facing the AI industry: how to keep serving smarter models without letting compute costs spiral out of control. Once a product reaches scale, inference becomes a recurring cost that grows with every request served. A better inference chip can mean faster responses, lower operating costs, more predictable capacity, and less dependence on outside suppliers. For OpenAI, that could translate into cheaper API calls, better uptime during demand spikes, and more room to push advanced models into mainstream products.
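To see why performance per watt feeds directly into serving cost, consider a rough back-of-the-envelope sketch. Every number below is hypothetical and chosen only for illustration; OpenAI has not published throughput, power, or cost figures for Jalapeño, and the function name and parameters are assumptions made for this example. The sketch also counts electricity alone, leaving out hardware, cooling, and networking costs.

```python
def energy_cost_per_million_tokens(tokens_per_second, watts, usd_per_kwh):
    """Electricity cost in USD to generate one million tokens on one accelerator."""
    # Performance per watt, expressed as tokens generated per joule of energy.
    tokens_per_joule = tokens_per_second / watts
    # Energy needed to generate one million tokens.
    joules = 1_000_000 / tokens_per_joule
    # Convert joules to kilowatt-hours (1 kWh = 3.6 million joules).
    kwh = joules / 3_600_000
    return kwh * usd_per_kwh


# Hypothetical accelerators drawing the same power, one with twice the throughput.
baseline = energy_cost_per_million_tokens(tokens_per_second=1000.0, watts=1000.0, usd_per_kwh=0.10)
improved = energy_cost_per_million_tokens(tokens_per_second=2000.0, watts=1000.0, usd_per_kwh=0.10)

print(f"baseline: ${baseline:.4f} per million tokens")
print(f"improved: ${improved:.4f} per million tokens")
print(f"cost ratio: {improved / baseline:.2f}")
```

Under these assumptions, doubling performance per watt halves the electricity cost of each token served. That small per-token difference compounds quickly when multiplied across billions of prompts.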
“The world is moving to a compute-powered economy,” said Greg Brockman, President and Co-Founder of OpenAI. “Jalapeño is part of our long-term full-stack infrastructure strategy to make compute more abundant, resulting in AI that is faster, more reliable, more affordable for people and businesses, and can be used to solve more important problems. By designing more of the stack ourselves, we can serve more intelligence with greater efficiency and keep pushing advanced AI toward broader access.”
Richard Ho, who leads OpenAI’s hardware program, said the chip was optimized for the specific bottlenecks that arise in large-scale inference.
“Jalapeño was designed from the ground up for LLM inference using detailed insights from our close collaboration with OpenAI researchers,” Ho said. “We optimized the architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models. Based on early testing, Jalapeño will efficiently execute our most important workloads close to the hardware’s theoretical limits.”
Broadcom is positioning the project as the start of something much bigger than a one-off chip.
“Our collaboration with OpenAI represents a fundamental commitment to scaling the physical infrastructure required for the next decade of AI,” said Hock Tan, President and CEO, Broadcom. “This is just the beginning of a multi-generation roadmap. By co-developing our industry-leading silicon directly with OpenAI, we are enabling the deployment of gigawatt-scale data centers with Microsoft and other partners beginning in 2026.”
OpenAI says Jalapeño is not a repurposed accelerator adapted from older AI workloads. It describes the chip as a blank-slate design for modern LLM inference, built for throughput on large models but with latency low enough for interactive products. That distinction matters. Training chips and inference chips do different jobs, and the companies that win in AI may be the ones that stop treating all compute as interchangeable.
OpenAI’s first custom chip signals a deeper push to control the cost, speed, and scale of AI inference
The chip is part of a bigger full-stack strategy that OpenAI is now making explicit. The company wants to shape more of the path between raw silicon and the end-user experience. That includes chip architecture, serving software, memory systems, networking, scheduling, deployment infrastructure, and the products those systems support. The closer those layers are tied together, the more efficiently OpenAI can run its own models, and the harder it becomes for rivals to match performance on cost.
OpenAI says Jalapeño went from initial design to manufacturing tape-out in nine months, a pace the company describes as one of the fastest ASIC development cycles ever achieved in advanced semiconductors. It credits the speed to close collaboration between OpenAI engineers and Broadcom, as well as the use of OpenAI’s own models to accelerate parts of the chip design and optimization process.
That detail may be one of the more interesting parts of the story. OpenAI is effectively saying that AI helped build the hardware that will run future AI. If that feedback loop holds up, it could have implications well beyond one chip launch. Faster chip development means faster infrastructure iteration. Faster infrastructure iteration means lower compute costs and a shorter path from model research to deployable products.
Jalapeño is expected to be the first building block in a broader compute platform scheduled for initial deployment by the end of 2026. OpenAI says future generations will combine its own accelerator designs with Broadcom’s silicon and networking technology and Celestica’s systems expertise.
For users, none of this will matter if it does not show up in the product. OpenAI’s pitch is that it will. A better inference stack can mean faster ChatGPT responses, AI coding tools that take more steps without stalling, cheaper API access for developers, and more dependable service when demand surges. That is the practical side of the infrastructure race. The chip itself may sit deep inside a data center, but the goal is to make AI feel cheaper, faster, and more available on the surface.
The bigger takeaway is hard to miss. AI leadership is no longer just about who has the smartest model. It is about who can afford to run that model at scale, who can keep latency low, who can survive GPU shortages, and who can keep inference costs from swallowing the business. OpenAI’s first custom chip does not solve all of that overnight. It does show where the company believes the next phase of the AI race will be won.


