First look: OpenAI is taking the wraps off Jalapeño, a custom "intelligence processor" built with Broadcom to make its large language models cheaper and more efficient to run. The company even used its own AI models to help design the chip. Jalapeño is a purpose-built ASIC for inference, rather than for the broader mix of workloads GPUs usually handle.
By designing a chip around how tokens move through transformer architectures, OpenAI is trying to shift away from its heavy dependence on Nvidia hardware and toward a stack it controls end-to-end.
Engineering samples of Jalapeño are already running production-class workloads, including a model called GPT-5.3-Codex-Spark, and are meeting the power and performance targets OpenAI set for the project. The company says early testing shows Jalapeño "will deliver performance per watt substantially better than current state-of-the-art." Broadcom CEO Hock Tan has said the chip matches Nvidia's Blackwell chips and Google's Tensor Processing Units on performance.
Because the chip is aimed at inference, it can make more aggressive design trade-offs. The architecture is tuned around LLM kernels, memory movement, networking, and serving patterns rather than general-purpose compute, with the goal of improving tokens per watt on the kinds of requests that dominate ChatGPT and API traffic.
The result is less flexibility than a GPU, but potentially much better energy and cost efficiency on a narrow set of workloads that matter most to OpenAI's business.
OpenAI describes Jalapeño as the first step in a "multi-generation compute platform" that it plans to deploy in data centers by the end of 2026 and to expand over several years with partners such as Microsoft.
Broadcom will manufacture the chip and the associated server hardware, while Celestica will assemble the racks. Those systems are intended to be deployed at gigawatt scale with data center partners over multiple generations, starting in 2026.
The development timeline is part of what makes the project notable. OpenAI says Jalapeño went from initial design to tape-out in about nine months, unusually fast for a high-performance ASIC.
Internally, the company used its own models to accelerate parts of the chip design and optimization process, effectively pointing generative AI at the problem of designing the silicon that will later run it.
Strategically, Jalapeño is also about reducing exposure to GPU supply constraints and price volatility, even as Nvidia continues to lead in raw performance and ecosystem depth. Tan has said the ASIC can deliver a roughly 50% cost improvement over standard AI GPUs on measures such as cost per kilowatt or cost per token, although neither company has released full public specifications or independent benchmarks.
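To make the cost-per-token claim concrete, here is a minimal sketch of how such a comparison is typically computed. Every number in it is hypothetical and chosen only so the arithmetic is easy to follow; none comes from OpenAI, Broadcom, or Nvidia, and the function name `cost_per_million_tokens` is an illustrative assumption rather than anything either company has published.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Return the cost in USD of generating one million tokens on one accelerator."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs: same throughput, with the ASIC costing half as much per hour
# once hardware, power, and cooling are amortized. These are NOT published figures.
gpu_cost = cost_per_million_tokens(hourly_cost_usd=4.00, tokens_per_second=2000)
asic_cost = cost_per_million_tokens(hourly_cost_usd=2.00, tokens_per_second=2000)

# Fractional improvement in cost per token relative to the GPU baseline.
improvement = 1 - asic_cost / gpu_cost

print(f"GPU:  ${gpu_cost:.3f} per million tokens")
print(f"ASIC: ${asic_cost:.3f} per million tokens")
print(f"Improvement: {improvement:.0%}")
```

The same 50% figure could equally come from doubling throughput at an unchanged hourly cost, which is one reason the headline number is hard to interpret without full specifications and independent benchmarks.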
OpenAI is arriving in a space where other large platforms are already experimenting with in-house silicon. Microsoft, Meta, and Amazon have each introduced custom chips for training or inference. Jalapeño, however, is more tightly coupled to a single provider's model roadmap and hosted services. It is aimed at OpenAI's own infrastructure rather than offered as a general-purpose accelerator like Nvidia GPUs or Google's cloud TPUs.
That choice comes with risk. A chip that is highly tuned for today's LLM architectures can be extremely efficient now, but may be less adaptable if model designs change sharply.
If Jalapeño performs as promised at scale, it could push other AI developers to think more seriously about tightly coupling model and hardware design instead of relying solely on off-the-shelf GPUs. For now, the experiment will unfold inside OpenAI's own data centers, where the company is already testing how much a focused inference ASIC can improve cost, latency, and the overall experience of using large language models.
