AI Hardware · June 25, 2026 · ~18 min read · OpenAI Jalapeño

OpenAI × Broadcom Unveil Jalapeño, OpenAI’s First Custom AI Chip

~50% cheaper inference · TSMC 3nm · 9-month tape-out · competitive landscape · deployment roadmap


On June 24, 2026, OpenAI and Broadcom pulled back the curtain on Jalapeño, OpenAI’s first custom AI inference ASIC. The chip is built specifically for large language model (LLM) inference, and early lab tests reportedly show roughly 50% lower inference cost than mainstream AI GPUs, with substantially better performance per watt. It is fabricated on TSMC’s 3nm process and is slated for Microsoft Azure and other partner data centers before year-end. This piece covers the backstory, architecture, benchmark caveats, the nine-month development sprint, supply chain, deployment timeline, Nvidia’s moat, industry fallout, key executives, and seven FAQs, plus what it means if you ship code on a VNCMac remote Mac with Codex, OpenClaw, or Xcode.

01

Why OpenAI Decided to Build Its Own Silicon

OpenAI is among the world’s largest GPU buyers. Every ChatGPT query triggers inference—the model reading your prompt and generating a response—across sprawling server fleets. As GPT-4 and GPT-5-class models grow more capable, inference has become the heaviest line item on OpenAI’s path to sustainable margins.

Until now, OpenAI ran almost entirely on Nvidia hardware for both training and serving. H100s, H200s, and Blackwell GPUs are formidable, but they are general-purpose accelerators, not purpose-built for the repetitive matrix math of transformer inference. In a workload this homogeneous, much of that general-purpose capability goes unused and effectively becomes overhead. Think of it this way: Nvidia sells a Swiss Army knife; Jalapeño is a scalpel.

Everyone Else Got There First

Company | Custom chip | Primary use
Google | TPU (Tensor Processing Unit) | Training + inference
Amazon | Trainium (training) / Inferentia (inference) | Training + inference
Microsoft | Maia 100 | Inference
Meta | MTIA | Inference
OpenAI | Jalapeño (2026) | Inference

OpenAI is the last of these major AI players to unveil custom silicon, but it moved fast. The partners claim that nine months from initial design to tape-out is the fastest advanced ASIC cycle on record in high-performance semiconductors.

02

What Jalapeño Actually Is

2.1 An ASIC, Not a GPU

ASIC (Application-Specific Integrated Circuit) means the die does one job: run LLM inference. No gaming, no training runs, no general compute. That narrow focus is the entire point—when the workload is fixed, efficiency skyrockets.

“Jalapeño was designed from a blank slate for LLM inference, incorporating our deep understanding of frontier models across kernel execution, memory movement, networking, and serving patterns.”—Richard Ho, OpenAI hardware lead

2.2 Architecture Highlights

  1. Blank-slate design: Not a retread of an old GPU blueprint. Every block is sized for modern transformer inference patterns.

  2. Minimized data movement: Inference bottlenecks often sit in memory bandwidth, not raw FLOPs. Jalapeño trims unnecessary shuffling of data between SRAM and compute.

  3. Balanced compute, memory, and network: Tuned to real LLM serving loads so utilization stays closer to theoretical peak.

  4. Broadcom Tomahawk interconnect: Cluster-scale node-to-node bandwidth for multi-chip inference on the largest models.

  5. Celestica board and rack integration: The EMS (electronics manufacturing services) partner turns bare dies into production server boards and rack systems at volume.
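The memory-bandwidth point above (inference is often limited by bandwidth rather than raw FLOPs) can be made concrete with a back-of-envelope roofline estimate. During autoregressive decode, each step streams every model weight from memory once, so tokens per second is capped at bandwidth ÷ weight bytes, while compute caps it at peak FLOP/s ÷ (about 2 FLOPs per parameter per token). This is a minimal sketch under stated assumptions: the accelerator figures and the 70B-parameter, 8-bit model are hypothetical placeholders, not Jalapeño or Blackwell specifications, and KV-cache traffic and compute/memory overlap are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Hypothetical accelerator spec; these are NOT Jalapeño numbers."""
    peak_flops: float     # dense FLOP/s
    mem_bandwidth: float  # bytes/s from off-chip memory to compute


def decode_throughput(acc: Accelerator, n_params: float,
                      bytes_per_param: float, batch: int = 1) -> dict:
    """Roofline-style upper bound on decode tokens/s.

    Ignores KV-cache reads, activations, and compute/memory overlap.
    """
    weight_bytes = n_params * bytes_per_param
    # Every decode step streams all weights once and emits `batch` tokens.
    memory_bound = acc.mem_bandwidth / weight_bytes * batch
    # Roughly 2 FLOPs (multiply + add) per parameter per generated token.
    compute_bound = acc.peak_flops / (2 * n_params)
    return {
        "tokens_per_s": min(memory_bound, compute_bound),
        "limited_by": "memory" if memory_bound < compute_bound else "compute",
    }


if __name__ == "__main__":
    acc = Accelerator(peak_flops=1e15, mem_bandwidth=4e12)  # hypothetical
    for batch in (1, 64, 128):
        r = decode_throughput(acc, n_params=70e9, bytes_per_param=1, batch=batch)
        print(batch, r["limited_by"], round(r["tokens_per_s"]))
```

With these placeholder numbers, batch 1 and batch 64 are memory-bound and batch 128 is compute-bound. The crossover sits near the ridge point of 250 FLOPs per byte (1e15 ÷ 4e12), which at 2 FLOPs per 1-byte weight corresponds to a batch of about 125. That is why cutting data movement, rather than adding FLOPs, is the main lever for low-batch serving.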

2.3 Process Node and Lab Validation

  • Foundry: TSMC 3nm, the same node family as Apple’s M4 (Nvidia’s Blackwell, by contrast, is built on TSMC’s 4NP, a 4nm-class process)
  • Lab workloads: Engineering samples already run at target clock and power in OpenAI’s labs, on workloads that include GPT-5.3-Codex-Spark, the company’s flagship coding inference model
03

Performance and Cost: The Numbers (With Caveats)

Read carefully: Figures below come from Broadcom CEO Hock Tan and OpenAI press materials. They reflect early internal testing. A full technical report is months away, and no independent benchmark has validated them yet.

Metric | Jalapeño (early tests) | Baseline / source
Inference cost savings | ~50% | vs. current mainstream AI GPUs
Performance per watt | Substantially above state of the art | Per OpenAI statements
Absolute throughput | Comparable to Nvidia Blackwell and Google TPU | Hock Tan (Reuters)
Thermal behavior | Better than expected | OpenAI internal testing

Speaking to Bloomberg, Hock Tan said Jalapeño has shown “roughly 50% cost savings compared to typical AI GPUs” in testing so far. OpenAI president Greg Brockman added that the chip went from initial design to tape-out in nine months—and that OpenAI’s own AI models assisted parts of the design and optimization workflow.

Treat the 50% figure as a vendor lab claim until three things happen: OpenAI publishes a technical report, Microsoft and other partners run production workloads, and third-party benchmarks (MLPerf, etc.) reproduce the results.

04

Nine Months to Tape-Out: How They Moved This Fast

Jalapeño reached manufacturing tape-out in nine months. OpenAI and Broadcom call that the fastest advanced ASIC development cycle on record for this class of silicon.

  1. Hardware–software co-design: Model teams and chip architects worked in the same loop, avoiding the classic trap of hardware engineers guessing what software will need six months later.

  2. AI-assisted chip design: OpenAI used its own models to help drive layout and optimization decisions; VentureBeat reported that prior-generation models handled parts of the flow.

  3. Broadcom’s IP library: Reusable blocks for implementation and networking (including Tomahawk) shortened the path from RTL to physical design.

05

Supply Chain and Partners

Role | Company | Responsibility
Architecture & co-design | OpenAI | LLM inference optimization, full-stack architecture
Silicon implementation & networking | Broadcom | Die bring-up, Tomahawk fabric, volume support
Wafer fabrication | TSMC | 3nm manufacturing
System integration | Celestica | Motherboards, racks, server integration at scale
First deployment partner | Microsoft Azure | Data-center rollout (starting late 2026)
06

Deployment Roadmap

Near term (late 2026)

  • Engineering samples already running in OpenAI labs
  • Commercial deployment to Microsoft Azure and other partner data centers before year-end
  • Priority workloads: ChatGPT, Codex, and API inference inside OpenAI

Mid term (2027)

  • Volume production ramps; served inference grows materially
  • Broadcom projects deployed capacity exceeding 1.3 gigawatts (GW)
  • Possible external access for other AI companies (official copy positions Jalapeño as built for industry-wide LLMs)

Long term (through 2029)

  • OpenAI targets 10 GW of compute on custom silicon, roughly the output of ten large nuclear reactors
  • Multi-generation roadmap in place; next chip expected 2028, then annual refreshes
  • Training-focused silicon remains a future possibility (Jalapeño v1 is inference-only)
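To put the 1.3 GW and 10 GW figures in energy terms, the conversion below multiplies capacity by the hours in a year. It is a rough sketch built on two assumptions that do not come from OpenAI or Broadcom: continuous full draw (real data centers run below that, so treat the result as an upper bound) and a rule of thumb of about 1 GW of electrical output per large nuclear reactor, which is what the nuclear comparison rests on.

```python
HOURS_PER_YEAR = 365 * 24   # 8,760 h; ignores leap years
GW_PER_LARGE_REACTOR = 1.0  # rule-of-thumb assumption, not a sourced figure


def annual_energy_twh(capacity_gw: float, utilization: float = 1.0) -> float:
    """Energy (TWh) drawn in one year by `capacity_gw` at average `utilization`."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    gwh = capacity_gw * utilization * HOURS_PER_YEAR
    return gwh / 1000.0  # 1 TWh = 1,000 GWh


def reactor_equivalents(capacity_gw: float) -> float:
    """How many ~1 GW reactors it takes to supply `capacity_gw`."""
    return capacity_gw / GW_PER_LARGE_REACTOR


if __name__ == "__main__":
    for label, gw in (("2027 deployment", 1.3), ("2029 target", 10.0)):
        print(f"{label}: {annual_energy_twh(gw):.1f} TWh/yr at full draw, "
              f"~{reactor_equivalents(gw):.1f} reactors")
```

At full draw, 10 GW works out to about 87.6 TWh per year; passing a lower `utilization` scales that figure linearly.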
07

Nvidia’s Moat: Diversification, Not Divorce

Can Jalapeño replace Nvidia? Not in the near term.

  1. Inference only, not training: Frontier model training still runs on Nvidia GPUs. In February 2026, Nvidia made a $30 billion direct investment in OpenAI, so the two are competitors and partners at once.

  2. CUDA ecosystem: Nearly two decades of developer tooling is Nvidia’s deepest moat. Jalapeño does not plug into that stack today.

  3. ASIC inflexibility: If transformer architectures shift radically, retooling a fixed-function chip is expensive and slow.

The real strategy: leverage, not abandonment

Even if Jalapeño handles just 20–30% of OpenAI’s inference, that is real savings and real negotiating power on Nvidia purchase orders. Google, Amazon, and Microsoft play the same game: not dumping Nvidia, but refusing to be 100% dependent on it.
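The 20–30% figure matters because savings only apply to the traffic the ASIC actually serves. A minimal sketch of the arithmetic, assuming cost scales linearly with volume on each platform and taking the unverified ~50% per-unit saving at face value: if a fraction s of inference moves to a chip that is cheaper by a fraction d, total cost falls by s × d. The function name is hypothetical and purely illustrative.

```python
def blended_cost_reduction(asic_share: float, asic_savings: float) -> float:
    """Fractional drop in total inference cost when part of the load moves to an ASIC.

    asic_share:   fraction of inference volume served on the ASIC (0..1)
    asic_savings: per-unit cost reduction of the ASIC vs. GPUs (0..1)
    Assumes linear cost per unit of volume on each platform (hypothetical model).
    """
    for name, value in (("asic_share", asic_share), ("asic_savings", asic_savings)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    gpu_cost = 1.0 - asic_share                    # GPU-served volume at cost 1
    asic_cost = asic_share * (1.0 - asic_savings)  # ASIC-served volume at reduced cost
    return 1.0 - (gpu_cost + asic_cost)            # equals asic_share * asic_savings


if __name__ == "__main__":
    for share in (0.20, 0.30):
        cut = blended_cost_reduction(share, 0.50)
        print(f"{share:.0%} of inference on the ASIC -> {cut:.0%} lower total cost")
```

So 20–30% adoption at 50% per-unit savings trims the overall inference bill by roughly 10–15%: modest in percentage terms, but large in absolute dollars at OpenAI’s scale, and a credible alternative is what gives it weight in GPU negotiations.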

“Nobody wants to be beholden to Nvidia.”—Ben Barringer, global technology research lead, Quilter Cheviot

How Nvidia and Broadcom respond

Nvidia counters with the Vera Rubin platform, CUDA, and that $30B OpenAI tie-up. Broadcom, meanwhile, is becoming the custom ASIC kingmaker—designing silicon for Google (TPU v5/v6), Meta (MTIA), and now OpenAI (Jalapeño). Broadcom shares are up roughly 18% year-to-date in 2026; since late 2022 the stock has climbed nearly 7×.

08

What This Means for the AI Industry

  1. Inference economics reshape business models: If 50% savings hold in production, API prices can fall further, OpenAI’s unit economics improve, and the floor of the AI price war drops again.

  2. Full-stack AI is the new bar: OpenAI now touches chip architecture, kernels, memory, networking, schedulers, deployment, and product. Competition is shifting from “whose model is best” to “whose stack is most efficient end to end.”

  3. Semiconductor winners and losers: Broadcom, TSMC, and HBM suppliers (SK hynix, Samsung) benefit. Nvidia faces gradual inference share erosion; AMD feels pressure on the GPU side too.

09

Key People

Name | Title | Role in Jalapeño
Greg Brockman | OpenAI co-founder & president | Public launch; framed the chip as part of a full-stack infrastructure strategy
Richard Ho | OpenAI hardware lead | Technical architecture leadership
Hock Tan | Broadcom CEO | Claimed Blackwell-class performance, ~50% cost savings
Sam Altman | OpenAI CEO | Strategic push to own the compute stack (has said OpenAI should control its silicon destiny)
10

Timeline

Oct 2025        →  OpenAI and Broadcom announce custom chip partnership
Feb 2026        →  Nvidia invests $30B in OpenAI (incl. Vera Rubin capacity deal)
Jun 24, 2026    →  Jalapeño unveiled publicly; engineering samples in lab
Late 2026       →  First commercial deploy (Microsoft Azure + partner DCs)
2027            →  Volume production; deployment exceeds 1.3 GW
2028 (est.)     →  Second-generation Jalapeño chip
2029 (target)   →  10 GW compute scale on custom silicon
FAQ

Frequently Asked Questions

Will Jalapeño replace Nvidia GPUs?

Not yet. Jalapeño handles LLM inference only, not training. Nvidia’s training dominance is secure for the foreseeable future. The two chips are complementary, not interchangeable.

How reliable is the ~50% cost-savings figure?

It is early lab data from Broadcom CEO Hock Tan in a Bloomberg interview. No third party has verified it. Expect a fuller technical report in the coming months; treat it as a directional claim, not a settled fact.

What does Jalapeño mean for everyday ChatGPT and API users?

If savings translate to production, ChatGPT and API calls could get cheaper and snappier. Over time, AI becomes more affordable and widely available, even if the silicon itself stays invisible.

Why is it called Jalapeño?

OpenAI has not explained the codename. The company often names projects after food. Jalapeño may nod to sharp performance or the heat this announcement added to the chip wars.

Will other AI companies be able to use Jalapeño?

Official messaging says the chip is built for current and future LLMs across the industry, hinting at eventual external access. For now, OpenAI’s own inference queue comes first.

What comes after Jalapeño?

Broadcom and OpenAI have a multi-generation roadmap. The next chip is targeted for 2028, with yearly iterations planned after that.

How did Nvidia’s stock react?

Nvidia shares barely reacted. Markets view training as safe territory for now. The longer-term risk is structural: every hyperscaler building custom inference silicon chips away at GPU demand.

Closing

Jalapeño is not the silver bullet that ends Nvidia’s reign, but it is real silicon, already running GPT-5.3-Codex-Spark in the lab, and it marks the moment when buying all your compute from a single dominant vendor stops being the only option. OpenAI joins Google, Amazon, Microsoft, and Meta in the custom-chip club. The goal is leverage and cost control, not a clean break from Nvidia. If the 50% number survives production, AI economics shift in a meaningful way.

For developers, the near-term upside is cheaper, faster Codex and ChatGPT APIs. Your day-to-day work (writing code on a Mac, running Xcode, shipping OpenClaw agents) does not vanish because inference got cheaper. Full-stack AI splits into two parallel tracks: cloud silicon optimized for serving, and local or remote Mac environments for building and validating agents. If your primary machine is Windows or Linux and you need to test Codex Spark or OpenClaw GUI flows on real macOS, VNCMac remote Mac + VNC is still the shortest path, and an M4 node can be spun up in under 30 minutes.