Models & Labs

OpenAI unveils Jalapeño, its first custom AI chip

The Rundown AI | June 25, 2026 | high confidence

Why it matters

→ Jalapeño is a strategic move by OpenAI to control its compute layer and reduce reliance on external suppliers such as Nvidia.
→ Its rapid, AI-assisted development cycle sets a new benchmark for ASIC design efficiency.
→ Owning the hardware lets OpenAI optimize performance and cost across its AI models and products.


OpenAI has announced Jalapeño, its first custom AI chip, developed with Broadcom, marking a strategic shift toward owning its compute infrastructure. The chip is designed for inference and reportedly delivers better performance per watt than current industry standards. It was developed in just nine months, with OpenAI's own AI models playing a key role in the design process. The chip could significantly reduce OpenAI's dependence on Nvidia; the company aims to power 10 GW of compute with custom chips by 2029.


More in Models & Labs


llama.cpp b9784 Release Enhances Hexagon Performance

The latest b9784 release of llama.cpp brings significant optimizations to Hexagon's matrix multiplication capabilities. By reworking the MUL_MAT and MUL_MAT_ID operations, the update introduces a 32x32 tiled weight repack and improved kernel parameters, enhancing performance and efficiency. These changes aim to optimize register usage and streamline activation processing, particularly benefiting users leveraging Hexagon's architecture. This release doesn't introduce new models but focuses on refining existing processes, making llama.cpp more robust for developers working with diverse hardware configurations.

llama.cpp Releases | Jun 26, 2026
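To make the idea of a tiled weight repack concrete, here is a minimal sketch in plain Python. It is not the llama.cpp Hexagon code, and the actual memory layout used in b9784 is not described above. The function names `repack_tiled` and `tile_view`, the flat row-major input, and the requirement that both dimensions be multiples of the tile size are all assumptions made for illustration. The point is only that each 32x32 tile ends up contiguous in memory, so a kernel can stream one tile at a time.

```python
def repack_tiled(weights, rows, cols, tile=32):
    """Repack a flat row-major matrix so that each tile x tile block is contiguous.

    Hypothetical helper for illustration; not the layout used by any real backend.
    """
    if len(weights) != rows * cols:
        raise ValueError("weights length does not match rows * cols")
    if rows % tile or cols % tile:
        raise ValueError("this sketch requires rows and cols to be multiples of tile")
    packed = []
    for tr in range(0, rows, tile):            # top row of the current tile
        for tc in range(0, cols, tile):        # left column of the current tile
            for r in range(tr, tr + tile):     # copy the tile one row at a time
                start = r * cols + tc
                packed.extend(weights[start:start + tile])
    return packed


def tile_view(packed, cols, tile_row, tile_col, tile=32):
    """Return the contiguous slice holding tile (tile_row, tile_col)."""
    tiles_per_row = cols // tile
    offset = (tile_row * tiles_per_row + tile_col) * tile * tile
    return packed[offset:offset + tile * tile]
```

For a 64x64 matrix this yields four tiles of 1,024 values each, stored back to back, so `tile_view(packed, 64, 0, 1)` returns the upper-right tile as a single slice.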


llama.cpp b9788 release enhances dual-GPU support

The latest release of llama.cpp, b9788, introduces significant improvements for dual-GPU setups with SYCL support, particularly enhancing tensor parallelism. By implementing a degenerate ring all-reduce for dual-GPU configurations, the update optimizes performance for both small and large tensor operations, mirroring CUDA's NCCL allreduce pattern. This release notably boosts performance metrics, with Llama-3.3-70B and Qwen3-Coder-Next-80B-A3B models showing substantial speed improvements. The update positions llama.cpp as a more competitive option for multi-GPU environments, without adding new dependencies or altering build configurations.

llama.cpp Releases | Jun 26, 2026
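As a rough illustration of what a ring all-reduce collapses to with only two devices, the sketch below simulates both ranks with Python lists. It is not the SYCL implementation from the release, and the function name `ring_allreduce_two` and the half-and-half split are assumptions for illustration. With two ranks the ring has a single peer, so the reduce-scatter and all-gather phases each take one exchange.

```python
def ring_allreduce_two(buf0, buf1):
    """Simulate a sum all-reduce across two ranks using the ring pattern.

    Hypothetical sketch: device buffers are modeled as Python lists and
    transfers as slicing. Returns the final buffer held by each rank.
    """
    if len(buf0) != len(buf1):
        raise ValueError("both ranks must hold buffers of equal length")
    half = len(buf0) // 2

    # Reduce-scatter: rank 0 receives rank 1's first half and sums it,
    # while rank 1 receives rank 0's second half and sums it.
    reduced_lo = [a + b for a, b in zip(buf0[:half], buf1[:half])]  # owned by rank 0
    reduced_hi = [a + b for a, b in zip(buf0[half:], buf1[half:])]  # owned by rank 1

    # All-gather: each rank sends its reduced half to the other,
    # so both ranks end with the full summed buffer.
    out0 = reduced_lo + reduced_hi
    out1 = reduced_lo + reduced_hi
    return out0, out1
```

Each rank sends and receives roughly half of the buffer in each phase, which is the property that makes the ring pattern bandwidth-friendly for large tensors. A real implementation might handle small tensors differently, for example with a single full exchange.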


OpenAI Develops Custom Chip 'Jalapeño'

OpenAI has announced the development of its first custom chip, named 'Jalapeño'.

The AI Daily Brief | Jun 25, 2026