
OpenAI has announced Jalapeño, its first custom AI processor, developed with Broadcom and designed for AI inference. The ASIC is intended to run the large language models behind products such as ChatGPT, reducing OpenAI's reliance on Nvidia GPUs. Jalapeño reportedly matches the performance of Nvidia's Blackwell chips and Google's Tensor Processing Units (TPUs). The chip is part of OpenAI's strategy to build a multi-generation compute platform by 2026, and early tests reportedly show improved performance per watt.
The b9784 release of llama.cpp optimizes matrix multiplication in its Hexagon backend, which targets Qualcomm's Hexagon processors. It reworks the MUL_MAT and MUL_MAT_ID operations (the latter used for mixture-of-experts layers), introducing a 32x32 tiled weight repack and retuned kernel parameters. The changes aim to improve register usage and streamline activation processing, which mainly benefits users running llama.cpp on Hexagon hardware. The release adds no new model support; instead it refines existing code paths, making llama.cpp more robust across diverse hardware.
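Tiled weight repacking generally means reordering a row-major weight matrix so that each 32x32 block sits contiguously in memory, letting a kernel stream one tile at a time. The Python sketch below illustrates only that general idea; the function names `repack_tiles` and `tiled_matvec`, the row-major tile ordering, and the zero padding are hypothetical choices and do not reproduce the actual Hexagon kernel layout.

```python
TILE = 32  # tile edge length, from the "32x32" repack described above


def repack_tiles(weights, rows, cols, tile=TILE):
    """Hypothetical repack: row-major flat matrix -> contiguous tile x tile blocks.

    Tiles are emitted in row-major tile order, elements within a tile are
    row-major, and ragged edges are zero-padded to a multiple of `tile`.
    """
    if len(weights) != rows * cols:
        raise ValueError("weights length must equal rows * cols")
    tiles_r = (rows + tile - 1) // tile
    tiles_c = (cols + tile - 1) // tile
    out = []
    for tr in range(tiles_r):
        for tc in range(tiles_c):
            for i in range(tile):
                r = tr * tile + i
                for j in range(tile):
                    c = tc * tile + j
                    # Out-of-range positions become padding zeros.
                    out.append(weights[r * cols + c] if r < rows and c < cols else 0)
    return out


def tiled_matvec(packed, x, rows, cols, tile=TILE):
    """Compute y = W @ x, reading W only from the tiled layout."""
    tiles_r = (rows + tile - 1) // tile
    tiles_c = (cols + tile - 1) // tile
    block = tile * tile
    y = [0] * rows
    for tr in range(tiles_r):
        for tc in range(tiles_c):
            base = (tr * tiles_c + tc) * block  # start of this tile in `packed`
            for i in range(tile):
                r = tr * tile + i
                if r >= rows:
                    break  # remaining rows of this tile are padding
                acc = 0
                for j in range(tile):
                    c = tc * tile + j
                    if c >= cols:
                        break  # remaining columns of this tile are padding
                    acc += packed[base + i * tile + j] * x[c]
                y[r] += acc
    return y


# Usage: dimensions deliberately not multiples of 32, to exercise padding.
rows, cols = 40, 70
W = [(r * cols + c) % 7 - 3 for r in range(rows) for c in range(cols)]
x = [(c % 5) - 2 for c in range(cols)]
packed = repack_tiles(W, rows, cols)
reference = [sum(W[r * cols + c] * x[c] for c in range(cols)) for r in range(rows)]
assert tiled_matvec(packed, x, rows, cols) == reference
```

Because each tile is contiguous, the inner loops read `packed` sequentially, which is the property a register-blocked kernel exploits; the real backend's element order within a tile may well differ.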
The b9788 release of llama.cpp improves tensor parallelism on dual-GPU systems using the SYCL backend. It implements a degenerate ring all-reduce for the two-GPU case, mirroring the all-reduce pattern NCCL provides on CUDA, and speeds up both small and large tensor operations. The release reports substantial speedups for Llama-3.3-70B and Qwen3-Coder-Next-80B-A3B. The update makes llama.cpp more competitive in multi-GPU environments without adding new dependencies or changing build configurations.
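In a ring all-reduce over N devices, the buffer is split into N chunks and passes through N-1 reduce-scatter steps and N-1 all-gather steps. With only two GPUs the ring degenerates into a single pair of peers exchanging halves. The sketch below simulates that two-rank case with Python lists standing in for device buffers; the function name `ring_allreduce_2` is hypothetical, and the sketch does not model SYCL queues, device transfers, or any size-dependent path llama.cpp may use for small tensors.

```python
def ring_allreduce_2(buf0, buf1):
    """Hypothetical two-rank ring all-reduce: both buffers end up as buf0 + buf1.

    One reduce-scatter step followed by one all-gather step, each of which
    exchanges half of the buffer between the two ranks.
    """
    n = len(buf0)
    if len(buf1) != n:
        raise ValueError("buffers must have the same length")
    mid = n // 2  # chunk 0 = [0, mid), chunk 1 = [mid, n)

    # Reduce-scatter: rank 0 sends chunk 0 to rank 1, rank 1 sends chunk 1
    # to rank 0. Snapshot the outgoing data before either side updates.
    send0 = buf0[:mid]
    send1 = buf1[mid:]
    for i, v in enumerate(send0):
        buf1[i] += v          # rank 1 now holds the fully reduced chunk 0
    for i, v in enumerate(send1):
        buf0[mid + i] += v    # rank 0 now holds the fully reduced chunk 1

    # All-gather: each rank sends its reduced chunk back to its peer.
    buf0[:mid] = buf1[:mid]
    buf1[mid:] = buf0[mid:]
    return buf0, buf1


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]
ring_allreduce_2(a, b)
print(a)  # [11.0, 22.0, 33.0, 44.0, 55.0]
```

Each rank sends half the buffer in each phase and performs only half of the additions, so the reduction work is split evenly between the two GPUs.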
© The AI Daily Brief