16 × AI: AI signal, amplified

An AI news engine that ingests trusted sources, scores with Claude, and posts only what clears the bar.

Models & Labs

llama.cpp b9012 Release Enhances Mistral Format Support

llama.cpp Releases·May 4, 2026·high confidence

Why it matters

  • Enhances Mistral format support, improving model-deployment flexibility.
  • Fixes boolean-parameter handling, increasing conversion-script reliability.
  • Expands platform coverage across a wide range of hardware configurations.

The b9012 release of llama.cpp focuses on improving support for the Mistral format, specifically the apply_scale feature. This update corrects previous issues with boolean parameters in the conversion script, enhancing usability for developers. The release supports a wide array of platforms, including macOS, Linux, and Windows, ensuring broad compatibility. While no new models are introduced, the update strengthens llama.cpp's infrastructure, making it a more reliable tool for AI model deployment across diverse environments.
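The release notes don't say what the boolean-parameter bug actually was, but a classic pitfall in Python conversion scripts of this kind is `argparse`'s `type=bool`, which treats any non-empty string as true. The sketch below is illustrative only; the `--apply-scale` flag name is borrowed from the release's `apply_scale` feature and is not the real llama.cpp CLI.

```python
import argparse

# Pitfall: type=bool does NOT parse "false" -- any non-empty string is truthy.
bad = argparse.ArgumentParser()
bad.add_argument("--apply-scale", type=bool, default=True)
args = bad.parse_args(["--apply-scale", "false"])
print(args.apply_scale)  # True -- "false" is a non-empty string

# Fix: use explicit store_true / store_false actions instead.
good = argparse.ArgumentParser()
good.add_argument("--no-apply-scale", dest="apply_scale", action="store_false")
good.set_defaults(apply_scale=True)
args = good.parse_args(["--no-apply-scale"])
print(args.apply_scale)  # False
```

Fixes in this bug class usually amount to exactly this: replacing string-to-bool coercion with explicit flag actions.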


More from llama.cpp Releases

Open Source · models

llama.cpp b9009 Release Expands Platform Support

The latest b9009 release of llama.cpp continues to broaden platform compatibility, now including builds for macOS on Apple Silicon with KleidiAI enabled and for various Linux distributions with Vulkan and ROCm 7.2. The update also refines the server by avoiding unnecessary host copies of checkpoint data, which should improve performance. While the release doesn't introduce new model architectures, it solidifies llama.cpp's position as a versatile inference runtime across diverse systems, letting developers optimize AI applications on a wider range of hardware configurations.

llama.cpp Releases·May 4, 2026
Open Source · coding

b9014 Release Adds Layer Norm Ops to ggml-webgpu

The b9014 release of llama.cpp adds layer-normalization operations to ggml-webgpu, extending its shader coverage. The update initially stabilized floating-point accumulation with Kahan summation, though this was later reverted to the original method for efficiency. By eliminating non-contiguous strides, the release optimizes performance on platforms such as macOS with KleidiAI, Ubuntu with ROCm 7.2, and Windows with CUDA 12 and 13, making llama.cpp more adaptable for developers working across a range of hardware setups.

llama.cpp Releases·May 4, 2026
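Kahan (compensated) summation, the technique mentioned above, carries a correction term so that small addends are not lost to floating-point rounding. A minimal sketch in Python, unrelated to the actual ggml-webgpu shader code:

```python
def kahan_sum(values):
    """Compensated (Kahan) summation: tracks the rounding error
    of each addition and folds it back into the next one."""
    total = 0.0
    c = 0.0  # running compensation for lost low-order bits
    for x in values:
        y = x - c            # apply the correction
        t = total + y        # low-order bits of y may be lost here
        c = (t - total) - y  # recover exactly what was lost
        total = t
    return total

# One huge value followed by many tiny ones: naive summation
# drops every 1.0, compensated summation keeps them all.
vals = [1e16] + [1.0] * 1000
print(sum(vals) - 1e16)        # 0.0    -- naive sum lost the addends
print(kahan_sum(vals) - 1e16)  # 1000.0 -- compensated sum kept them
```

The trade-off the release note describes is typical: compensation doubles the arithmetic per element, so plain summation can win when the accumulator precision is already sufficient.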
Open Source · models

llama.cpp b9008 Release Expands Platform Support

The b9008 release of llama.cpp continues its trend of broadening platform support, making it a versatile tool for developers across various systems. This update includes new builds for macOS, Linux, Windows, and Android, with notable additions like Vulkan support on Ubuntu and Windows, and ROCm 7.2 on Ubuntu. By enhancing compatibility with different architectures, including Apple Silicon and Intel on macOS, and CUDA on Windows, llama.cpp is positioning itself as a go-to runtime for diverse hardware environments. While there are no groundbreaking new features, the release solidifies llama.cpp's role as a flexible and accessible inference tool for developers.

llama.cpp Releases·May 3, 2026

More in Models & Labs

Models & Labs · models

vLLM v0.20.2rc0 introduces shutdown() method

The latest release of vLLM, version 0.20.2rc0, adds a shutdown() method, giving developers explicit control over the engine's lifecycle. The addition is a practical improvement for anyone managing resources and wanting clean exits in their AI systems. Small as it seems, it reflects a focus on robustness in AI infrastructure: applications can release resources deterministically rather than relying on interpreter teardown.

vLLM Releases·May 4, 2026
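Setting vLLM's actual API aside, the value of an explicit shutdown() is easy to illustrate with a minimal lifecycle sketch. The `Engine` class below is hypothetical and not vLLM code; it only shows the pattern an explicit shutdown enables.

```python
class Engine:
    """Toy engine illustrating explicit lifecycle control.
    Hypothetical sketch -- not vLLM's implementation."""

    def __init__(self):
        self.resources = ["kv_cache", "worker_pool"]  # placeholders
        self.closed = False

    def generate(self, prompt: str) -> str:
        if self.closed:
            raise RuntimeError("engine is shut down")
        return f"completion for: {prompt}"

    def shutdown(self):
        # Idempotent: calling shutdown twice is a no-op.
        if not self.closed:
            self.resources.clear()
            self.closed = True

    # Context-manager support makes clean exits automatic.
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.shutdown()

with Engine() as eng:
    print(eng.generate("hello"))
# resources are released as soon as the with-block exits
```

An explicit, idempotent shutdown also composes with `try/finally` and signal handlers, which is exactly where implicit garbage-collection-time cleanup tends to misbehave.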
Models & Labs · models

DeepSeek V4 Pro Launches with 1.6T Parameters

DeepSeek V4 Pro is a new AI model with 1.6 trillion parameters.

Lev Selector·May 1, 2026
Models & Labs · models

DeepSeek V4 Preview Released

DeepSeek has launched a preview of its V4 model.

Matt Wolfe·May 1, 2026