The latest b9014 release of llama.cpp focuses on the ggml-webgpu component, adding layer normalization operations. Kahan summation was explored to improve floating-point stability, but the original summation was ultimately kept for its better performance. The release also simplifies the shaders by removing handling for non-contiguous strides. These changes aim to improve the efficiency and versatility of llama.cpp across multiple platforms, including macOS, Linux, and Windows.
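The two numerical techniques mentioned, layer normalization and Kahan compensated summation, are standard and easy to sketch. The Python below is purely illustrative and does not reflect llama.cpp's actual WGSL shader code; the function names and the choice to reuse the compensated sum inside the normalization are this sketch's own:

```python
import math

def kahan_sum(values):
    """Compensated summation: a correction term carries the
    low-order bits that plain accumulation would round away."""
    total = 0.0
    c = 0.0  # running compensation
    for v in values:
        y = v - c
        t = total + y
        c = (t - total) - y  # recovers the rounding error of t = total + y
        total = t
    return total

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance
    (the core of a layer-norm op, without learned scale/shift)."""
    n = len(x)
    mean = kahan_sum(x) / n
    var = kahan_sum((v - mean) ** 2 for v in x) / n
    return [(v - mean) / math.sqrt(var + eps) for v in x]
```

The trade-off the release notes describe is visible here: the compensated loop does roughly four floating-point operations per element where a plain accumulator does one, which is why a GPU shader may prefer the simpler form when the precision gain is not needed.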
The latest b9009 release of llama.cpp continues its trend of broadening platform compatibility, now including support for macOS Apple Silicon with KleidiAI enabled and various Linux distributions with Vulkan and ROCm 7.2. This update improves server efficiency by avoiding unnecessary host copies of checkpoint data. While the release doesn't introduce new model architectures, it solidifies llama.cpp's position as a versatile inference runtime across diverse systems. Developers can now leverage these improvements to optimize AI applications on a wider range of hardware configurations.
The b9012 release of llama.cpp marks a significant enhancement in handling the Mistral format, particularly with the apply_scale feature, which now functions more reliably thanks to fixes in boolean parameter handling. Developers can now leverage this update across a variety of platforms, including macOS, Linux, and Windows, ensuring compatibility with diverse hardware setups like Apple Silicon and Vulkan. By refining the conversion script, llama.cpp strengthens its infrastructure, making it a more robust tool for AI model deployment. While no new models are introduced, the update focuses on improving the existing framework, enhancing its adaptability and reliability for developers.
The b9008 release of llama.cpp continues its trend of broadening platform support, making it a versatile tool for developers across various systems. This update includes new builds for macOS, Linux, Windows, and Android, with notable additions like Vulkan support on Ubuntu and Windows, and ROCm 7.2 on Ubuntu. By enhancing compatibility with different architectures, including Apple Silicon and Intel on macOS, and CUDA on Windows, llama.cpp is positioning itself as a go-to runtime for diverse hardware environments. While there are no groundbreaking new features, the release solidifies llama.cpp's role as a flexible and accessible inference tool for developers.
DeepSeek V4 is an open-source AI model offering near state-of-the-art capabilities at a significantly lower cost than competitors.
The v0.18.2rc0 release fixes handling of the max_pixels parameter in the PaddleOCR-VL image processor's transformations.
Anthropic has released a suite of plugins that enhance the Claude ecosystem.