Models & Labs

llama.cpp b9655 Release Fixes Grammar Bug

llama.cpp Releases · June 16, 2026 · high confidence

Why it matters

→ Fixes a grammar generator bug that had resurfaced in recent changes.
→ Corrects an erroneous case in the PEG parser test.
→ Gives developers a more stable, dependable tool across platforms.

The b9655 release of llama.cpp fixes a grammar generator bug that had resurfaced in recent changes, which matters to developers who depend on accurate grammar parsing in their applications. The release also corrects an erroneous case in the PEG parser test. It introduces no new features; instead, it makes the existing framework more stable for developers across multiple operating systems.


More from llama.cpp Releases

Open Source · models

llama.cpp b9653 Release Expands Platform Support

The latest b9653 release of llama.cpp continues its trend of broadening platform compatibility, notably adding Vulkan support for Ubuntu and Windows, and ROCm 7.2 for Ubuntu x64. While KleidiAI support for macOS Apple Silicon is disabled, the release still offers a wide array of builds across macOS, Linux, Windows, and openEuler. This update doesn't introduce new models or quantization methods but focuses on making llama.cpp more accessible across diverse hardware configurations. Developers can now leverage these enhancements to optimize AI inference on a wider range of systems.

llama.cpp Releases · Jun 16, 2026

Open Source · models

llama.cpp b9654 Release Expands Platform Support

The latest b9654 release of llama.cpp continues its trend of broadening platform compatibility, though without major new features. Notably, the release includes support for ROCm 7.2 on Ubuntu x64, which is significant for AMD GPU users seeking alternatives to NVIDIA's CUDA. While KleidiAI support on macOS Apple Silicon is disabled, the release still covers a wide array of systems, including Windows with CUDA 12 and 13 DLLs. This update reinforces llama.cpp's commitment to being a versatile inference runtime across diverse hardware configurations.

llama.cpp Releases · Jun 16, 2026

Open Source · models

llama.cpp b9658 Release Expands Platform Support

The b9658 release of llama.cpp marks another step in broadening its compatibility across different systems, now featuring ROCm 7.2 support on Ubuntu x64. It continues to offer builds for macOS, Windows, and Linux, including specific builds for Vulkan and SYCL. Although it introduces no new model architectures, the release strengthens llama.cpp's role as a versatile inference runtime for a variety of hardware setups.

llama.cpp Releases · Jun 16, 2026

More in Models & Labs

Models & Labs · models

vLLM v0.23.0 Release Enhances Model Support

The vLLM v0.23.0 release marks a significant step forward with enhancements across various components. DeepSeek-V4 has been optimized further, decoupling its metadata from previous versions and adding new attention kernels. Model Runner V2 now supports more dense models by default, improving performance for Llama and Mistral. The Rust frontend has matured with new endpoints and tool parsers, while compatibility with Transformers v5 ensures broader model support. These updates collectively enhance the robustness and versatility of vLLM, making it a more powerful tool for developers working with large language models.

vLLM Releases · Jun 14, 2026

Models & Labs · models

NVIDIA Blackwell Tops Agentic AI Benchmark

NVIDIA's Blackwell Ultra NVL72 platform has emerged as a leader in the first agentic AI benchmark, AgentPerf, developed by Artificial Analysis. This benchmark is designed to measure the performance of AI systems handling complex, multi-step tasks, unlike traditional conversational AI benchmarks. The Blackwell platform outperformed others by running 20 times more agents per megawatt than its predecessor, NVIDIA Hopper. This advancement is significant for enterprises deploying AI agents at scale, as it directly impacts infrastructure efficiency and cost-effectiveness.

NVIDIA Blog · Jun 12, 2026

Models & Labs · models

Google DiffusionGemma 26B Unveiled

Google has introduced DiffusionGemma 26B, a new AI model with advanced capabilities.

Lev Selector · Jun 12, 2026