Models & Labs

llama.cpp b9660 Release Fixes Parsing Issue

llama.cpp Releases · June 16, 2026 · high confidence

Why it matters

→ Fixes double-escaping in LFM2 tool-call parsing, making tool calls in chat more reliable.
→ Adds escape test cases so the fix holds across macOS, Linux, and Windows.
→ Ships no new features; the release refines existing infrastructure.

The b9660 release of llama.cpp fixes a parsing issue in which LFM2 tool calls were double-escaped, a bug that matters for chat workflows that depend on tool calls. The update adds escape test cases so that the fix holds across platforms such as macOS, Linux, and Windows. The release brings no new features; it improves the reliability of the existing system, particularly for developers working with various architectures, and continues the project's focus on platform stability.
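As a rough illustration of what double-escaping means in general, the sketch below serializes tool-call arguments to JSON once and then twice. It is not llama.cpp's parser code: the function names and the sample arguments are hypothetical and only show why a second escaping pass breaks a consumer that expects an object.

```python
import json


def escape_once(args: dict) -> str:
    """Serialize tool-call arguments to JSON exactly once (intended behavior)."""
    return json.dumps(args)


def escape_twice(args: dict) -> str:
    """Hypothetical buggy variant: re-serializes an already serialized string."""
    return json.dumps(json.dumps(args))


# Hypothetical arguments containing characters that need escaping.
args = {"path": 'C:\\tmp\\"notes".txt'}

single = escape_once(args)
double = escape_twice(args)

# One parse recovers the original arguments from the single-escaped form.
print(type(json.loads(single)).__name__)

# One parse of the double-escaped form yields a string, not an object,
# so a consumer expecting a dict would fail or misread the arguments.
print(type(json.loads(double)).__name__)
```

Escape test cases of the kind the release mentions would typically check inputs with quotes and backslashes like the one above, since those are the characters a second escaping pass changes.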


More from llama.cpp Releases

Open Source · models

llama.cpp b9653 Release Expands Platform Support

The latest b9653 release of llama.cpp continues its trend of broadening platform compatibility, notably adding Vulkan support for Ubuntu and Windows, and ROCm 7.2 for Ubuntu x64. While KleidiAI support for macOS Apple Silicon is disabled, the release still offers a wide array of builds across macOS, Linux, Windows, and openEuler. This update doesn't introduce new models or quantization methods but focuses on making llama.cpp more accessible across diverse hardware configurations. Developers can now leverage these enhancements to optimize AI inference on a wider range of systems.

llama.cpp Releases · Jun 16, 2026

Open Source · models

llama.cpp b9654 Release Expands Platform Support

The latest b9654 release of llama.cpp continues its trend of broadening platform compatibility, though without major new features. Notably, the release includes support for ROCm 7.2 on Ubuntu x64, which is significant for AMD GPU users seeking alternatives to NVIDIA's CUDA. While KleidiAI support on macOS Apple Silicon is disabled, the release still covers a wide array of systems, including Windows with CUDA 12 and 13 DLLs. This update reinforces llama.cpp's commitment to being a versatile inference runtime across diverse hardware configurations.

llama.cpp Releases · Jun 16, 2026

Models & Labs · models

llama.cpp b9655 Release Fixes Grammar Bug

The b9655 release of llama.cpp fixes a grammar generator issue that had re-emerged in recent updates, which matters for developers who rely on precise grammar parsing in their applications. The update also corrects an erroneous case in the PEG parser test. The release brings no new features, but it strengthens the existing infrastructure for developers working across macOS, Linux, and Windows.

llama.cpp Releases · Jun 16, 2026

More in Models & Labs

Models & Labs · models

vLLM v0.23.0 Release Enhances Model Support

The vLLM v0.23.0 release marks a significant step forward with enhancements across various components. DeepSeek-V4 has been optimized further, decoupling its metadata from previous versions and adding new attention kernels. Model Runner V2 now supports more dense models by default, improving performance for Llama and Mistral. The Rust frontend has matured with new endpoints and tool parsers, while compatibility with Transformers v5 ensures broader model support. These updates collectively enhance the robustness and versatility of vLLM, making it a more powerful tool for developers working with large language models.

vLLM Releases · Jun 14, 2026

Models & Labs · models

NVIDIA Blackwell Tops Agentic AI Benchmark

NVIDIA's Blackwell Ultra NVL72 platform has emerged as a leader in the first agentic AI benchmark, AgentPerf, developed by Artificial Analysis. This benchmark is designed to measure the performance of AI systems handling complex, multi-step tasks, unlike traditional conversational AI benchmarks. The Blackwell platform outperformed others by running 20 times more agents per megawatt than its predecessor, NVIDIA Hopper. This advancement is significant for enterprises deploying AI agents at scale, as it directly impacts infrastructure efficiency and cost-effectiveness.

NVIDIA Blog · Jun 12, 2026

Models & Labs · models

Google DiffusionGemma 26B Unveiled

Google has introduced DiffusionGemma 26B, a new AI model with advanced capabilities.

Lev Selector · Jun 12, 2026