Models & Labs

llama.cpp b9747 release enhances model tracking

llama.cpp Releases · June 22, 2026 · high confidence

Why it matters

→Real-time model load tracking improves user experience by providing immediate feedback.
→Enhanced platform support makes llama.cpp more versatile for developers.
→A new mutex for notify_to_router guards against concurrent access, making server operation more robust.

The b9747 release of llama.cpp adds real-time model load progress tracking through the server's /models/sse endpoint, so users get immediate feedback while a model loads. It also adds a mutex for notify_to_router, presumably to serialize concurrent calls to that function and make server operation more robust. Builds cover a wide range of platforms, including macOS, Linux, and Windows, although some features, such as KleidiAI on Apple Silicon, are disabled. The release introduces no new model architectures; its changes instead make llama.cpp more adaptable for developers across different systems.
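A client for the new endpoint can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions: it assumes llama-server is listening on its default address, http://localhost:8080, and that /models/sse emits standard Server-Sent Events (text/event-stream). The event names and payload schema are not described in the release note, so the client simply prints each event's data, decoding it as JSON when possible. The function name watch_model_loads is hypothetical. The parser follows the WHATWG event-stream rules: data lines accumulate, a blank line dispatches the event, and lines starting with ":" are comments.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Parse a text/event-stream into {"event": ..., "data": ...} dicts."""
    event_type = "message"
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event (if it has data).
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type = "message"
            data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
        # "id" and "retry" fields are ignored in this sketch.
    # An event without a terminating blank line is discarded, as the spec requires.


def watch_model_loads(url: str = "http://localhost:8080/models/sse") -> None:
    """Hypothetical client: print every event streamed by /models/sse.

    Assumption: the endpoint speaks standard SSE; the payload schema is unknown,
    so data is printed as-is (or as parsed JSON when it decodes cleanly).
    """
    req = urllib.request.Request(url, headers={"Accept": "text/event-stream"})
    with urllib.request.urlopen(req) as resp:
        # HTTPResponse iterates line by line; decode each line to text.
        lines = (chunk.decode("utf-8") for chunk in resp)
        for event in parse_sse(lines):
            try:
                payload = json.loads(event["data"])
            except json.JSONDecodeError:
                payload = event["data"]
            print(event["event"], payload)
```

Calling watch_model_loads() while a model is loading should print progress events as they arrive. The parser is kept separate from the network code so it can be checked against canned input without a running server.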


More from llama.cpp Releases

Models & Labs · models

llama.cpp b9745 Release Enhances MTP Support

The b9745 release of llama.cpp extends multi-token prediction (MTP) support, notably with the addition of Step3.5/3.7 flash MTP3. It also adds new APIs, llama_set_mtp_layer_offset and llama_model_n_nextn_layer, which, judging by their names, expose a model's MTP (NextN) layers to callers. The release provides builds for macOS, Linux, Windows, and openEuler, ensuring broad compatibility. It introduces no new models; instead it refines existing infrastructure, making llama.cpp more robust for developers working with diverse hardware configurations.
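In models that ship them, MTP (NextN) layers are commonly used to draft several tokens ahead, which the main model then verifies, as in speculative decoding. The sketch that follows shows greedy verification of such a draft in plain Python. It is illustrative only: it does not model llama.cpp's API (llama_set_mtp_layer_offset and llama_model_n_nextn_layer are not used), and target_next is a hypothetical stand-in for the main model's greedy next-token choice.

```python
from typing import Callable, Sequence


def accept_draft(
    context: list[int],
    draft: Sequence[int],
    target_next: Callable[[list[int]], int],
) -> list[int]:
    """Greedily verify drafted tokens against the main model.

    Returns the tokens to append: the longest prefix of the draft that the
    main model agrees with, plus one token chosen by the main model itself.
    Real implementations score all draft positions in one batched forward
    pass; this loop calls target_next sequentially for clarity.
    """
    accepted: list[int] = []
    prefix = list(context)
    for tok in draft:
        expected = target_next(prefix)
        if expected != tok:
            accepted.append(expected)  # main model's correction ends the run
            return accepted
        accepted.append(tok)
        prefix.append(tok)
    # Every draft token matched, so the main model contributes a bonus token.
    accepted.append(target_next(prefix))
    return accepted
```

The payoff is that each verification step can emit several tokens at once when the draft is good, and never fewer than one when it is not.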

llama.cpp Releases · Jun 22, 2026

Open Source · models

llama.cpp b9748 release expands platform support

The b9748 release of llama.cpp continues to broaden platform compatibility, notably adding support for ROCm 7.2 on Ubuntu x64. This lets AMD GPU users run llama.cpp on a current ROCm stack, which may help narrow the gap with NVIDIA's CUDA support. The release also provides Vulkan builds for several operating systems, giving developers more GPU backend options. There are no major new features, but the update solidifies llama.cpp's position as a versatile inference runtime across diverse hardware configurations.

llama.cpp Releases · Jun 22, 2026

Open Source · models

llama.cpp b9750 Release Expands Platform Support

The b9750 release of llama.cpp likewise broadens platform compatibility, including ROCm 7.2 builds for Ubuntu x64 that extend support for AMD GPUs. It also refactors parts of the codebase, implementing a call statement and simplifying certain functions, which should aid maintainability and may improve performance. KleidiAI support for macOS Apple Silicon is disabled, but the release still ships a wide array of builds for macOS, Linux, Windows, and openEuler. It introduces no new models but strengthens llama.cpp's position as a versatile inference runtime across diverse hardware configurations.

llama.cpp Releases · Jun 22, 2026

More in Models & Labs

Models & Labs · models

NVIDIA Unveils 100% Liquid-Cooled AI Servers

NVIDIA has introduced AI server infrastructure that is cooled entirely by liquid, aimed at raising energy efficiency in data centers. Because the cooling liquid can run as warm as 45 °C, the servers need substantially less energy and water for cooling, one of the largest operational costs in data centers. Running warm also reduces the need for mechanical chillers and fans and makes waste heat recovery more practical. NVIDIA presents the Rubin-generation servers as a major change to data center operations, especially in climates where traditional cooling methods are less efficient.

NVIDIA Blog · Jun 22, 2026

Models & Labs · models

Claude Fable 5 Withdrawn Amid Negotiations

Claude Fable 5 was released and then withdrawn as Anthropic negotiates access with the administration.

Lev Selector · Jun 19, 2026

Models & Labs · models

OpenRouter Fusion Combines Models to Reduce Costs

OpenRouter Fusion uses model ensembles to reduce hallucinations and improve accuracy while lowering costs.

Lev Selector · Jun 19, 2026