Models & Labs

llama.cpp b9688 Release Enhances Model Management

llama.cpp Releases · June 18, 2026 · high confidence

Why it matters

→ The new model management API simplifies the integration and maintenance of AI models.
→ Real-time SSE updates enhance the responsiveness and interactivity of applications using llama.cpp.
→ The release strengthens llama.cpp's infrastructure, making it more versatile for developers.

llama.cpp has released its b9688 update, which focuses on server-side improvements. The release introduces a model management API and real-time updates delivered over Server-Sent Events (SSE), so clients can follow changes as they happen. It also adds a download API and a delete endpoint, giving developers more control over their model assets. The changes target the deployment and management of AI models across platforms; no new models are included in this release.
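To make the SSE part concrete, here is a minimal sketch of how a client might parse a Server-Sent Events stream. It follows the standard SSE wire format (`event:` and `data:` fields, with a blank line ending each event). The event name `progress`, the JSON payload fields, and the file name `example.gguf` are hypothetical placeholders; this summary does not state the actual endpoint paths or event schema used by the release, so treat them as assumptions.

```python
import json
from typing import Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per event from an iterable of SSE text lines."""
    event, data = "message", []  # "message" is the default SSE event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event, if any data was collected.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is stripped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Hypothetical stream: event name and payload shape are assumptions,
# not the documented llama.cpp server schema.
SAMPLE_STREAM = [
    ": keep-alive\n",
    "event: progress\n",
    'data: {"model": "example.gguf", "percent": 50}\n',
    "\n",
    "event: progress\n",
    'data: {"model": "example.gguf", "percent": 100}\n',
    "\n",
]

events = list(parse_sse(SAMPLE_STREAM))
percents = [json.loads(e["data"])["percent"] for e in events]
print(percents)  # [50, 100]
```

In a real client, the list of lines would be replaced by the decoded lines of a streaming HTTP response; the parsing logic stays the same.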


More from llama.cpp Releases

Open Source · coding

llama.cpp b9684 Release Adds 3D Convolution

The b9684 release of llama.cpp marks a significant enhancement with the integration of 3D convolution, boosting its ability to handle complex data processing tasks. This update also brings optimizations and a cleaner codebase, enhancing overall efficiency. The release extends support across a broad spectrum of platforms, including macOS, Linux, and Windows, with specific configurations like Vulkan, ROCm, and SYCL. By expanding its platform compatibility and functionality, llama.cpp becomes an even more versatile tool for developers tackling diverse AI challenges.

llama.cpp Releases · Jun 18, 2026

Open Source · coding

llama.cpp b9685 Release Enhances SYCL Support

The b9685 release of llama.cpp advances SYCL support, most notably by adding device-to-device memory copy via the SYCL API. It also refines the detection method for peer-to-peer communication, resolving previous conflicts. No new model architectures are introduced, but the release improves adaptability across macOS, Linux, and Windows. With ROCm 7.2 support on Ubuntu and CUDA 12 and 13 DLLs for Windows, llama.cpp becomes a more robust choice for developers working with diverse hardware configurations. The inclusion of KleidiAI on Apple Silicon further optimizes performance for M-series Macs.

llama.cpp Releases · Jun 18, 2026

Open Source · models

llama.cpp b9686 Release Expands Platform Support

The b9686 release of llama.cpp focuses on compatibility across a wide range of systems and does not introduce major new features. It includes ROCm 7.2 support on Ubuntu x64, which benefits AMD GPU users who prefer alternatives to NVIDIA's CUDA. Developers can run llama.cpp for AI inference on macOS, Linux, Windows, and openEuler. While the release lacks groundbreaking changes, it strengthens llama.cpp's reputation as a flexible and accessible tool for AI developers working on different hardware setups.

llama.cpp Releases · Jun 18, 2026

More in Models & Labs

Models & Labs · models

v0.22.1rc1: Docker Update for vLLM

The latest release candidate for vLLM, version 0.22.1rc1, changes the Docker setup by removing the use of extra-index-url for the flashinfer-jit-cache. This simplifies the Docker configuration, potentially reducing dependency management issues and improving build reliability. The update is minor, but it reflects ongoing efforts to streamline the development process, and it is most relevant to those maintaining Docker environments and looking for more efficient ways to manage dependencies.

vLLM Releases · Jun 18, 2026

Models & Labs · models

MolmoMotion: New Model for 3D Motion Forecasting

MolmoMotion is a breakthrough in 3D motion forecasting, offering a new way to predict object trajectories based on video frames and language instructions. By using a sparse set of 3D points attached to objects, it efficiently forecasts motion without rendering full video, making it highly applicable for robotics and video generation. The release includes the MolmoMotion-1M dataset, the largest of its kind, and the PointMotionBench benchmark for accuracy testing. This model sets a new standard in motion prediction, outperforming existing methods and opening new possibilities for AI-driven applications.

Hugging Face Blog · Jun 17, 2026

Models & Labs · models

GLM-5.2 Enhances Long-Horizon Coding Tasks

GLM-5.2 marks a significant step forward in handling long-horizon coding tasks with its robust 1M-token context capability. By introducing IndexShare, the model reduces computational demands while maintaining high performance across extended contexts. This release positions GLM-5.2 as a leading open-source model, outperforming its predecessor and closing the gap with proprietary models on key benchmarks. The model's ability to balance performance with computational cost through effort level control offers users flexibility in managing complex coding tasks. This advancement makes GLM-5.2 a practical tool for sustained engineering work, particularly in scenarios requiring extensive context handling.

Hugging Face Blog · Jun 17, 2026