Models & Labs

llama.cpp b9689 Release Adds f16 and bf16 Concat Support to the Metal Backend

llama.cpp Releases | June 18, 2026 | high confidence

Why it matters

→ Expands Metal backend capabilities to support more tensor types.
→ Enhances AI model performance on Apple Silicon devices.
→ Increases versatility of llama.cpp across different platforms.

llama.cpp release b9689 updates the Metal backend: the concat operator now supports f16 and bf16 tensor types, in addition to the existing f32 and i32. The change templates kernel_concat on the element type T and adds type-specific pipeline getters. The update is most relevant to macOS and iOS developers, since Metal is the GPU backend used on Apple Silicon devices, and the release summary credits it with improving model performance there. It also makes llama.cpp more versatile across data types.
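The templating approach described above can be illustrated with a small CPU-side C++ sketch. This is not the Metal kernel from the release. It only shows the general pattern of writing one concat routine parameterized on an element type T and selecting a per-type variant by name. Every identifier here (`ElemType`, `f16_bits`, `bf16_bits`, `concat_cols`, `concat_pipeline_name`) and the returned name strings are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical element-type tags for the four types named in the release summary.
enum class ElemType { F32, I32, F16, BF16 };

// Stand-ins for 16-bit float storage. Concat only copies elements, so the raw
// 16-bit pattern is enough and no arithmetic on the values is required.
struct f16_bits  { std::uint16_t v; };
struct bf16_bits { std::uint16_t v; };

// One routine templated on T replaces a separate copy per element type.
// Concatenates two row-major matrices with the same row count along columns.
template <typename T>
std::vector<T> concat_cols(const std::vector<T>& a, std::size_t a_cols,
                           const std::vector<T>& b, std::size_t b_cols,
                           std::size_t rows) {
    assert(a.size() == rows * a_cols);
    assert(b.size() == rows * b_cols);
    std::vector<T> out;
    out.reserve(a.size() + b.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < a_cols; ++c) out.push_back(a[r * a_cols + c]);
        for (std::size_t c = 0; c < b_cols; ++c) out.push_back(b[r * b_cols + c]);
    }
    return out;
}

// Hypothetical analogue of a type-specific pipeline getter: maps an element
// type to the name of the kernel instantiation that would handle it.
std::string concat_pipeline_name(ElemType t) {
    switch (t) {
        case ElemType::F32:  return "kernel_concat_f32";
        case ElemType::I32:  return "kernel_concat_i32";
        case ElemType::F16:  return "kernel_concat_f16";
        case ElemType::BF16: return "kernel_concat_bf16";
    }
    throw std::invalid_argument("unsupported element type for concat");
}
```

Because concat moves data without interpreting it, the same template body serves 32-bit and 16-bit element types, which is presumably why adding f16 and bf16 mainly required new instantiations and getters rather than new copy logic.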


More from llama.cpp Releases

Open Source | coding

llama.cpp b9724 Release with Bug Fixes

The b9724 release of llama.cpp is a stability release made up of bug fixes, including build-process improvements and overflow prevention in the area() function. Builds cover macOS, Windows, and Ubuntu, with Vulkan and ROCm 7.2 on Ubuntu, CUDA 12 and 13 on Windows, and KleidiAI on Apple Silicon. It introduces no major new features, but it makes llama.cpp more reliable for developers working across these environments.

llama.cpp Releases | Jun 20, 2026

Models & Labs | models

llama.cpp b9726 Release Adds New Features

The b9726 release of llama.cpp adds a new --agent argument to the server and removes redundant web UI naming compatibility, which simplifies the codebase. Builds cover macOS, Linux, Windows, and openEuler, including ROCm 7.2 for AMD GPUs and CUDA 12 and 13 for NVIDIA GPUs. No new models are introduced; the update focuses on adaptability and ease of use across computing environments.

llama.cpp Releases | Jun 20, 2026

Open Source | models

llama.cpp b9728 Release Expands Platform Support

The b9728 release of llama.cpp continues to broaden platform compatibility, with one notable exception: the macOS Apple Silicon build is present but has KleidiAI disabled, which suggests a focus on stability over new features. The release also covers Ubuntu with ROCm 7.2 and Vulkan, and Windows with CUDA 12 and 13. The update keeps llama.cpp positioned as an inference runtime for diverse hardware while staying conservative about new capabilities.

llama.cpp Releases | Jun 20, 2026