Open Source

llama.cpp b9728 Release Expands Platform Support

llama.cpp Releases · June 20, 2026 · high confidence

Why it matters

→ Expands build coverage across hardware platforms, including ROCm 7.2 and Vulkan on Ubuntu and CUDA 12 and 13 on Windows.
→ Disables KleidiAI on macOS Apple Silicon, which suggests a cautious, stability-first approach to new capabilities.
→ Reinforces llama.cpp's role as a versatile inference runtime for non-NVIDIA hardware.

The b9728 release of llama.cpp expands support across multiple platforms. It covers Ubuntu with ROCm 7.2 and Vulkan, and Windows with CUDA 12 and 13. KleidiAI is disabled for macOS Apple Silicon in this release, which suggests that stability took priority over new capabilities. Overall, the update continues llama.cpp's push for compatibility across hardware configurations.


More from llama.cpp Releases

Open Source · coding

llama.cpp b9724 Release with Bug Fixes

The b9724 release of llama.cpp focuses on stability through a series of bug fixes, including build process improvements and overflow prevention in the area() function. It covers macOS, Windows, and Ubuntu, with Vulkan and ROCm 7.2 support on Ubuntu, CUDA 12 and 13 on Windows, and KleidiAI on Apple Silicon. The release introduces no major features, but it strengthens llama.cpp's reliability for developers working across diverse environments.

llama.cpp Releases · Jun 20, 2026

Models & Labs · models

llama.cpp b9726 Release Adds New Features

The b9726 release of llama.cpp adds a new --agent argument to the server and removes redundant web UI naming compatibility, which simplifies the codebase. The release covers macOS, Linux, Windows, and openEuler, with AMD GPU support through ROCm 7.2 and NVIDIA GPU support through CUDA 12 and 13. No new models are introduced; the update concentrates on adaptability and ease of use for developers working in diverse computing environments.

llama.cpp Releases · Jun 20, 2026

Models & Labs · models

llama.cpp b9731 Release Optimizes Token Sorting

The b9731 release of llama.cpp optimizes how token probabilities are sorted. By adopting std::partial_sort, it sorts only the top-n tokens rather than the full list, cutting the reported time from 8555.6 microseconds to 704.3 microseconds per operation, roughly a 12x speedup. The change applies across macOS, Linux, and Windows. The release adds no new features; its builds continue to cover KleidiAI on Apple Silicon, ROCm 7.2 on Ubuntu, and CUDA 12 and 13 on Windows.
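
As a rough illustration of why std::partial_sort helps here, the sketch below selects the n most probable tokens without ordering the rest of the list. The TokenProb struct and the top_n_tokens function are hypothetical names chosen for this example; they are not taken from the llama.cpp source, and the actual change may be structured differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical token record for illustration only; not llama.cpp's own type.
struct TokenProb {
    int   id;  // token id
    float p;   // probability assigned to the token
};

// Returns the n most probable tokens in descending order of probability.
// std::partial_sort fully orders only the first n positions, which costs
// about O(N log n) comparisons instead of the O(N log N) of a full sort.
std::vector<TokenProb> top_n_tokens(std::vector<TokenProb> probs, std::size_t n) {
    n = std::min(n, probs.size());  // clamp so the middle iterator stays in range
    const auto middle = probs.begin() + static_cast<std::ptrdiff_t>(n);
    std::partial_sort(probs.begin(), middle, probs.end(),
                      [](const TokenProb & a, const TokenProb & b) { return a.p > b.p; });
    probs.resize(n);                // drop the unsorted tail
    return probs;
}
```

Calling top_n_tokens(candidates, 10) on a vocabulary-sized vector would return just the ten highest-probability entries. The savings grow with the gap between n and the vocabulary size, which is consistent with the kind of speedup the release reports.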

llama.cpp Releases · Jun 20, 2026

More in Open Source

Open Source · coding

Kimi K2.7 and GLM-5.2 Models Released

Moonshot AI and Zhipu AI have released new open-weight coding models, Kimi K2.7 and GLM-5.2.

Lev Selector · Jun 19, 2026

Open Source · agents

Anthropic Releases Open Source Tool for AI Agents

Anthropic has launched a new open-source tool called Claude Code, designed to simplify the creation of AI agents. This tool allows users to build and deploy AI agents without needing to write code or manage servers, making it accessible to a broader audience. The process involves an interactive setup that defines success criteria and schedules tasks, all managed in the cloud. This release could democratize AI agent development, enabling more people to experiment and innovate with AI technologies without technical barriers.

Duncan Rogoff · Jun 19, 2026

Open Source · coding

GitHub Limits Open Pull Requests for Non-Writers

GitHub has introduced a new feature allowing repository maintainers to set a cap on the number of open pull requests from users without write access. This change aims to streamline the management of contributions by reducing the clutter of low-quality or drive-by pull requests. Maintainers can also designate trusted contributors who can exceed this limit without needing full collaborator access. This update is designed to help maintainers focus on meaningful contributions and reduce unnecessary review and CI overhead.

GitHub Changelog · Jun 17, 2026