The b9010 release of llama.cpp fixes a bug in CUDA PCI bus ID detection that caused additional GPUs to be ignored, leading to out-of-memory errors. The fix improves multi-GPU support, particularly for users on Windows. The update also includes platform-specific builds for macOS, Linux, and Windows, with notable support for Apple Silicon and Vulkan. This release focuses on stability and compatibility rather than new features.
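For context, here is a minimal sketch of how CUDA devices and their PCI bus IDs can be enumerated with the CUDA runtime API. This is illustrative only, not llama.cpp's actual detection code; a bug of the kind described above would surface here as a missing or duplicated bus ID.

```cpp
// Illustrative sketch (not llama.cpp's code): list each visible CUDA device
// together with its PCI bus ID using the CUDA runtime API.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "no CUDA devices visible\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        char bus_id[64] = {0};
        // cudaDeviceGetPCIBusId writes a string such as "0000:3B:00.0".
        if (cudaDeviceGetPCIBusId(bus_id, sizeof(bus_id), dev) == cudaSuccess) {
            printf("device %d -> PCI bus ID %s\n", dev, bus_id);
        }
    }
    return 0;
}
```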
The b9008 release of llama.cpp continues its trend of broadening platform support, making it a versatile tool for developers across various systems. This update includes new builds for macOS, Linux, Windows, and Android, with notable additions like Vulkan support on Ubuntu and Windows, and ROCm 7.2 on Ubuntu. By enhancing compatibility with different architectures, including Apple Silicon and Intel on macOS, and CUDA on Windows, llama.cpp is positioning itself as a go-to runtime for diverse hardware environments. While there are no groundbreaking new features, the release solidifies llama.cpp's role as a flexible and accessible inference tool for developers.
The b9002 release of llama.cpp is available, with builds for multiple platforms.
The b9004 release of llama.cpp introduces support for various platforms, including macOS, Linux, Android, and Windows.
DeepSeek V4 Pro is a new AI model with 1.6 trillion parameters.
DeepSeek has launched a preview of its V4 model.
NVIDIA has introduced the Nemotron 3 Nano Omni multimodal AI agent.