Llama.cpp's b9095 release introduces an internal AllReduce kernel for CUDA, an NCCL-free path for tensor parallelism. The kernel currently covers two-GPU configurations and FP32 tensors up to 256 KB, so setups within those limits no longer need NCCL. The release also improves error logging and adds a watchdog that detects hangs, aiming to make tensor-parallel runs more reliable. Together, these changes reduce dependencies and simplify operation for developers running llama.cpp in tensor-parallel modes.
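For readers unfamiliar with what an NCCL-free all-reduce involves, the sketch below illustrates the general technique using CUDA peer-to-peer access between two GPUs: each device exposes its buffer to the other, one device reads the peer's data directly and accumulates, and the result is broadcast back. This is a hypothetical illustration of the approach only, not llama.cpp's actual kernel; the buffer names, fill values, and verification step are assumptions made for the example.

```cuda
// Minimal sketch of an NCCL-free all-reduce between two GPUs via CUDA
// peer-to-peer access. Illustrative only -- not llama.cpp's kernel.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Element-wise add of the peer GPU's buffer into the local buffer.
__global__ void add_peer(float *local, const float *peer, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) local[i] += peer[i];
}

int main() {
    const int n = 256 * 1024 / sizeof(float); // 256 KB of FP32, matching the release's cap
    float *buf[2];

    // P2P must be supported in both directions for direct peer loads.
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (!can01 || !can10) { fprintf(stderr, "P2P not supported\n"); return 1; }

    for (int d = 0; d < 2; ++d) {
        cudaSetDevice(d);
        cudaDeviceEnablePeerAccess(1 - d, 0); // allow direct loads from the other GPU
        cudaMalloc(&buf[d], n * sizeof(float));
        std::vector<float> h(n, d ? 2.0f : 1.0f); // hypothetical per-rank partial results
        cudaMemcpy(buf[d], h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    }

    // Reduce: device 0 reads device 1's buffer directly and accumulates.
    cudaSetDevice(0);
    add_peer<<<(n + 255) / 256, 256>>>(buf[0], buf[1], n);

    // Broadcast: copy the summed result back so both GPUs hold it.
    cudaMemcpyPeer(buf[1], 1, buf[0], 0, n * sizeof(float));
    cudaDeviceSynchronize();

    // Verify: 1.0 + 2.0 should appear on both devices.
    cudaSetDevice(1);
    float check;
    cudaMemcpy(&check, buf[1], sizeof(float), cudaMemcpyDeviceToHost);
    printf("buf[1][0] = %.1f (expected 3.0)\n", check);
    return 0;
}
```

For small tensors like these, a direct peer-read kernel avoids NCCL's library and initialization overhead entirely, which is presumably why the release caps the internal path at 256 KB and falls back to NCCL beyond it.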
The b9094 release of llama.cpp marks a significant expansion in platform support, particularly for macOS and Windows users. With the inclusion of KleidiAI-enabled builds for Apple Silicon, macOS users gain improved performance without additional configuration. Windows users benefit from the addition of CUDA 12 and 13 support, broadening the scope for GPU-accelerated tasks. This release doesn't introduce new models but focuses on making llama.cpp more accessible and versatile across a wider range of systems, reinforcing its position as a go-to inference runtime for diverse hardware setups.
The b9097 release of llama.cpp continues the trend of broadening platform compatibility, now covering KleidiAI-enabled builds for macOS on Apple Silicon and various Linux configurations, including Ubuntu with Vulkan and ROCm 7.2. The update also ships Windows CUDA 12 and 13 DLLs, making the runtime easier to deploy across different environments. While there are no groundbreaking new features, the release solidifies llama.cpp's position as a flexible inference runtime, and developers can use these builds to target a wider range of hardware setups.