Models & Labs

Llama.cpp b9158 Release Enhances AMD Support

llama.cpp ReleasesMay 15, 2026high confidence

Why it matters

→Enhances performance for AMD users by optimizing RDNA3 support.
→Allows efficient operation with larger head sizes, improving versatility.
→Strengthens llama.cpp's adaptability across different hardware configurations.

The b9158 release of llama.cpp introduces RDNA3 support to the CUDA mma FA kernel, optimizing performance for AMD users. This update allows RDNA3 tensor cores to work efficiently with FP16 accumulation, particularly for head sizes that align with the new tile configurations. Additionally, kernel parameters have been tuned for RDNA3, RDNA4, and CDNA1, enabling better performance for larger head sizes on CDNA. These enhancements make llama.cpp more adaptable and efficient across various hardware setups, especially benefiting those using AMD technology.

Read original

Llama.cpp b9158 Release Enhances AMD Support

Why it matters

More from llama.cpp Releases

llama.cpp b9145 release addresses SYCL memory issues

Llama.cpp Adds Qwen3.5 Tokenizer Handler

More in Models & Labs

GitHub API Adds Team-Level Copilot Metrics

llama.cpp b9150 Release Expands Platform Support

OpenAI's Codex Now Available on Mobile

GPT-5.5 Emerges as Versatile AI Model