Llama.cpp has released its b9113 update, which notably adds support for Q4_1 Mixture of Experts (MoE) models on Adreno GPUs. This enhancement is part of a broader effort to optimize AI model execution across platforms, including macOS, Linux, and Windows. The update also includes code refinements and the removal of unnecessary asserts, leaving the codebase leaner and more maintainable. This release underscores llama.cpp's commitment to expanding its compatibility and efficiency across different hardware environments.
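As a rough sketch of how a Q4_1 MoE model would be prepared and run with llama.cpp's standard tools (the filenames are placeholders, and offload behavior depends on which GPU backend the binary was built with):

```shell
# Quantize an F16 GGUF to the Q4_1 format (illustrative filenames)
./llama-quantize model-f16.gguf model-q4_1.gguf Q4_1

# Run with model layers offloaded to the available GPU backend
./llama-cli -m model-q4_1.gguf -ngl 99 -p "Hello"
```

The `-ngl` flag controls how many layers are offloaded to the GPU; the b9113 change is what allows this path to work for Q4_1 MoE weights on Adreno hardware.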
The b9103 release of llama.cpp continues its trend of broadening platform compatibility, making it a versatile tool for developers across various systems. With this update, Apple Silicon users benefit from KleidiAI support, enhancing performance on M-series Macs. The inclusion of ROCm 7.2 for Ubuntu x64 further narrows the gap between AMD and NVIDIA GPUs, offering more options for local inference. This release doesn't introduce new models but solidifies llama.cpp's position as a go-to runtime for diverse hardware configurations, ensuring developers can deploy AI models efficiently across multiple environments.
The b9105 release of llama.cpp brings a notable improvement by including cuda/iterator directly, which enhances the reliability of CUDA builds. This update moves away from the previous reliance on a transitive include pulled in through cub/cub.cuh, ensuring more stable behavior for developers using NVIDIA GPUs. The release continues to support a broad array of platforms, including macOS with KleidiAI enabled, Linux with ROCm 7.2, and Windows with CUDA 12 and 13. While no new model architectures are introduced, this update reinforces llama.cpp's role as a dependable tool for AI developers working across different hardware environments.
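The nature of the fix can be pictured with an illustrative fragment (not the actual llama.cpp source) showing why depending on a transitive include is fragile:

```cpp
// Before: the iterator utilities were only visible because
// cub/cub.cuh happened to pull in <cuda/iterator> transitively --
// a detail that can change between CCCL releases and break the build.
#include <cub/cub.cuh>

// After: include the header that actually provides the symbols used,
// so the code no longer depends on cub's internal include graph.
#include <cuda/iterator>
```

Including what you use directly is a standard hardening step in C++ codebases: it keeps compilation working even when an upstream library reorganizes its internal headers.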