The llama.cpp project has released an update that adds OpenCL optimizations for Mixture of Experts (MoE) models on Adreno GPUs, including a new CLC kernel for MxFP4. The update also fixes precision problems, removes unnecessary headers, and cleans up code style. The release supports macOS, Linux, Windows, and Android. These changes aim to improve performance and stability for applications built on llama.cpp.
The b9002 version of llama.cpp has been released, supporting multiple platforms.
The b9004 release of llama.cpp introduces support for various platforms including macOS, Linux, Android, and Windows.
The latest update to HMX Flash Attention includes several optimizations and fixes for performance and correctness.