The b9116 release of llama.cpp adds support for MiMo v2.5, with a focus on improved vision handling, including fused QKV projections. The update also resolves an f16 overflow in the vision path and includes code cleanups for easier maintenance. With prebuilt binaries covering macOS, Linux, and Windows, the release remains accessible to developers across platforms, and the stronger vision support makes llama.cpp a more versatile tool for multimodal workloads.
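The idea behind a fused QKV projection can be shown in a few lines: rather than running three separate matrix multiplications for the query, key, and value projections, the three weight matrices are concatenated so a single, wider matmul produces all three outputs at once. This is a minimal NumPy sketch of the concept, not llama.cpp's actual implementation; all names and shapes here are illustrative.

```python
import numpy as np

# Illustrative fused-QKV sketch (not llama.cpp code).
rng = np.random.default_rng(0)
d_model, n_tokens = 64, 8
x = rng.standard_normal((n_tokens, d_model)).astype(np.float32)
wq = rng.standard_normal((d_model, d_model)).astype(np.float32)
wk = rng.standard_normal((d_model, d_model)).astype(np.float32)
wv = rng.standard_normal((d_model, d_model)).astype(np.float32)

# Separate projections: three matmuls.
q, k, v = x @ wq, x @ wk, x @ wv

# Fused projection: concatenate weights, one matmul, then split.
w_qkv = np.concatenate([wq, wk, wv], axis=1)  # (d_model, 3*d_model)
qkv = x @ w_qkv
q_f, k_f, v_f = np.split(qkv, 3, axis=1)

# The fused path produces the same Q, K, V (up to float rounding).
assert np.allclose(q, q_f, atol=1e-3)
assert np.allclose(k, k_f, atol=1e-3)
assert np.allclose(v, v_f, atol=1e-3)
```

The win is fewer kernel launches and better use of the hardware's matmul throughput, since one large GEMM typically outperforms three smaller ones.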
The b9118 release of llama.cpp continues its trend of broadening platform compatibility, shipping builds for macOS, Linux, Windows, and Android. Notably, this update adds Vulkan builds for Ubuntu and Windows alongside ROCm 7.2 builds for AMD GPUs, a significant step for users seeking alternatives to NVIDIA's CUDA. The inclusion of KleidiAI on Apple Silicon further improves performance on M-series Macs. While there are no new model architectures, this release solidifies llama.cpp's position as a versatile inference runtime across diverse hardware configurations.
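For readers building from source rather than using the prebuilt binaries, the backends mentioned above are selected via CMake options. The flags below reflect llama.cpp's documented build options at the time of writing; verify them against the repository's build docs before relying on them.

```shell
# Vulkan backend (Linux/Windows):
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# ROCm/HIP backend for AMD GPUs:
cmake -B build-hip -DGGML_HIP=ON
cmake --build build-hip --config Release

# KleidiAI-accelerated CPU path (e.g. Apple Silicon / Arm):
cmake -B build-kleidi -DGGML_CPU_KLEIDIAI=ON
cmake --build build-kleidi --config Release
```

Each backend is compiled in at build time, so a single binary generally targets one GPU backend at a time.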
The b9119 release of llama.cpp fixes a performance regression for Intel GPU BF16 workloads on Windows under Vulkan, specifically targeting Xe2 and newer architectures. The fix restores expected throughput for users on those platforms. The release also includes a refactor so the large l_warptile shader configuration is used for BF16 only when coopmat (Vulkan cooperative-matrix) support is available, improving efficiency. While the update introduces no new models or headline features, it reflects llama.cpp's ongoing commitment to maintaining performance across diverse hardware.
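The tile-selection change described above amounts to gating a large tile configuration on a hardware capability check. This is a hypothetical sketch of that pattern; the names (`select_bf16_tile`, the tile sizes, the capability flag) are invented for illustration and do not correspond to llama.cpp's internal API.

```python
# Hypothetical sketch: pick the large "l_warptile"-style configuration
# for BF16 only when the device reports cooperative-matrix (coopmat)
# support; otherwise fall back to a smaller, safe tile.
from dataclasses import dataclass

@dataclass(frozen=True)
class TileConfig:
    m: int  # tile rows
    n: int  # tile columns

LARGE_TILE = TileConfig(128, 128)  # fast path, needs coopmat hardware
SMALL_TILE = TileConfig(32, 32)    # conservative fallback

def select_bf16_tile(coopmat_supported: bool) -> TileConfig:
    """Return the large tile only when coopmat is available."""
    return LARGE_TILE if coopmat_supported else SMALL_TILE

assert select_bf16_tile(True) == LARGE_TILE
assert select_bf16_tile(False) == SMALL_TILE
```

The design point is that a tile sized for a hardware matrix unit can be slower than a modest tile when that unit is absent, so capability checks, not just architecture names, should drive the choice.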