The latest b9119 release of llama.cpp fixes a performance regression on Windows for Intel GPU BF16 workloads, particularly affecting Xe2 and newer architectures. This update is important for users relying on the Vulkan backend, as it restores expected performance levels. The release also includes a refactor so that l_warptile is used only when coopmat is available for BF16. It underscores llama.cpp's ongoing effort to improve performance across hardware platforms.
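The gating described above, using the large warptile only when cooperative-matrix support is present for BF16, can be sketched roughly as follows. The names (`DeviceCaps`, `select_warptile`) are illustrative assumptions, not llama.cpp's actual Vulkan backend code:

```python
# Hedged sketch: pick a Vulkan warptile layout based on whether the device
# exposes cooperative-matrix (coopmat) support for BF16. Hypothetical names.
from dataclasses import dataclass

@dataclass
class DeviceCaps:
    coopmat_bf16: bool  # device supports cooperative-matrix ops for BF16

def select_warptile(caps: DeviceCaps, dtype: str) -> str:
    # Use l_warptile only when coopmat is available for BF16; otherwise
    # fall back to the default tile, avoiding the regression path.
    if dtype == "bf16" and not caps.coopmat_bf16:
        return "default_tile"
    return "l_warptile"

print(select_warptile(DeviceCaps(coopmat_bf16=False), "bf16"))  # default_tile
print(select_warptile(DeviceCaps(coopmat_bf16=True), "bf16"))   # l_warptile
```

The point of the design is that a tile shape tuned for coopmat hardware can be slower than the default path when the extension is absent, so the choice must be capability-gated rather than unconditional.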
The latest b9116 release of llama.cpp introduces MiMo v2.5, enhancing vision support with fused qkv for improved performance. This update fixes an earlier f16 vision overflow issue and includes various cleanups for better code maintenance. With builds for macOS, Linux, and Windows, the release broadens accessibility for developers working on diverse systems. The focus on vision capabilities marks a significant step toward making llama.cpp a more versatile tool for AI developers, particularly those integrating vision functionality.
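A fused qkv projection, as mentioned above, computes the query, key, and value projections with one matrix multiply over a stacked weight matrix instead of three separate ones, which lets the backend issue a single larger GEMM. A minimal pure-Python sketch (the helper names and tiny matrices are illustrative, not llama.cpp's implementation):

```python
# Hedged sketch of a fused QKV projection: one combined weight matrix
# produces Q, K, V in a single matmul, then the result is split.

def matmul(a, b):
    # a: m x k, b: k x n -> m x n (naive triple loop for illustration)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def fused_qkv(x, w_qkv, d):
    # w_qkv stacks W_q, W_k, W_v column-wise: k x (3*d)
    out = matmul(x, w_qkv)
    q = [row[:d] for row in out]
    k = [row[d:2 * d] for row in out]
    v = [row[2 * d:] for row in out]
    return q, k, v

# Tiny example: 1 token, hidden size 2, head dim 1
x = [[1.0, 2.0]]
w_qkv = [[1.0, 0.0, 1.0],
         [0.0, 1.0, 1.0]]
q, k, v = fused_qkv(x, w_qkv, 1)
print(q, k, v)  # [[1.0]] [[2.0]] [[3.0]]
```

The fused result is identical to running the three projections separately; the benefit is purely in kernel launch and memory-traffic efficiency, which matters most for the many small matmuls in a vision encoder.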
The latest b9118 release of llama.cpp continues to broaden platform compatibility, with builds for macOS, Linux, Windows, and Android. Notably, this update adds Vulkan builds for Ubuntu and Windows, alongside ROCm 7.2 support for AMD GPUs, a significant step for users seeking alternatives to NVIDIA's CUDA. KleidiAI support on Apple Silicon further improves performance on M-series Macs. While there are no new model architectures, this release solidifies llama.cpp's position as a versatile inference runtime across diverse hardware configurations.