The b9128 release of llama.cpp introduces optimizations for the Hexagon backend, focusing on eliminating scalar VTCM loads through HVX splat helpers. The update also improves support for macOS, including Apple Silicon builds with KleidiAI enabled, and extends compatibility across platforms such as Windows and Linux. Other improvements include optimized per-group scale handling and slope loads from VTCM. Together, these changes aim to boost performance and efficiency, making llama.cpp more adaptable for developers working across varied hardware setups.
The b9116 release of llama.cpp introduces MiMo v2.5, enhancing vision support with a fused QKV projection for improved performance. The update fixes an earlier f16 vision overflow issue and includes assorted cleanups for easier code maintenance. With builds covering macOS, Linux, and Windows, the release broadens accessibility for developers on diverse systems. The focus on vision capabilities marks a meaningful step toward making llama.cpp a more versatile tool for AI developers, particularly those integrating vision functionality.
The latest b9118 release of llama.cpp continues its trend of broadening platform compatibility, now including support for a wide array of systems such as macOS, Linux, Windows, and Android. Notably, this update introduces Vulkan support on Ubuntu and Windows, alongside ROCm 7.2 for AMD GPUs, which is a significant step for users seeking alternatives to NVIDIA's CUDA. The inclusion of KleidiAI on Apple Silicon further enhances performance for M-series Macs. While there are no new model architectures, this release solidifies llama.cpp's position as a versatile inference runtime across diverse hardware configurations.