The b9493 release of llama.cpp expands platform support, most notably by adding ROCm 7.2 builds for Ubuntu x64, which improves AMD GPU compatibility. KleidiAI is disabled on macOS Apple Silicon, but the release still covers a wide range of systems, including Vulkan on Ubuntu and Windows. The update brings no new models and instead focuses on making the tool usable across more operating systems and GPU backends. It reinforces llama.cpp's role as an inference runtime for developers outside the NVIDIA ecosystem.
The b9489 release of llama.cpp improves the CUDA backend by reserving space for quantized key-value (KV) caches at startup. It also addresses earlier feedback and removes certain assertions in the ggml-cuda.cu file. The release introduces no new models or quantization techniques, and it continues to refine compatibility across macOS, Linux, and Windows, alongside ROCm 7.2 and KleidiAI support. Overall, this is an incremental update aimed mainly at CUDA users.
The b9490 release of llama.cpp continues the trend of broadening platform compatibility, with some notable exceptions. KleidiAI support is disabled on macOS Apple Silicon, while the Linux offerings are strengthened with Vulkan and ROCm 7.2 support on Ubuntu. Windows users get CUDA 12 and CUDA 13 DLLs, which widens their GPU options. Despite the disabled features, the update keeps llama.cpp usable as an inference runtime across diverse systems.