The b9073 release of llama.cpp has been announced, featuring expanded support across multiple platforms. This update includes builds for macOS Apple Silicon with KleidiAI, Ubuntu with ROCm 7.2, and Windows with CUDA 12 and 13. The release aims to enhance compatibility and performance across a variety of hardware configurations. By broadening its platform support, llama.cpp continues to be a flexible tool for developers working with AI inference.
The b9075 release of llama.cpp improves CUDA performance by fusing the snake activation function into a single elementwise kernel. The change mainly benefits audio decoders such as BigVGAN and Vocos, which previously relied on a five-operation sequence to compute the activation. Collapsing those operations into one kernel promises better performance and efficiency for the F32, F16, and BF16 data types. The update reflects llama.cpp's ongoing work on its CUDA backend and makes it a more compelling option for developers whose models use complex activation functions.
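For readers unfamiliar with the snake activation, it is usually written as snake(x) = x + (1/alpha) * sin^2(alpha * x). The sketch below is a plain-Python illustration of why fusing helps, not the llama.cpp kernel itself. The five-step breakdown in `snake_unfused` is an assumed decomposition for illustration; the release notes do not list the exact operations. Both function names are hypothetical, and `alpha` is treated as a single scalar, whereas real decoders typically learn one value per channel.

```python
import math


def snake_unfused(x, alpha):
    """Assumed five-step version: each step builds a full intermediate list."""
    scaled = [alpha * v for v in x]              # 1. multiply input by alpha
    sines = [math.sin(v) for v in scaled]        # 2. take the sine
    squares = [s * s for s in sines]             # 3. square the result
    weighted = [s / alpha for s in squares]      # 4. scale by 1/alpha
    return [v + w for v, w in zip(x, weighted)]  # 5. add the original input


def snake_fused(x, alpha):
    """Single pass: one expression per element, no intermediate lists."""
    return [v + math.sin(alpha * v) ** 2 / alpha for v in x]


if __name__ == "__main__":
    samples = [-1.5, -0.25, 0.0, 0.5, 2.0]
    print(snake_unfused(samples, 0.7))
    print(snake_fused(samples, 0.7))
```

The two functions compute the same values. The fused form reads each input element once and writes each output once, which is the general reason a single elementwise kernel tends to be cheaper than a chain of separate operations.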
The latest b9076 release of llama.cpp quietly expands its platform support, making it more versatile for developers across various systems. Notably, it now exposes child model information from the router's /v1/models endpoint, enhancing transparency and control for users. The update includes support for macOS Apple Silicon with KleidiAI enabled, as well as expanded compatibility with Ubuntu and Windows systems, including Vulkan and ROCm 7.2. This release doesn't introduce new models but strengthens llama.cpp's position as a flexible inference runtime across diverse hardware configurations.
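As a rough illustration of consuming the /v1/models endpoint, the sketch below pulls model identifiers out of a response body. It assumes the OpenAI-style list layout (a top-level `data` array whose entries carry an `id`). The helper name `list_model_ids` and the sample payload are hypothetical, and the fields that carry the new child model information are not described in the release summary, so they are not modeled here.

```python
import json

# Hypothetical sample body in the OpenAI-style list layout (an assumption,
# not captured from a real server).
SAMPLE_RESPONSE = """
{
  "object": "list",
  "data": [
    {"id": "model-a", "object": "model"},
    {"id": "model-b", "object": "model"}
  ]
}
"""


def list_model_ids(payload_text):
    """Return the 'id' of every entry in the response's 'data' array."""
    payload = json.loads(payload_text)
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


if __name__ == "__main__":
    print(list_model_ids(SAMPLE_RESPONSE))
```

Against a running server, the same helper could be fed the body returned by `urllib.request.urlopen` for the server's /v1/models URL; the host and port depend on how the server was started.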
The v0.18.2rc0 release includes a fix for handling the max_pixels parameter in the PaddleOCR-VL image processor across transformations.