The b9134 release of llama.cpp has been announced, featuring expanded support across multiple platforms. Notable updates include macOS on Apple Silicon with KleidiAI enabled, Vulkan and ROCm 7.2 support on Ubuntu, and updated CUDA 12 and 13 DLLs for Windows. This release does not introduce new models but improves compatibility and performance across various hardware configurations, underscoring llama.cpp's commitment to providing a versatile inference runtime for developers.
The b9129 release of llama.cpp introduces an adaptive fallback feature for the ggml-zendnn backend, which improves performance by switching to the CPU for small batch sizes. The feature is enabled by default, but a new runtime environment variable lets developers revert to the original fallback logic if desired. The update supports platforms including macOS with KleidiAI, Windows with CUDA 12 and 13, and Ubuntu with ROCm 7.2, ensuring efficient processing across different systems. This release highlights llama.cpp's focus on performance and flexibility for developers working with a range of hardware configurations.
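The release notes do not name the environment variable, so the sketch below uses a hypothetical placeholder (`GGML_ZENDNN_ADAPTIVE_FALLBACK`) purely to illustrate the "enabled by default, revertible at runtime" pattern; check the actual b9129 changelog for the real variable name.

```shell
# Hypothetical sketch: GGML_ZENDNN_ADAPTIVE_FALLBACK is an assumed
# placeholder name, not confirmed by the release notes.

# Adaptive fallback is enabled by default, i.e. an unset variable
# behaves like "1".
mode="${GGML_ZENDNN_ADAPTIVE_FALLBACK:-1}"
echo "adaptive fallback: $mode"

# To revert to the original fallback logic, one would export the
# variable as 0 before launching an inference binary, e.g.:
#   GGML_ZENDNN_ADAPTIVE_FALLBACK=0 ./llama-cli -m model.gguf -p "Hello"
```

The default-on/opt-out pattern shown here matches how llama.cpp typically exposes backend tuning knobs at runtime, without requiring a rebuild.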
The b9133 release of llama.cpp brings significant improvements for reasoning models in the server and web UI. By removing the blocking assistant prefill and instead orchestrating thinking tags directly, the update enables smoother continuation of generation tasks. It also drops the reasoning guard on the Continue button, so reasoning content persists even after reloads. While this release focuses on templates with simple thinking tags, it sets the stage for future enhancements to reasoning model support.
The v0.18.2rc0 release includes a fix for handling the max_pixels parameter in the PaddleOCR-VL image processor across image transformations.