The b9101 release of llama.cpp expands platform support across the board. The release covers macOS, Linux, Windows, and Android, adds Vulkan support on Ubuntu and Windows, and brings ROCm 7.2 support on Ubuntu. Windows builds additionally ship CUDA 12 and 13 DLLs for GPU acceleration. The release continues llama.cpp's push to be a versatile, broadly available inference runtime for developers across different systems.
The b9094 release of llama.cpp marks a significant expansion in platform support, particularly for macOS and Windows users. With the inclusion of KleidiAI-enabled builds for Apple Silicon, macOS users gain enhanced performance without additional configuration. Windows users benefit from the addition of CUDA 12 and 13 support, broadening the scope for GPU-accelerated tasks. This release doesn't introduce new models but focuses on making llama.cpp more accessible and versatile across a wider range of systems, reinforcing its position as a go-to inference runtime for diverse hardware setups.
The b9095 release of llama.cpp introduces an internal AllReduce kernel for CUDA, eliminating the need for NCCL in certain configurations. The new single-phase CUDA kernel handles both data transfer and reduction across GPUs, specifically targeting setups with two GPUs and FP32 tensors up to 256 KB. By providing an alternative to NCCL, the release gives developers working with tensor parallelism more flexibility and one fewer dependency. The update also improves error logging and adds a watchdog to detect and address hangs, enhancing the robustness of the system.
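To make the idea concrete, here is a minimal sketch of how a single-phase, two-GPU FP32 all-reduce can be built on CUDA peer-to-peer reads: each GPU sums its own buffer with the peer GPU's buffer and writes the result locally, with no staging copy and no NCCL. This illustrates the general technique only; the buffer names, launch parameters, and overall structure here are assumptions, not llama.cpp's actual kernel (which also handles the error logging and watchdog mentioned above).

```cuda
// nvcc -o allreduce_sketch allreduce_sketch.cu
// Illustrative sketch of a single-phase two-GPU FP32 all-reduce over
// CUDA peer-to-peer reads. Requires P2P-capable GPUs (NVLink or a
// shared PCIe root complex). Not llama.cpp's actual implementation.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Each GPU sums its own input with the peer GPU's input and writes the
// result into a local output buffer; reading the peer buffer directly
// over NVLink/PCIe avoids a staged device-to-device copy.
__global__ void allreduce_sum(const float *own, const float *peer,
                              float *out, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = own[i] + peer[i];
}

int main() {
    const size_t n = 256 * 1024 / sizeof(float);  // 256 KB of FP32, the stated limit
    float *in[2], *out[2];

    // Enable bidirectional peer access so each GPU can read the other's memory.
    cudaSetDevice(0); cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1); cudaDeviceEnablePeerAccess(0, 0);

    std::vector<float> host(n, 1.0f);
    for (int d = 0; d < 2; ++d) {
        cudaSetDevice(d);
        cudaMalloc(&in[d], n * sizeof(float));
        cudaMalloc(&out[d], n * sizeof(float));
        cudaMemcpy(in[d], host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    }

    // One launch per GPU. Inputs are read-only, so the two kernels can run
    // concurrently without racing on each other's buffers.
    for (int d = 0; d < 2; ++d) {
        cudaSetDevice(d);
        allreduce_sum<<<(unsigned)((n + 255) / 256), 256>>>(in[d], in[1 - d], out[d], n);
    }
    for (int d = 0; d < 2; ++d) { cudaSetDevice(d); cudaDeviceSynchronize(); }

    cudaSetDevice(0);
    cudaMemcpy(host.data(), out[0], sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0][0] = %f (expected 2.0)\n", host[0]);

    for (int d = 0; d < 2; ++d) {
        cudaSetDevice(d);
        cudaFree(in[d]); cudaFree(out[d]);
    }
    return 0;
}
```

Writing into a separate output buffer is what keeps the single phase race-free: both kernels only read the input buffers, so neither GPU can observe a partially reduced value from the other. A production kernel would add the kind of hang detection the release's watchdog provides.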
The v0.18.2rc0 release includes a fix for handling the max_pixels parameter in the PaddleOCR-VL image processor across its image transformations.