The latest b9097 release of llama.cpp expands platform support, covering macOS on Apple Silicon with KleidiAI and multiple Linux configurations such as Ubuntu with Vulkan and ROCm 7.2. Windows users also gain CUDA 12 and 13 DLLs. The update focuses on improving compatibility and performance across diverse hardware environments; while it doesn't introduce new model architectures, it strengthens llama.cpp's role as a versatile tool for developers.
The b9094 release of llama.cpp marks a significant expansion in platform support, particularly for macOS and Windows users. With the inclusion of KleidiAI-enabled builds for Apple Silicon, macOS users gain enhanced performance without additional configuration. Windows users benefit from the addition of CUDA 12 and 13 support, broadening the scope for GPU-accelerated tasks. This release doesn't introduce new models but focuses on making llama.cpp more accessible and versatile across a wider range of systems, reinforcing its position as a go-to inference runtime for diverse hardware setups.
The latest b9095 release of llama.cpp introduces a significant update with an internal AllReduce kernel for CUDA, eliminating the need for NCCL in certain configurations. This update allows for a single-phase CUDA kernel that efficiently manages data transfer and reduction across GPUs, specifically targeting setups with two GPUs and FP32 tensors up to 256 KB. By providing an alternative to NCCL, this release offers more flexibility and potentially reduces dependencies for developers working with tensor parallelism. The update also includes improvements in error logging and a new watchdog feature to detect and address hangs, enhancing the robustness of the system.
The v0.18.2rc0 release includes a fix for handling the max_pixels parameter in the PaddleOCR-VL image processor during image transformations.