The b9566 release of llama.cpp improves buffer management for SWA-only (sliding-window attention) draft heads. Each kq_mask buffer is now guarded independently, which prevents null-pointer assertion failures during model load. The release continues to ship builds for macOS, Linux, Windows, and openEuler, with Vulkan, ROCm, and CUDA configurations. Some features remain disabled, and the update as a whole is focused on stability and reliability.
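The Python sketch below illustrates what guarding each mask buffer independently means. It is not llama.cpp's code, which is C++. The function names set_masks and fill_mask, the square causal layout (query token i sits at position i), and the window semantics are all hypothetical. The point is that the full-attention mask and the SWA mask are each checked for presence separately. A model whose layers are all SWA may therefore skip the full-attention mask entirely without tripping a check on a buffer it never allocated.

```python
from typing import List, Optional

NEG_INF = float("-inf")


def fill_mask(buf: List[float], n_kv: int, n_tokens: int,
              window: Optional[int] = None) -> None:
    """Fill a row-major [n_tokens x n_kv] causal mask (hypothetical layout).

    0.0 marks a visible key, -inf a masked one. If `window` is given,
    keys older than `window` positions are also masked (sliding window).
    """
    for i in range(n_tokens):          # query position
        for j in range(n_kv):          # key position
            visible = j <= i and (window is None or i - j < window)
            buf[i * n_kv + j] = 0.0 if visible else NEG_INF


def set_masks(kq_mask: Optional[List[float]],
              kq_mask_swa: Optional[List[float]],
              n_kv: int, n_tokens: int, window: int) -> None:
    # Each buffer is guarded on its own: a missing full-attention mask
    # (e.g. an SWA-only head) no longer blocks filling the SWA mask.
    if kq_mask is not None:
        fill_mask(kq_mask, n_kv, n_tokens)
    if kq_mask_swa is not None:
        fill_mask(kq_mask_swa, n_kv, n_tokens, window)
```

With a single combined check such as "both buffers must exist", the SWA-only case would fail even though the only mask it needs is present. Separate checks make each buffer's absence harmless on its own.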
The b9561 release of llama.cpp extends its platform coverage with Vulkan builds for Ubuntu and Windows and a ROCm 7.2 build for Ubuntu, the latter a notable addition for AMD GPU users. KleidiAI on macOS and SYCL on Windows remain inactive. The release adds no new models. Instead, it strengthens the existing build infrastructure, making llama.cpp easier to deploy as an inference runtime across different hardware setups.
The b9562 release of llama.cpp adds video input support, a significant expansion of its multimodal capabilities. The update includes the new mtmd_helper_video helper, the server now accepts video input encoded as base64, and the CLI accepts video arguments. No new models are introduced, but video processing makes llama.cpp more useful to developers working with multimedia inputs.
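Sending video to the server as base64 starts with encoding the file on the client side. The sketch below does only that step, using Python's standard base64 module. The helper name video_to_data_uri and the default "video/mp4" MIME type are assumptions. Whether the server expects a bare base64 string or a data URI, and which request field carries it, is not described in these release notes, so check the server documentation before building a request around it.

```python
import base64
from pathlib import Path


def video_to_data_uri(path: str, mime: str = "video/mp4") -> str:
    """Read a video file and return it as a base64 data URI (hypothetical helper)."""
    data = Path(path).read_bytes()                    # raw video bytes
    b64 = base64.b64encode(data).decode("ascii")      # base64 text, ASCII-safe
    return f"data:{mime};base64,{b64}"
```

If the server turns out to want only the raw base64 payload, take the part after the first comma, `video_to_data_uri(path).split(",", 1)[1]`. Base64 output is about 4/3 the size of the input, so large videos produce correspondingly large request bodies.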