The b9568 release of llama.cpp adds support for gemma-4 E2B and E4B assistants. Key updates include the addition of masked_embd tensors to the gemma4-assist architecture and the removal of temporary debug features to streamline conversion. The release maintains broad platform support, including Vulkan and ROCm 7.2 on Ubuntu. Some features remain disabled, and the update is focused mainly on improving the model conversion process.
The b9561 release of llama.cpp extends its platform reach, adding Vulkan support for Ubuntu and Windows and ROCm 7.2 for Ubuntu, which benefits AMD GPU users. Features such as KleidiAI on macOS and SYCL on Windows remain inactive. No new models are introduced; the release focuses on strengthening the existing infrastructure for developers working with different hardware setups.
The b9562 release of llama.cpp introduces video input support. It adds a new mtmd_helper_video feature, allows video input on servers via base64 encoding, and updates the CLI to accept video arguments. The release introduces no new models, but it extends llama.cpp to video processing for developers working with multimedia inputs.