The b9204 release of llama.cpp adds support for d_conv=15 in the ssm-conv.cu module, the CUDA kernel for state-space-model convolution. The update is part of the ModalityConditionalAdapters branch. The release remains compatible with macOS, Linux, and Windows, and supports hardware and backends such as Apple Silicon and Vulkan. It focuses on improving the technical infrastructure rather than introducing new models.
The b9296 release of llama.cpp continues to broaden platform compatibility. It includes support for macOS on Apple Silicon with KleidiAI enabled, Windows with CUDA 12 and 13 DLLs, and ROCm 7.2 on Ubuntu x64 for AMD GPU users. The release adds no major new features, but it reinforces llama.cpp's position as a runtime that works across a wide range of hardware configurations.
The b9297 release of llama.cpp introduces NVFP4 MTP scale tensors. It also integrates Qwen3.5 MTP tensors, improving performance across hardware configurations including Apple Silicon, Vulkan, ROCm on Ubuntu, and CUDA on Windows. The release supports macOS, Linux, and Windows on both CPU and GPU setups. It adds no new model architectures; the inclusion of KleidiAI on Apple Silicon and ROCm 7.2 on Ubuntu reflects llama.cpp's continued optimization for diverse environments.
© The AI Daily Brief: OpenAI has significantly advanced the mathematical capabilities of its AI models.
© Matt Wolfe: Google has released Gemini 3.5 Flash, a faster and more cost-effective AI model, with a Pro version coming soon.