The b9389 release of llama.cpp focuses on platform support across operating systems. On macOS Apple Silicon, KleidiAI support is disabled, while Linux users gain ROCm 7.2 and Vulkan support. Windows platforms receive updated CUDA 12 and 13 DLLs, enhancing GPU performance. Despite some features being disabled, this release highlights llama.cpp's ongoing efforts to cater to a wide range of hardware configurations.
The latest b9387 release of llama.cpp introduces significant performance improvements for AMD MFMA hardware, particularly in quantized matrix multiplication. By optimizing the batch threshold logic, the update allows for more efficient processing, with throughput gains of up to 76% in certain configurations. This release is particularly relevant for users leveraging AMD's MI250X hardware, as it fine-tunes the kernel selection logic to maximize performance. While the update doesn't introduce new models, it significantly enhances the efficiency of existing operations on specific hardware, making it a noteworthy development for those using AMD GPUs.
The latest b9388 release of llama.cpp introduces optimizations for Turing architecture, specifically adding MMVQ_PARAMETERS_TURING to improve JIT compilation for SM75 Turing devices. This update aims to prevent mismatches when compiling Turing device code on Ampere or newer architectures. While the release doesn't introduce new models or quantization methods, it continues to expand platform support, including updates for macOS, Linux, and Windows. The focus remains on refining compatibility and performance across diverse hardware configurations, making llama.cpp a more versatile tool for developers.
The b9391 release of llama.cpp continues to broaden its platform support, making it more accessible to a diverse range of users. Notably, this update includes support for Ubuntu x64 with ROCm 7.2, which is significant for AMD GPU users seeking alternatives to NVIDIA's CUDA. While some features like KleidiAI on macOS Apple Silicon and SYCL FP32 on Ubuntu are disabled, the release still marks a step forward in making llama.cpp a versatile tool across different operating systems. This update doesn't introduce new models but enhances the existing infrastructure, ensuring more users can leverage llama.cpp's capabilities.
Hugging Face has introduced a fully local speech processing setup for the Reachy Mini robot, eliminating the need for cloud services and enhancing privacy. By utilizing a cascaded voice pipeline, users can run speech-to-speech interactions entirely on their own hardware, ensuring that no data leaves their network. This setup leverages components like llama.cpp for LLM and Parakeet-TDT for STT, allowing for customizable and cost-effective speech processing. The move empowers users with full control over their speech processing pipeline, offering flexibility to swap components as new models become available.
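The cascaded design described above (speech-to-text, then an LLM, then text-to-speech) can be illustrated with a minimal sketch. This is not the Reachy Mini or Hugging Face code: the class name `CascadedVoicePipeline` and the three stand-in functions are hypothetical, and a real setup would call local engines such as Parakeet-TDT and llama.cpp in their place. The sketch only shows why each stage is swappable: every stage is a plain callable behind a fixed interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadedVoicePipeline:
    """Hypothetical three-stage pipeline: audio -> text -> reply -> audio."""
    stt: Callable[[bytes], str]   # speech-to-text stage
    llm: Callable[[str], str]     # language-model stage
    tts: Callable[[str], bytes]   # text-to-speech stage

    def respond(self, audio_in: bytes) -> bytes:
        # Each stage consumes only the previous stage's output,
        # so any stage can be replaced without touching the others.
        text = self.stt(audio_in)
        reply = self.llm(text)
        return self.tts(reply)


# Stand-in stages (assumptions for illustration only). Real stages would
# wrap local models; these treat the "audio" bytes as UTF-8 text.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadedVoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
```

Swapping a component means passing a different callable for that stage, for example a new `llm` function when a newer model becomes available, while `stt` and `tts` stay unchanged.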
© Lev Selector
Andrej Karpathy has released CLAUDE.md as open source.
© Matt Wolfe
Stability AI has launched Stable Audio 3.0, an open-weight model family designed for artistic experimentation.