The b9430 release of llama.cpp adds new support for LSX (the 128-bit LoongArch SIMD extension), giving LoongArch processors optimized fp16 load/store operations. It also includes LSX implementations of several dot products to improve performance. The release additionally covers improvements across macOS, Linux, and Windows, with notable support for Apple Silicon and Vulkan. Although some features remain disabled, the update improves llama.cpp's adaptability to a wider range of hardware.
llama.cpp has fixed a critical issue in its device selection logic that affected systems using an integrated GPU (iGPU) as their main compute device. Previously, the presence of any RPC server caused the local iGPU to be ignored, leading to model loading failures. With the fix, the local iGPU stays in the device list when no other local GPU is available, even if RPC servers are configured, so tensors can be allocated and models loaded on systems such as the Strix Halo with its large unified memory. This makes llama.cpp more reliable on mixed local and remote hardware configurations.
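The selection rule described above can be illustrated with a minimal sketch. This is not llama.cpp's actual code: the `Device` type, the `Kind` enum, and both function names are hypothetical, and the "new" rule is an assumption inferred from the summary (only local discrete GPUs suppress the iGPU, while RPC servers do not).

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    RPC = "rpc"    # remote server reached over the network
    GPU = "gpu"    # local discrete GPU
    IGPU = "igpu"  # local integrated GPU


@dataclass(frozen=True)
class Device:
    name: str
    kind: Kind


def _split(devices):
    """Group devices by kind, preserving their original order."""
    rpc = [d for d in devices if d.kind is Kind.RPC]
    gpus = [d for d in devices if d.kind is Kind.GPU]
    igpus = [d for d in devices if d.kind is Kind.IGPU]
    return rpc, gpus, igpus


def select_devices_old(devices):
    """Behaviour described as buggy: any RPC server hides the local iGPU."""
    rpc, gpus, igpus = _split(devices)
    selected = rpc + gpus
    if not selected:  # RPC servers count here, so the iGPU is skipped
        selected += igpus
    return selected


def select_devices_new(devices):
    """Assumed fixed behaviour: only local discrete GPUs suppress the iGPU."""
    rpc, gpus, igpus = _split(devices)
    selected = rpc + gpus
    if not gpus:  # RPC servers no longer affect this decision
        selected += igpus
    return selected


if __name__ == "__main__":
    host = [Device("rpc0", Kind.RPC), Device("local-igpu", Kind.IGPU)]
    print([d.name for d in select_devices_old(host)])
    print([d.name for d in select_devices_new(host)])
```

In this sketch a machine with one RPC server and one iGPU loses its iGPU under the old rule and keeps it under the new one, while a machine with a discrete GPU behaves the same under both.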
The b9428 release of llama.cpp significantly enhances its platform support, addressing key issues and expanding compatibility. This update fixes the s390x release job and introduces multi-thread build capabilities for iOS-Xcode, improving performance. It also broadens support for macOS, Linux, and Windows, with specific enhancements like Vulkan and ROCm 7.2 on Ubuntu, and CUDA on Windows. While some features like KleidiAI on macOS remain disabled, the release demonstrates a commitment to making llama.cpp more accessible and versatile for developers working across different systems.
The b9431 release of llama.cpp makes targeted changes to its build processes. The iOS-Xcode release job moves to macOS-26, and the libcommon build is disabled for the xcframework. On Windows, the release updates the CUDA 12 and CUDA 13 DLLs. No new features are introduced; the changes are aimed at keeping the builds working and compatible across operating systems.
© Lev Selector
Cohere has open-sourced its Command A+ model, making it available for public use.
Hugging Face has introduced a fully local speech processing setup for the Reachy Mini robot, eliminating the need for cloud services and enhancing privacy. By utilizing a cascaded voice pipeline, users can run speech-to-speech interactions entirely on their own hardware, ensuring that no data leaves their network. This setup leverages components like llama.cpp for LLM and Parakeet-TDT for STT, allowing for customizable and cost-effective speech processing. The move empowers users with full control over their speech processing pipeline, offering flexibility to swap components as new models become available.
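A cascaded voice pipeline of the kind described above chains three stages: speech-to-text, a language model, and text-to-speech. The sketch below shows only that structure and how a stage can be swapped. It is not the Hugging Face or Reachy Mini API; `make_pipeline` and the `fake_*` stages are hypothetical stand-ins, and a real setup would replace them with calls into components such as Parakeet-TDT and llama.cpp.

```python
from typing import Callable

# Each stage is a plain callable, so any one of them can be replaced independently.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


def make_pipeline(
    stt: SpeechToText, llm: LanguageModel, tts: TextToSpeech
) -> Callable[[bytes], bytes]:
    """Compose the three stages into one audio-in, audio-out function."""

    def run(audio_in: bytes) -> bytes:
        transcript = stt(audio_in)  # audio -> text
        reply = llm(transcript)     # text -> text
        return tts(reply)           # text -> audio

    return run


# Hypothetical stand-in stages so the sketch runs without any models.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    pipeline = make_pipeline(fake_stt, fake_llm, fake_tts)
    print(pipeline(b"hello"))

    # Swapping one component leaves the other stages untouched.
    shouting = make_pipeline(fake_stt, lambda p: p.upper(), fake_tts)
    print(shouting(b"hello"))
```

Because every stage runs as a local function call, nothing in this structure requires data to leave the machine, which is the property the local setup relies on.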
Andrej Karpathy has released CLAUDE md as open source.