The b9434 release of llama.cpp has been announced, focusing on fixing granularity for Qwen 3.5/3.6 across three GPUs. This update is primarily a technical adjustment aimed at improving performance and compatibility for specific GPU configurations. The release includes support for various operating systems such as macOS, Linux, and Windows, though no new models or major features have been introduced. This update underscores llama.cpp's commitment to refining its platform for developers, ensuring it remains a versatile tool in the AI development landscape.
llama.cpp has addressed a critical issue in its device selection logic that affected systems using an integrated GPU (iGPU) as their main compute device. Previously, the presence of any RPC server would cause the local iGPU to be ignored, leading to model loading failures. With this update, the iGPU is skipped only when other local GPUs are available, so a configured RPC server no longer excludes it. This allows proper tensor allocation and model loading on systems like the Strix Halo with significant unified memory, and makes llama.cpp more reliable on diverse hardware configurations.
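The selection rule described above can be illustrated with a minimal Python sketch. This is not llama.cpp's actual code (which is C++); the function name `select_devices`, its parameters, the `fixed` flag, and the device ordering are all hypothetical, chosen only to contrast the old and new behavior as the summary describes them.

```python
def select_devices(rpc_servers, discrete_gpus, integrated_gpus, *, fixed=True):
    """Return the list of compute devices to use (illustrative only).

    All names here are hypothetical; this only mirrors the behavior
    described in the summary, not llama.cpp's real implementation.
    """
    # RPC servers and discrete GPUs are always candidates.
    devices = list(rpc_servers) + list(discrete_gpus)

    if fixed:
        # New behavior: fall back to the iGPU whenever there is no
        # discrete GPU, even if RPC servers are configured.
        use_igpu = len(discrete_gpus) == 0
    else:
        # Old behavior: the iGPU was used only when no other device
        # at all (including RPC servers) had been found.
        use_igpu = len(devices) == 0

    if use_igpu:
        devices.extend(integrated_gpus)
    return devices


if __name__ == "__main__":
    # A machine with only an iGPU plus one remote RPC server.
    print(select_devices(["rpc0"], [], ["igpu0"], fixed=False))
    print(select_devices(["rpc0"], [], ["igpu0"], fixed=True))
```

In this sketch, the old rule drops the local iGPU as soon as an RPC server is present, while the new rule keeps it unless a discrete GPU exists.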
The b9428 release of llama.cpp significantly enhances its platform support, addressing key issues and expanding compatibility. This update fixes the s390x release job and introduces multi-thread build capabilities for iOS-Xcode, improving performance. It also broadens support for macOS, Linux, and Windows, with specific enhancements like Vulkan and ROCm 7.2 on Ubuntu, and CUDA on Windows. While some features like KleidiAI on macOS remain disabled, the release demonstrates a commitment to making llama.cpp more accessible and versatile for developers working across different systems.
The latest b9430 release of llama.cpp introduces LSX support, optimizing performance for LoongArch architectures. By implementing native intrinsics for fp16 load/store operations and adding LSX implementations for various dot products, the update enhances computational efficiency. This release also includes improvements for macOS, Linux, and Windows platforms, with specific enhancements for Apple Silicon and Vulkan support. While some features remain disabled, the update signifies a step forward in making llama.cpp more versatile across different hardware configurations.
The vLLM v0.22.0 release marks a significant step forward in model performance and infrastructure. With 459 commits from 230 contributors, this update introduces major enhancements like the DeepSeek V4 model's reorganization and NVFP4 fused MoE support, which improve accuracy and efficiency. The Model Runner V2 now defaults to Qwen3 dense models, offering better performance with new features like sleep-mode weight reload. Additionally, the introduction of a Rust frontend and batch-invariant inference improvements highlight the release's focus on speed and flexibility. These updates collectively enhance the vLLM framework's capability to handle complex AI tasks more efficiently.
© The AI Daily Brief
OpenAI has released an update to GPT-5.5 Instant, enhancing its capabilities.
© Lev Selector
Claude Opus 4.8 has been released as the new default model, featuring a fast mode and dynamic ultra-code workflows.