The b9627 release of llama.cpp focuses on platform support. It covers macOS, iOS, Linux, Windows, and openEuler, with support for backends such as CUDA, Vulkan, and ROCm. The release introduces no new model architectures, but it keeps the tool usable across a wide range of hardware setups, in line with llama.cpp's aim of being a comprehensive inference runtime.
The b9622 release of llama.cpp improves the Vulkan backend's handling of non-contiguous unary and GLU operations. It refines index calculations with fastdiv and merges the unary operations into a single file, which improves performance and reduces code duplication. It also works around a compiler bug and resolves earlier conflicts. The update introduces no major new features, but it strengthens llama.cpp's role as a flexible tool for developers working with diverse hardware on macOS, Linux, Windows, and openEuler.
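To illustrate what "index calculations with fastdiv" means, here is a minimal Python sketch of one common fast-division scheme: the divisor is turned once into a multiplier and a shift, so each later division becomes a multiply-high, an add, and a shift. It is an illustration under stated assumptions, not the code from the release. The function names, the three-dimensional layout, and the example shape and strides are all hypothetical, and the actual Vulkan shaders may differ.

```python
def init_fastdiv_values(d):
    """Precompute (mp, L) for divisor d (assumed 0 < d < 2**32)."""
    assert 0 < d < 2**32
    L = 0
    while (1 << L) < d:  # L = ceil(log2(d))
        L += 1
    mp = ((1 << 32) * ((1 << L) - d)) // d + 1
    return mp, L


def fastdiv(n, mp, L):
    """Return n // d for a 32-bit n, using the values precomputed for d."""
    hi = (n * mp) >> 32  # high 32 bits of the product (a "mulhi" on a GPU)
    return (hi + n) >> L


def unravel(i, ne0, ne1, fd0, fd1):
    """Split flat index i into (i0, i1, i2) for inner dimensions ne0 and ne1."""
    row = fastdiv(i, *fd0)   # i // ne0
    i0 = i - row * ne0       # i % ne0
    i2 = fastdiv(row, *fd1)  # row // ne1
    i1 = row - i2 * ne1      # row % ne1
    return i0, i1, i2


def element_offset(i, ne0, ne1, strides, fd0, fd1):
    """Offset of logical element i in a non-contiguous buffer with the given strides."""
    i0, i1, i2 = unravel(i, ne0, ne1, fd0, fd1)
    return i0 * strides[0] + i1 * strides[1] + i2 * strides[2]


# Hypothetical shape: inner dimensions 5 and 3, with padded (non-contiguous) strides.
ne0, ne1 = 5, 3
fd0, fd1 = init_fastdiv_values(ne0), init_fastdiv_values(ne1)
strides = (1, 8, 32)
print(unravel(22, ne0, ne1, fd0, fd1))
print(element_offset(22, ne0, ne1, strides, fd0, fd1))
```

The precomputation runs once per tensor shape, while `fastdiv` runs once per element, which is where the saving over a hardware integer division would come from.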
The b9624 release of llama.cpp enhances its utility by introducing build-time gzip compression, which can optimize performance through reduced file sizes. This update continues to cater to developers working on various systems, including macOS, Linux, Windows, and openEuler, with specific builds for architectures like arm64 and x64. The inclusion of ROCm 7.2 for Ubuntu x64 and CUDA 12 and 13 for Windows x64 highlights its adaptability to different hardware environments. While there are no new model architectures, the release strengthens llama.cpp's role as a flexible tool for developers needing compatibility across diverse setups.
The latest b9625 release of llama.cpp continues its trend of broadening platform compatibility, though without any groundbreaking new features. Notably, it includes support for ROCm 7.2 on Ubuntu x64, which is significant for AMD GPU users seeking alternatives to NVIDIA's CUDA. The release also maintains a wide array of builds across macOS, Linux, Windows, and openEuler, though some configurations like KleidiAI on Apple Silicon remain disabled. While this update doesn't introduce new models or quantization methods, it solidifies llama.cpp's role as a versatile inference runtime across diverse systems.
The vLLM v0.23.0 release marks a significant step forward with enhancements across various components. DeepSeek-V4 has been optimized further, decoupling its metadata from previous versions and adding new attention kernels. Model Runner V2 now supports more dense models by default, improving performance for Llama and Mistral. The Rust frontend has matured with new endpoints and tool parsers, while compatibility with Transformers v5 ensures broader model support. These updates collectively enhance the robustness and versatility of vLLM, making it a more powerful tool for developers working with large language models.
© NVIDIA Blog
NVIDIA's Blackwell Ultra NVL72 platform leads the first agentic AI benchmark, AgentPerf, developed by Artificial Analysis. Unlike traditional conversational AI benchmarks, AgentPerf is designed to measure the performance of AI systems handling complex, multi-step tasks. The Blackwell platform ran 20 times more agents per megawatt than its predecessor, NVIDIA Hopper. This matters for enterprises deploying AI agents at scale, because it directly affects infrastructure efficiency and cost.
© Lev Selector
Google has introduced DiffusionGemma 26B, a new AI model with advanced capabilities.