The b9655 release of llama.cpp focuses on fixing a persistent grammar generator bug that reappeared in recent changes. This update is critical for developers who depend on accurate grammar parsing in their applications. Additionally, the release updates an erroneous case in the PEG parser test, further refining the tool's parsing accuracy. Although no new features are introduced, the release strengthens the existing framework, providing a more stable environment for developers across multiple operating systems.
The latest b9653 release of llama.cpp continues its trend of broadening platform compatibility, notably adding Vulkan support for Ubuntu and Windows, and ROCm 7.2 for Ubuntu x64. While KleidiAI support for macOS Apple Silicon is disabled, the release still offers a wide array of builds across macOS, Linux, Windows, and openEuler. This update doesn't introduce new models or quantization methods but focuses on making llama.cpp more accessible across diverse hardware configurations. Developers can now leverage these enhancements to optimize AI inference on a wider range of systems.
The latest b9654 release of llama.cpp continues its trend of broadening platform compatibility, though without major new features. Notably, the release includes support for ROCm 7.2 on Ubuntu x64, which is significant for AMD GPU users seeking alternatives to NVIDIA's CUDA. While KleidiAI support on macOS Apple Silicon is disabled, the release still covers a wide array of systems, including Windows with CUDA 12 and 13 DLLs. This update reinforces llama.cpp's commitment to being a versatile inference runtime across diverse hardware configurations.
The b9658 release of llama.cpp marks another step in broadening its compatibility across different systems, now featuring ROCm 7.2 support on Ubuntu x64. This update continues to offer extensive support for macOS, Windows, and Linux, with specific builds for Vulkan and SYCL. Although there are no new model architectures introduced, the release strengthens llama.cpp's role as a versatile inference runtime for a variety of hardware setups. Developers can now utilize llama.cpp more effectively, leveraging its enhanced platform support to optimize AI development across diverse environments.
The vLLM v0.23.0 release marks a significant step forward with enhancements across various components. DeepSeek-V4 has been optimized further, decoupling its metadata from previous versions and adding new attention kernels. Model Runner V2 now supports more dense models by default, improving performance for Llama and Mistral. The Rust frontend has matured with new endpoints and tool parsers, while compatibility with Transformers v5 ensures broader model support. These updates collectively enhance the robustness and versatility of vLLM, making it a more powerful tool for developers working with large language models.
© NVIDIA Blog
NVIDIA's Blackwell Ultra NVL72 platform has emerged as a leader in the first agentic AI benchmark, AgentPerf, developed by Artificial Analysis. Unlike traditional conversational AI benchmarks, AgentPerf is designed to measure how AI systems perform on complex, multi-step tasks. The Blackwell platform led the results, running 20 times more agents per megawatt than its predecessor, NVIDIA Hopper. This matters for enterprises deploying AI agents at scale, because agents per megawatt directly affects infrastructure efficiency and cost.
© Lev Selector
Google has introduced DiffusionGemma 26B, a new AI model with advanced capabilities.