
Anthropic's Claude models are now generally available on Microsoft Azure, running on NVIDIA's GB300 Blackwell Ultra GPUs. The setup lets enterprises build more capable AI agents with better performance and efficiency. NVIDIA and Anthropic are also integrating NVIDIA tools, which lets developers give Claude agents domain-specific abilities. The launch is part of a strategic partnership among Microsoft, NVIDIA, and Anthropic aimed at expanding enterprise access to advanced AI.
The vLLM v0.24.0 release is a large update with contributions from 256 developers. It adds support for new models such as MiniMax-M3 and DiffusionGemma, and improves performance through optimizations including the FlashInfer sparse index cache and higher throughput for DeepSeek-V4. The release also expands Model Runner V2, which now supports quantized models by default and integrates GraniteMoE. Together, these changes give developers more options for model deployment and performance tuning.
The b9833 release of llama.cpp refines the MiniCPM5 parser. It adds a new tool call parser, refactors the PEG parser, and adjusts the Jinja min/max API for better compatibility with Jinja2. It also reverts some shared mapper changes to keep JSON parsing strict for tool-call arguments. The changes aim to make the handling of XML tool calls and grammar triggers more reliable and efficient.
The latest b9835 release of llama.cpp continues the project's trend of broadening platform compatibility, without major new features. It includes support for ROCm 7.2 on Ubuntu x64, which matters for AMD GPU users seeking an alternative to NVIDIA's CUDA. Builds remain available across macOS, Linux, Windows, and openEuler, giving developers the flexibility to deploy on diverse systems. Although the release introduces no groundbreaking changes, it reinforces llama.cpp's position as a versatile tool for AI inference across multiple environments.