vLLM has released version 0.24.0, with contributions from 256 developers. The update adds support for new models such as MiniMax-M3 and DiffusionGemma, alongside performance improvements for existing models like DeepSeek-V4. It also extends Model Runner V2, which now supports quantized models by default. Together, these changes widen the range of models vLLM can serve and improve performance for models it already supports.
llama.cpp release b9833 focuses on refining the MiniCPM5 parser. The update adds a new tool-call parser, refactors the PEG parser, and adjusts the Jinja min/max API for better compatibility with Jinja2. It also reverts some shared mapper changes so that tool-call arguments are still parsed as strict JSON. Together, these changes aim to make the handling of XML tool calls and grammar triggers more reliable and efficient.
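For readers wondering what "compatibility with Jinja2" means for these filters, the sketch below mimics the reference Jinja2 `min`/`max` behavior in plain Python: strings compare case-insensitively unless `case_sensitive` is set, and `attribute` compares items by a (possibly dotted) field while returning the whole item. This is a hypothetical illustration, not llama.cpp's implementation; the names `jinja_min_max` and `_lookup` are invented here, and the release summary does not say which of these behaviors b9833 actually changed.

```python
from typing import Any, Iterable, Optional


def _lookup(item: Any, attribute: Optional[str]) -> Any:
    """Follow a dotted path such as "user.age" through dicts, lists, or objects."""
    if attribute is None:
        return item
    for part in attribute.split("."):
        key = int(part) if part.isdigit() else part
        try:
            item = item[key]  # try dict keys / list indices first
        except (KeyError, IndexError, TypeError):
            item = getattr(item, part)  # then fall back to object attributes
    return item


def jinja_min_max(values: Iterable[Any], *, use_max: bool = False,
                  case_sensitive: bool = False,
                  attribute: Optional[str] = None) -> Any:
    """Hypothetical sketch of Jinja2's `min` / `max` filter semantics."""
    def key(item: Any) -> Any:
        value = _lookup(item, attribute)
        # Jinja2 lowercases strings for comparison unless case_sensitive is true
        if not case_sensitive and isinstance(value, str):
            return value.lower()
        return value

    items = list(values)
    if not items:
        return None  # Jinja2 yields an Undefined here; None stands in for it
    pick = max if use_max else min
    # The whole item is returned, not just the compared attribute
    return pick(items, key=key)


if __name__ == "__main__":
    print(jinja_min_max([3, 1, 2]))                        # 1
    print(jinja_min_max(["apple", "Banana"], use_max=True))  # Banana
    print(jinja_min_max(["apple", "Banana"], use_max=True,
                        case_sensitive=True))              # apple
    items = [{"name": "a", "n": 3}, {"name": "b", "n": 1}]
    print(jinja_min_max(items, attribute="n"))             # {'name': 'b', 'n': 1}
```

In real Jinja2 templates the equivalent calls are `{{ values | max }}`, `{{ values | max(case_sensitive=true) }}` and `{{ items | min(attribute='n') }}`; a template engine that aims to run the same chat templates needs to reproduce these defaults.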
llama.cpp release b9835 continues to broaden platform compatibility rather than add major new features. Notably, it includes support for ROCm 7.2 on Ubuntu x64, which matters for AMD GPU users looking for an alternative to NVIDIA's CUDA. The release also keeps its wide range of builds for macOS, Linux, Windows, and openEuler, so developers can run llama.cpp for AI inference across many different environments.
llama.cpp release b9840 brings significant updates to DeepSeek V4 support, focusing on model conversion and compatibility. Notably, it adds support for the pro model and improves graph reuse, which could reduce graph-building overhead during inference. The update also fixes several bugs and removes redundant code. Rather than adding new model architectures, the release refines existing support, making llama.cpp a more robust tool for developers working with diverse architectures.