vLLM has announced the release of version 0.19.0, comprising 448 commits from 197 contributors, 54 of them first-time contributors. Key features include full support for the Google Gemma 4 architecture, zero-bubble async scheduling combined with speculative decoding, and enhancements to Model Runner V2. The update also adds compatibility with HuggingFace Transformers v5 and several NVIDIA-specific optimizations, with the overall aim of improving throughput and performance across a range of models and applications.
The latest build of llama.cpp, b8991, has been released, featuring updates for various operating systems.
The latest update to llama-mmap improves compatibility across platforms and model sizes. Key enhancements include support for 32-bit wasm builds and style updates to gguf.cpp.

The v0.19.0rc0 pre-release introduces CPU key-value cache offloading, which frees GPU memory by spilling KV blocks to host RAM instead of discarding them. The change was signed off by Yifan Qiao.
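To illustrate the general idea behind CPU key-value cache offloading, here is a minimal, self-contained sketch; this is purely conceptual and not the project's actual implementation. The assumption is a fixed budget of KV blocks resident in GPU memory, with least-recently-used blocks offloaded to a larger CPU-side store and fetched back on access:

```python
# Conceptual sketch of CPU KV-cache offloading (illustrative only, not the
# actual vLLM implementation). "GPU" and "CPU" here are plain dicts standing
# in for device and host memory pools.
from collections import OrderedDict

class KVCacheOffloader:
    def __init__(self, gpu_capacity: int):
        self.gpu_capacity = gpu_capacity   # max KV blocks resident on "GPU"
        self.gpu = OrderedDict()           # block_id -> block, in LRU order
        self.cpu = {}                      # overflow store in host memory

    def put(self, block_id, block):
        """Insert a KV block, offloading the LRU block to CPU if over budget."""
        self.gpu[block_id] = block
        self.gpu.move_to_end(block_id)     # mark as most recently used
        while len(self.gpu) > self.gpu_capacity:
            evicted_id, evicted = self.gpu.popitem(last=False)
            self.cpu[evicted_id] = evicted  # offload instead of discarding

    def get(self, block_id):
        """Fetch a block, promoting it back to GPU if it was offloaded."""
        if block_id in self.gpu:
            self.gpu.move_to_end(block_id)
            return self.gpu[block_id]
        block = self.cpu.pop(block_id)      # KeyError if the block never existed
        self.put(block_id, block)           # bring it back onto the GPU
        return block
```

The payoff over plain eviction is that a reused block (for example, a shared prompt prefix) is recovered with a host-to-device copy rather than recomputed from scratch.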