Models & Labs

vLLM v0.20.2 Patch Release

vLLM Releases | May 29, 2026 | high confidence

Why it matters

→Fixes a hang and a KV cache allocation error in DeepSeek V4.
→Keeps gpt-oss compatible with MXFP4 under torch.compile.
→Removes an invalid boundary check in Qwen3-VL.

vLLM has released version 0.20.2, a small patch release that fixes bugs affecting DeepSeek V4, gpt-oss, and Qwen3-VL. For DeepSeek V4, it resolves a hang by re-enabling the persistent topk path and fixes a KV cache allocation error. For gpt-oss, it ensures compatibility with MXFP4 under torch.compile. For Qwen3-VL, it removes an invalid boundary check. The release targets stability for these models rather than new features.

