16 × AI
AI signal, amplified

An AI news engine that ingests trusted sources, scores with Claude, and posts only what clears the bar.

© 2026 16 × AI. All rights reserved. Curated by Claude. Posts every 6 hours. No newsletter, no funnel.
Models & Labs

vLLM v0.20.2 Patch Release

vLLM Releases·May 29, 2026·high confidence

Why it matters

  • Fixes a hang and a KV cache allocation error affecting DeepSeek V4.
  • Ensures gpt-oss remains compatible with MXFP4 under torch.compile.
  • Removes an invalid boundary check in Qwen3-VL.

vLLM has released version 0.20.2, a small patch update that fixes bugs affecting DeepSeek V4, gpt-oss, and Qwen3-VL. For DeepSeek V4, it resolves a hang by re-enabling the persistent topk path and fixes a KV cache allocation error. The update also ensures gpt-oss works with MXFP4 under torch.compile and removes an invalid boundary check in Qwen3-VL. All of the listed changes are bug fixes aimed at stability.
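For readers who want to confirm whether an environment already has this patch, the following is a minimal sketch rather than an official vLLM tool. It assumes the package is installed under the distribution name `vllm`; the helper names `parse_version`, `is_at_least`, and `installed_vllm_version` are hypothetical and exist only for this example. The comparison looks at the leading numeric components only, so pre-release and local suffixes (for example `rc1` or `+cu128`) are ignored.

```python
import re
from importlib import metadata

REQUIRED = "0.20.2"  # patch release named in the summary above


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_at_least(installed, required=REQUIRED):
    """True if `installed` is the same as or newer than `required`."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    # Pad with zeros so that "0.20" compares equal to "0.20.0".
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_vllm_version():
    """Return the installed vLLM version string, or None if it is not installed."""
    try:
        return metadata.version("vllm")  # assumed distribution name
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_vllm_version()
    if found is None:
        print("vLLM is not installed in this environment")
    elif is_at_least(found):
        print(f"vLLM {found} is at or above {REQUIRED}")
    else:
        print(f"vLLM {found} predates {REQUIRED}; consider upgrading")
```

A hand-rolled comparison keeps the sketch free of third-party dependencies. Where the `packaging` library is available, its `Version` class is a more complete way to compare version strings.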


More in Models & Labs


llama.cpp b9387 Release Enhances AMD MFMA Performance

The b9387 release of llama.cpp improves quantized matrix multiplication performance on AMD MFMA hardware. It tunes the batch threshold used in kernel selection, with throughput gains of up to 76% in certain configurations. The change is most relevant to users of AMD's MI250X hardware. The release introduces no new models.

llama.cpp Releases·May 29, 2026

llama.cpp b9388 Release Enhances Turing Support

The b9388 release of llama.cpp adds MMVQ_PARAMETERS_TURING to improve JIT compilation for SM75 (Turing) devices. The change aims to prevent mismatches when Turing device code is compiled on Ampere or newer architectures. The release introduces no new models or quantization methods, and it includes updates for macOS, Linux, and Windows.

llama.cpp Releases·May 29, 2026

llama.cpp b9394 Release Expands Platform Support

The b9394 release of llama.cpp continues to broaden platform compatibility, though some configurations remain unavailable. It includes support for Ubuntu with ROCm 7.2 and Windows with CUDA 12 and 13. macOS with KleidiAI and SYCL on Windows are still disabled. Full feature parity across platforms remains a work in progress, so users of the disabled configurations may need to wait for future updates.

llama.cpp Releases·May 29, 2026