Models & Labs

v0.22.1rc1: Docker Update for vLLM

vLLM Releases · June 18, 2026 · high confidence

Why it matters

→ Simplifies Docker configuration by removing extra-index-url usage.
→ Enhances reliability and efficiency in dependency management.
→ Reflects ongoing improvements in vLLM's development process.

vLLM has released version 0.22.1rc1. The notable change is in the Docker configuration: it no longer uses extra-index-url for flashinfer-jit-cache, which simplifies dependency management. The change is expected to make Docker builds more reliable for developers using vLLM, and it reflects the project's continued work on its development tooling.


More in Models & Labs


Llama.cpp b9688 Release Enhances Model Management

The b9688 release of llama.cpp updates the server with a new model management API and real-time updates over server-sent events (SSE). It also adds a download API and a delete endpoint, giving developers more control over model assets. Together these changes aim to streamline deploying and maintaining models in different environments. The release introduces no new models; it strengthens the serving infrastructure instead.

llama.cpp Releases · Jun 18, 2026


Llama.cpp b9689 Release Adds Metal Backend Support

The b9689 release of llama.cpp extends the Metal backend's concat operator to f16 and bf16 tensor types; previously it supported only f32 and i32. The change templates kernel_concat on type T and adds type-specific pipeline getters, so a single kernel definition can serve each supported type. It is most relevant to developers on macOS and iOS, since it widens the range of models that can run through the Metal backend on Apple Silicon and other supported devices.

llama.cpp Releases · Jun 18, 2026


Llama.cpp b9690 Release Enhances rope_back Operator

The b9690 release of llama.cpp updates the rope_back operator to reuse the existing rope kernels, so forward and backward rotation share one implementation instead of duplicating code. Release builds cover macOS, Linux, Windows, and openEuler, with configurations such as ROCm 7.2 for Ubuntu and CUDA 12 and 13 for Windows. Notably, KleidiAI is disabled for macOS Apple Silicon in this version.

llama.cpp Releases · Jun 18, 2026
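
To illustrate why a backward rotation can reuse a forward rope kernel, here is a minimal Python sketch. It is not llama.cpp's code: the function name `rope`, the `forward` flag, and the `theta_base` parameter are hypothetical, chosen only for this example. The idea it shows is that rotary position embedding rotates each pair of values by an angle, and the backward pass needs the inverse rotation, which is the same arithmetic with the sine negated.

```python
import math


def rope(x, pos, theta_base=10000.0, forward=True):
    """Rotate consecutive pairs of x by position-dependent angles.

    Hypothetical helper for illustration only. With forward=False the
    same loop applies the inverse rotation by flipping the sign of the
    sine term, so no second kernel is needed.
    """
    n = len(x)
    if n % 2 != 0:
        raise ValueError("x must have an even number of elements")
    sign = 1.0 if forward else -1.0
    out = [0.0] * n
    for i in range(0, n, 2):
        # Each pair gets its own frequency, decreasing with the pair index.
        angle = pos * theta_base ** (-i / n)
        c = math.cos(angle)
        s = sign * math.sin(angle)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out


x = [1.0, 2.0, 3.0, 4.0]
rotated = rope(x, pos=5)
restored = rope(rotated, pos=5, forward=False)
```

Because a rotation matrix is orthogonal, its transpose equals the rotation by the negated angle, which is why the gradient with respect to the input can come from the same loop. In this sketch `restored` matches `x` up to floating-point rounding.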