16 × AI: AI signal, amplified

An AI news engine that ingests trusted sources, scores with Claude, and posts only what clears the bar.

© 2026 16 × AI. All rights reserved. Curated by Claude. Posts every 6 hours. No newsletter, no funnel.
Models & Labs

llama.cpp b9025 Release Expands Platform Support

llama.cpp Releases·May 5, 2026·high confidence

Why it matters

  • Expands llama.cpp's compatibility across diverse platforms, enhancing its utility for developers.
  • Introduces Vulkan and ROCm 7.2 support, improving GPU performance options.
  • Positions llama.cpp as a versatile runtime for AI applications across various hardware configurations.

The b9025 release of llama.cpp expands support across multiple platforms. It includes Vulkan support for both Ubuntu and Windows, as well as ROCm 7.2 for Ubuntu, giving developers more GPU backend options. The release introduces no new models; its focus is broader compatibility, which keeps llama.cpp a flexible runtime for developers working across different hardware configurations.

More from llama.cpp Releases

Open Source·models

llama.cpp b9015 Release Expands Platform Support

The b9015 release of llama.cpp marks another step in expanding its reach across diverse systems, now including macOS Apple Silicon with KleidiAI enabled and Ubuntu with ROCm 7.2. This update also brings Vulkan support to both Linux and Windows, enhancing the software's versatility. Windows users benefit from CUDA 12 and 13 support, ensuring compatibility with the latest NVIDIA technologies. While the release doesn't introduce new model architectures, it strengthens llama.cpp's role as a flexible inference runtime for developers working with varied hardware configurations.

llama.cpp Releases·May 5, 2026
Models & Labs·models

llama.cpp b9018 release expands platform support

The b9018 release of llama.cpp continues to broaden platform compatibility, supporting macOS, Linux, Windows, and Android. It introduces Vulkan support on Ubuntu and Windows and adds ROCm 7.2 for AMD GPUs, a significant step for users seeking alternatives to NVIDIA's CUDA. The release brings no new models or quantization methods, but it solidifies llama.cpp's position as an inference runtime that spans diverse hardware configurations.

llama.cpp Releases·May 5, 2026
Models & Labs·models

llama.cpp b9019 Release Enhances Model Flexibility

The b9019 release of llama.cpp moves functions such as load_hparams and load_tensors so that they are defined per model, giving developers more flexibility. The change is accompanied by the introduction of build_graph and refined switch-case logic, which together improve the codebase's modularity. The release notes describe these updates as easing adaptation across macOS, Linux, and Windows environments. No new model architectures are introduced, but the release lays a foundation for more efficient development and deployment, with continued support for configurations such as KleidiAI on Apple Silicon and ROCm 7.2 on AMD GPUs.

llama.cpp Releases·May 5, 2026
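
The per-model structure described for b9019 can be illustrated with a small sketch. This is not llama.cpp's code (which is C++); it is a Python illustration of the general pattern, assuming each architecture carries its own load_hparams, load_tensors, and build_graph. The ModelArch base class, the REGISTRY dictionary, the register decorator, and the "toy-dense" architecture are all hypothetical names made up for this example.

```python
from abc import ABC, abstractmethod


class ModelArch(ABC):
    """Hypothetical base class: each architecture defines its own hooks."""

    @abstractmethod
    def load_hparams(self, metadata: dict) -> dict: ...

    @abstractmethod
    def load_tensors(self, hparams: dict) -> list: ...

    @abstractmethod
    def build_graph(self, hparams: dict) -> list: ...


# Hypothetical registry mapping an architecture name to its class.
REGISTRY = {}


def register(name):
    """Class decorator that records an architecture under a name."""
    def decorator(cls):
        REGISTRY[name] = cls
        return cls
    return decorator


@register("toy-dense")
class ToyDense(ModelArch):
    """Made-up architecture used only to exercise the pattern."""

    def load_hparams(self, metadata):
        return {"n_layer": int(metadata["n_layer"])}

    def load_tensors(self, hparams):
        # Returns tensor names only; a real loader would map file data.
        return [f"blk.{i}.attn" for i in range(hparams["n_layer"])]

    def build_graph(self, hparams):
        layers = [f"layer{i}" for i in range(hparams["n_layer"])]
        return ["embed"] + layers + ["output"]


def load(arch, metadata):
    """A registry lookup stands in for one central switch over architectures."""
    model = REGISTRY[arch]()
    hparams = model.load_hparams(metadata)
    return model.load_tensors(hparams), model.build_graph(hparams)
```

With this layout, supporting another architecture means adding one class rather than extending several central switch statements, which is one plausible reading of the modularity gain the summary mentions.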

More in Models & Labs

Models & Labs·models

Google unveils major AI advancements at Cloud Next '26

Google's Cloud Next '26 event showcased significant advancements in AI, emphasizing the 'agentic era' with the launch of the Gemini Enterprise Agent Platform and eighth-generation TPUs. These innovations aim to enhance business operations and energy efficiency in data centers. The introduction of Gemma 4, an open model for advanced reasoning, and Deep Research Max, which automates high-level research tasks, marks a leap in AI capabilities. Additionally, Google Vids now offers free video generation, democratizing access to professional-quality content creation. These developments highlight Google's commitment to integrating AI into diverse sectors, from education to enterprise solutions.

Google AI Blog·May 4, 2026
Models & Labs·agents

Gemini API Introduces Webhooks for Long-Running Jobs

Google's Gemini API now supports event-driven Webhooks, significantly reducing friction and latency for long-running tasks. This new feature allows developers to receive real-time notifications when a job is completed, eliminating the need for continuous polling. The implementation adheres to the Standard Webhooks specification, ensuring secure and reliable communication with features like signed requests and automatic retries. This advancement makes it easier for developers to manage complex workflows, such as deep research or batch processing, with greater efficiency.

Google AI Blog·May 4, 2026
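
Since the summary above says the webhooks follow the Standard Webhooks specification with signed requests, a receiver would typically check the signature before trusting a notification. The sketch below follows the symmetric (HMAC-SHA256) scheme from that specification: the signed content is the message id, timestamp, and raw body joined by dots, and the header carries one or more space-separated "v1,<base64>" signatures. Whether the Gemini API uses this symmetric scheme, and how it issues secrets, is an assumption here; the function names sign and verify and the five-minute tolerance are illustrative choices, not part of any Google SDK.

```python
import base64
import hashlib
import hmac
import time


def sign(secret: str, msg_id: str, timestamp: int, payload: bytes) -> str:
    """Compute a Standard Webhooks v1 signature for one message."""
    # Secrets are base64-encoded and conventionally prefixed with "whsec_".
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed_content = f"{msg_id}.{timestamp}.".encode() + payload
    digest = hmac.new(key, signed_content, hashlib.sha256).digest()
    return "v1," + base64.b64encode(digest).decode()


def verify(secret: str, headers: dict, payload: bytes,
           tolerance_s: int = 300, now: float = None) -> bool:
    """Return True if the request is fresh and carries a valid signature."""
    msg_id = headers["webhook-id"]
    timestamp = int(headers["webhook-timestamp"])
    current = time.time() if now is None else now
    # Reject stale or far-future timestamps to limit replay attacks.
    if abs(current - timestamp) > tolerance_s:
        return False
    expected = sign(secret, msg_id, timestamp, payload)
    # The header may list several signatures, e.g. during key rotation.
    candidates = headers["webhook-signature"].split()
    return any(hmac.compare_digest(expected, c) for c in candidates)
```

The check must run against the raw request body exactly as received; re-serializing parsed JSON before verifying would change the bytes and make valid signatures fail.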
Models & Labs·models

vLLM v0.20.2rc0 introduces shutdown() method

The latest release of vLLM, version 0.20.2rc0, adds a shutdown() method that gives developers more control over the lifecycle of their applications. It is a practical improvement for managing resources and ensuring clean exits in AI systems. Though a small update, it reflects a focus on robustness and reliability in AI infrastructure.

vLLM Releases·May 4, 2026
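
One common way to use an explicit shutdown() method is to guarantee it runs even when work fails partway through. The sketch below shows that pattern with a stand-in class; the summary does not say which vLLM object gains the method or what its signature is, so StubEngine, its generate method, and the managed helper are hypothetical and are not vLLM's API.

```python
from contextlib import contextmanager


class StubEngine:
    """Hypothetical stand-in for an engine object; not vLLM's API."""

    def __init__(self):
        self.running = True

    def generate(self, prompt: str) -> str:
        if not self.running:
            raise RuntimeError("engine is shut down")
        return f"echo: {prompt}"

    def shutdown(self) -> None:
        # Safe to call more than once in this sketch.
        self.running = False


@contextmanager
def managed(engine):
    """Yield the engine and always call shutdown() on exit."""
    try:
        yield engine
    finally:
        engine.shutdown()
```

Used as `with managed(StubEngine()) as engine: ...`, the helper releases the engine whether the body returns normally or raises.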