16 × AI: AI signal, amplified

An AI news engine that ingests trusted sources, scores with Claude, and posts only what clears the bar.

© 2026 16 × AI. All rights reserved. Curated by Claude. Posts every 6 hours. No newsletter, no funnel.
Models & Labs

llama.cpp b9018 release expands platform support

llama.cpp Releases·May 5, 2026·high confidence

Why it matters

  • Expands platform support, making llama.cpp more versatile for developers.
  • Introduces Vulkan and ROCm 7.2 support, enhancing performance on non-NVIDIA hardware.
  • Strengthens llama.cpp's position as a flexible inference runtime across diverse systems.

The b9018 release of llama.cpp expands support across macOS, Linux, Windows, and Android. Key additions include Vulkan support on both Ubuntu and Windows, as well as ROCm 7.2 for AMD GPUs, which improves compatibility with non-NVIDIA hardware. The release introduces no new models; it focuses on broadening the range of systems the software runs on, consistent with llama.cpp's role as a flexible inference tool for developers working with different hardware configurations.


More from llama.cpp Releases

Open Source·models

llama.cpp b9015 Release Expands Platform Support

The b9015 release of llama.cpp extends its reach across more systems, now including macOS on Apple Silicon with KleidiAI enabled and Ubuntu with ROCm 7.2. It also brings Vulkan support to both Linux and Windows, and Windows users get CUDA 12 and CUDA 13 support for NVIDIA hardware. The release introduces no new model architectures, but it strengthens llama.cpp's role as a flexible inference runtime for developers working with varied hardware configurations.

llama.cpp Releases·May 5, 2026
Models & Labs·models

llama.cpp b9019 Release Enhances Model Flexibility

The b9019 release of llama.cpp relocates functions such as load_hparams and load_tensors so that they are defined per model, giving developers more flexibility. This structural shift comes with the introduction of build_graph and refined switch-case logic, which together improve the system's modularity. The summary credits these updates with easier adaptation to various hardware setups, including macOS, Linux, and Windows environments. No new model architectures are introduced, but the release lays a foundation for more efficient development and deployment, particularly with support for configurations such as KleidiAI on Apple Silicon and ROCm 7.2 on AMD GPUs.
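
The per-model layout described above can be pictured with a small sketch. This is an illustration only: llama.cpp itself is written in C++, and the class names, the registry, and the metadata keys below are hypothetical, chosen to show how a central switch over architectures can give way to per-model definitions of load_hparams, load_tensors, and build_graph. It assumes build_graph is also a per-model hook, which the summary does not state explicitly.

```python
from abc import ABC, abstractmethod


class Model(ABC):
    """Hypothetical base class: each architecture supplies its own hooks."""

    @abstractmethod
    def load_hparams(self, metadata: dict) -> dict: ...

    @abstractmethod
    def load_tensors(self, hparams: dict) -> list: ...

    @abstractmethod
    def build_graph(self, hparams: dict) -> list: ...


class DenseModel(Model):
    """Illustrative dense architecture (not a real llama.cpp type)."""

    def load_hparams(self, metadata):
        return {"n_layer": metadata["n_layer"]}

    def load_tensors(self, hparams):
        return [f"blk.{i}.ffn" for i in range(hparams["n_layer"])]

    def build_graph(self, hparams):
        layers = [f"layer{i}" for i in range(hparams["n_layer"])]
        return ["embed"] + layers + ["output"]


class MoEModel(Model):
    """Illustrative mixture-of-experts architecture with extra hparams."""

    def load_hparams(self, metadata):
        return {"n_layer": metadata["n_layer"], "n_expert": metadata["n_expert"]}

    def load_tensors(self, hparams):
        return [
            f"blk.{i}.expert{e}"
            for i in range(hparams["n_layer"])
            for e in range(hparams["n_expert"])
        ]

    def build_graph(self, hparams):
        layers = [f"moe_layer{i}" for i in range(hparams["n_layer"])]
        return ["embed"] + layers + ["output"]


# One small lookup replaces a large switch over architecture names.
REGISTRY = {"dense": DenseModel, "moe": MoEModel}


def load(arch: str, metadata: dict):
    """Dispatch once on the architecture, then call that model's own hooks."""
    model = REGISTRY[arch]()
    hparams = model.load_hparams(metadata)
    return model.load_tensors(hparams), model.build_graph(hparams)
```

Adding an architecture in this layout means adding one class and one registry entry, rather than editing several shared functions that each branch on the architecture.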

llama.cpp Releases·May 5, 2026
Models & Labs·models

llama.cpp b9025 Release Expands Platform Support

The b9025 release of llama.cpp continues the project's trend of broadening platform compatibility, supporting macOS, Linux, Windows, and Android. Notably, it introduces Vulkan support on Ubuntu and Windows and adds ROCm 7.2 for Ubuntu, widening the available GPU options. The release introduces no new models; it focuses on making llama.cpp usable across different hardware configurations, so developers can run it regardless of their platform choice.

llama.cpp Releases·May 5, 2026

More in Models & Labs

Models & Labs·models

Google unveils major AI advancements at Cloud Next '26

Google's Cloud Next '26 event showcased significant advancements in AI, emphasizing the 'agentic era' with the launch of the Gemini Enterprise Agent Platform and eighth-generation TPUs. These innovations aim to enhance business operations and energy efficiency in data centers. The introduction of Gemma 4, an open model for advanced reasoning, and Deep Research Max, which automates high-level research tasks, marks a leap in AI capabilities. Additionally, Google Vids now offers free video generation, democratizing access to professional-quality content creation. These developments highlight Google's commitment to integrating AI into diverse sectors, from education to enterprise solutions.

Google AI Blog·May 4, 2026
Models & Labs·agents

Gemini API Introduces Webhooks for Long-Running Jobs

Google's Gemini API now supports event-driven webhooks, reducing friction and latency for long-running tasks. Developers receive a notification when a job completes, so they no longer need to poll continuously. The implementation adheres to the Standard Webhooks specification, which provides secure and reliable delivery through signed requests and automatic retries. This makes complex workflows, such as deep research or batch processing, easier to manage.
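
To make "signed requests" concrete, here is a minimal receiver-side sketch of signature checking as the Standard Webhooks specification describes it: an HMAC-SHA256 over the message id, timestamp, and raw body, base64-encoded and prefixed with "v1,". The header names and the "whsec_" secret prefix follow that specification; whether the Gemini API uses exactly these names and this secret format is an assumption not confirmed by the summary, and the function names are illustrative.

```python
import base64
import hashlib
import hmac
import time


def sign(secret: str, msg_id: str, timestamp: int, payload: bytes) -> str:
    """Compute a v1 signature over "<id>.<timestamp>.<body>"."""
    # Secrets are base64 text, optionally carrying a "whsec_" prefix.
    raw = secret[len("whsec_"):] if secret.startswith("whsec_") else secret
    key = base64.b64decode(raw)
    signed = f"{msg_id}.{timestamp}.".encode() + payload
    digest = hmac.new(key, signed, hashlib.sha256).digest()
    return "v1," + base64.b64encode(digest).decode()


def verify(secret, headers, payload, tolerance_s=300, now=None) -> bool:
    """Return True only for a fresh request with a matching signature."""
    try:
        msg_id = headers["webhook-id"]
        timestamp = int(headers["webhook-timestamp"])
        candidates = headers["webhook-signature"].split()
    except (KeyError, ValueError):
        return False  # missing or malformed headers

    now = time.time() if now is None else now
    if abs(now - timestamp) > tolerance_s:
        return False  # stale or future-dated: treat as a possible replay

    expected = sign(secret, msg_id, timestamp, payload).encode()
    # The header may list several space-separated signatures (key rotation);
    # compare in constant time against each one.
    return any(hmac.compare_digest(expected, c.encode()) for c in candidates)
```

The check must run on the raw request bytes, before any JSON parsing, because re-serializing the body can change it and invalidate the signature.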

Google AI Blog·May 4, 2026
Models & Labs·models

vLLM v0.20.2rc0 introduces shutdown() method

The latest release of vLLM, version 0.20.2rc0, brings a new shutdown() method, enhancing the control developers have over the lifecycle of their applications. This addition is a practical improvement for those managing resources and ensuring clean exits in their AI systems. While it may seem like a small update, it reflects a focus on robustness and reliability in AI infrastructure. Developers can now better manage their applications, reducing potential issues during shutdown processes.
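
One way an explicit shutdown() method helps with clean exits is sketched below. The summary does not say which vLLM object exposes the method or what it releases, so FakeEngine is a hypothetical stand-in rather than a vLLM class; only the pattern of guaranteeing the call with try/finally is the point.

```python
from contextlib import contextmanager


class FakeEngine:
    """Hypothetical stand-in for a resource-holding engine (NOT a vLLM class)."""

    def __init__(self):
        self.running = True

    def generate(self, prompt: str) -> str:
        if not self.running:
            raise RuntimeError("engine is shut down")
        return prompt.upper()

    def shutdown(self) -> None:
        # A real engine would release workers, memory, sockets, and so on.
        self.running = False


@contextmanager
def managed(engine):
    """Yield the engine and always call shutdown(), even if the body raises."""
    try:
        yield engine
    finally:
        engine.shutdown()
```

Used as `with managed(FakeEngine()) as engine: ...`, the engine is shut down on both the normal path and the error path, which is the kind of clean exit the release note describes.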

vLLM Releases·May 4, 2026