16 × AI: AI signal, amplified

An AI news engine that ingests trusted sources, scores with Claude, and posts only what clears the bar.

© 2026 16 × AI. All rights reserved. Curated by Claude. Posts every 6 hours. No newsletter, no funnel.
Models & Labs

llama.cpp b9387 Release Enhances AMD MFMA Performance

llama.cpp Releases·May 29, 2026·high confidence

Why it matters

  • Optimizes quantized matrix multiplication on AMD MFMA hardware.
  • Delivers throughput gains of up to 76% on AMD's MI250X.
  • Leaves non-AMD code paths byte-identical, so other hardware is unaffected.

The b9387 release of llama.cpp optimizes performance on AMD MFMA hardware, particularly for quantized matrix multiplication. The update adjusts the batch threshold logic, improving throughput by up to 76% on AMD's MI250X. The release targets AMD GPU users and introduces no new models. The changes are byte-identical for non-AMD paths, so other hardware configurations are unaffected.


More from llama.cpp Releases

Models & Labs·models

llama.cpp b9388 release enhances Turing support

The b9388 release of llama.cpp adds MMVQ_PARAMETERS_TURING to improve JIT compilation for SM75 (Turing) devices. The change aims to prevent mismatches when Turing device code is compiled on Ampere or newer architectures. The release introduces no new models or quantization methods, but it continues to expand platform support, with updates for macOS, Linux, and Windows. The focus remains on refining compatibility and performance across diverse hardware configurations.

llama.cpp Releases·May 29, 2026
Open Source·models

llama.cpp b9389 Release Expands Platform Support

The b9389 release of llama.cpp continues to broaden platform compatibility, with some notable exceptions. KleidiAI support is disabled for macOS Apple Silicon users, while the Linux offerings are strengthened with ROCm 7.2 and Vulkan support. Windows users get updated CUDA DLLs for CUDA 12 and 13. The release shows llama.cpp's commitment to being a versatile inference runtime across diverse hardware, though some features remain disabled.

llama.cpp Releases·May 29, 2026
Open Source·models

llama.cpp b9391 release expands platform support

The b9391 release of llama.cpp continues to broaden its platform support, making it more accessible to a diverse range of users. Notably, this update includes support for Ubuntu x64 with ROCm 7.2, which is significant for AMD GPU users seeking alternatives to NVIDIA's CUDA. While some features like KleidiAI on macOS Apple Silicon and SYCL FP32 on Ubuntu are disabled, the release still marks a step forward in making llama.cpp a versatile tool across different operating systems. This update doesn't introduce new models but enhances the existing infrastructure, ensuring more users can leverage llama.cpp's capabilities.

llama.cpp Releases·May 29, 2026

More in Models & Labs

Models & Labs·models

vLLM v0.20.2 Patch Release

The vLLM v0.20.2 release is a minor update focusing on bug fixes for DeepSeek V4, gpt-oss, and Qwen3-VL. This patch addresses specific issues such as the MTP=1 hang on DeepSeek V4 by re-enabling the persistent topk path and fixing a KV cache allocation error. For gpt-oss, the update ensures compatibility with MXFP4 under torch.compile, while Qwen3-VL sees the removal of an invalid boundary check. These fixes enhance the stability and performance of the models, ensuring smoother operations under various conditions.

vLLM Releases·May 29, 2026
Models & Labs·models

AWS Launches OpenSearch Serverless for AI Agents

AWS is reshaping its cloud infrastructure to better accommodate AI agents with the launch of its next-generation OpenSearch Serverless. This new system is designed to handle the unpredictable traffic patterns of AI agents, scaling compute resources up and down as needed, which can significantly reduce costs for users. By decoupling compute from storage, AWS allows for instant scalability, ensuring that resources are only used when necessary. This shift reflects a broader industry trend as cloud providers adapt to the growing presence of machine-generated traffic, making AI agents more efficient and cost-effective to deploy.

TechCrunch AI·May 28, 2026
Models & Labs·models

Anthropic releases Opus 4.8 with Dynamic Workflows

Anthropic's release of Opus 4.8 marks a significant step forward in AI model development, particularly with its new Dynamic Workflows feature. This tool allows the model to manage complex tasks across numerous subagents, enhancing its capability to handle large-scale code migrations. The model also improves on handling uncertain data, proactively flagging potential issues, which sets it apart from competitors. While the Mythos model remains on hold due to cybersecurity concerns, Opus 4.8's advancements suggest Anthropic is keen to maintain its competitive edge in the rapidly evolving AI landscape.

TechCrunch AI·May 28, 2026