16 × AI
AI signal, amplified

An AI news engine that ingests trusted sources, scores with Claude, and posts only what clears the bar.

© 2026 16 × AI. All rights reserved. Curated by Claude. Posts every 6 hours. No newsletter, no funnel.
Models & Labs

llama.cpp b9788 release enhances dual-GPU support

llama.cpp Releases·June 26, 2026·high confidence

Why it matters

  • Improves dual-GPU tensor parallelism on the SYCL backend.
  • Mirrors the all-reduce pattern used with CUDA's NCCL, making the SYCL backend more competitive in multi-GPU environments.
  • Performance tests show significant speed improvements on models such as Llama-3.3-70B and Qwen3-Coder-Next-80B-A3B.

llama.cpp's b9788 release improves support for dual-GPU configurations on the SYCL backend, with a focus on tensor parallelism. The update implements a degenerate ring all-reduce, that is, the two-device case in which the ring collapses to a direct exchange between the pair of GPUs. It covers both small and large tensor operations and mirrors the all-reduce pattern used with CUDA's NCCL. Performance tests show significant speed improvements for models such as Llama-3.3-70B and Qwen3-Coder-Next-80B-A3B. The change makes llama.cpp a more efficient choice for dual-GPU SYCL setups and requires no new dependencies.
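
To make the idea of a two-device ring all-reduce concrete, here is a minimal Python simulation. It is a sketch of the general technique only, not the llama.cpp or SYCL implementation: the function name `ring_allreduce_two` and the use of plain lists in place of device buffers are assumptions made for illustration. With two ranks, the usual ring schedule shrinks to one reduce-scatter step followed by one all-gather step.

```python
from typing import List, Tuple


def ring_allreduce_two(a: List[float], b: List[float]) -> Tuple[List[float], List[float]]:
    """Simulate a sum all-reduce across two ranks (hypothetical helper).

    `a` is the buffer held by rank 0, `b` the buffer held by rank 1.
    Returns the buffers both ranks hold after the all-reduce.
    """
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    mid = len(a) // 2

    # Reduce-scatter: rank 0 owns chunk [0, mid), rank 1 owns chunk [mid, n).
    # Each rank sends the chunk it does not own to its peer, which adds it in.
    lo = [x + y for x, y in zip(a[:mid], b[:mid])]  # reduced on rank 0
    hi = [x + y for x, y in zip(a[mid:], b[mid:])]  # reduced on rank 1

    # All-gather: each rank sends its reduced chunk to the peer,
    # so both ranks end with the full summed buffer.
    out0 = lo + hi
    out1 = lo + hi
    return out0, out1


if __name__ == "__main__":
    r0, r1 = ring_allreduce_two([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
    print(r0, r1)
```

Each rank transfers only half of the buffer per step, which is why this pattern is attractive for large tensors. For very small tensors, exchanging the whole buffer in a single step and summing locally can be cheaper; how b9788 chooses between strategies is not described in the summary above.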


More from llama.cpp Releases

Open Source·models

llama.cpp b9781 Release Expands Platform Support

The b9781 release of llama.cpp continues to broaden platform compatibility without adding major new features. It includes support for ROCm 7.2 on Ubuntu x64, which matters for AMD GPU users seeking an alternative to NVIDIA's CUDA. KleidiAI support on macOS Apple Silicon is disabled, but the release still covers a wide range of platforms, including Windows and openEuler. The update reinforces llama.cpp's position as a versatile inference runtime, with the focus on platform expansion and no new model architectures.

llama.cpp Releases·Jun 26, 2026
Open Source·models

llama.cpp b9782 Release Expands Platform Support

The b9782 release of llama.cpp continues to broaden platform compatibility without adding major new features. It includes support for ROCm 7.2 on Ubuntu x64, which matters for AMD GPU users seeking an alternative to NVIDIA's CUDA. KleidiAI support on Apple Silicon remains disabled, but the release still covers a wide range of platforms, from Windows to openEuler. The update reinforces llama.cpp's position as a versatile inference runtime without introducing major changes.

llama.cpp Releases·Jun 26, 2026
Models & Labs·models

llama.cpp b9784 Release Enhances Hexagon Performance

The b9784 release of llama.cpp optimizes matrix multiplication on Hexagon. It reworks the MUL_MAT and MUL_MAT_ID operations, introducing a 32x32 tiled weight repack and improved kernel parameters. The changes aim to improve register usage and streamline activation processing, which benefits users running on Hexagon hardware. The release adds no new models; it refines existing code paths, making llama.cpp more robust for developers working across diverse hardware configurations.
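
As an illustration of what a tiled weight repack means in general, the sketch below rearranges a row-major matrix so that each square tile is stored contiguously. This is an assumption-laden example, not the layout llama.cpp uses on Hexagon: the function name `repack_tiles`, the tile ordering, and the use of nested Python lists are all hypothetical.

```python
from typing import List


def repack_tiles(w: List[List[float]], tile: int = 32) -> List[float]:
    """Flatten a row-major matrix so each tile x tile block is contiguous.

    Hypothetical helper: tiles are emitted left to right, top to bottom,
    and elements inside a tile stay in row-major order.
    """
    rows = len(w)
    cols = len(w[0]) if rows else 0
    if rows % tile or cols % tile:
        raise ValueError("matrix dimensions must be multiples of the tile size")

    out: List[float] = []
    for tr in range(0, rows, tile):         # top row of the current tile
        for tc in range(0, cols, tile):     # left column of the current tile
            for r in range(tr, tr + tile):  # rows inside the tile
                out.extend(w[r][tc:tc + tile])
    return out


if __name__ == "__main__":
    # A 4x4 matrix with 2x2 tiles keeps the example small enough to read.
    w = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    print(repack_tiles(w, tile=2))
```

Storing a tile contiguously lets a kernel load the whole block with sequential reads, which is the usual motivation for repacking weights ahead of a matrix multiply.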

llama.cpp Releases·Jun 26, 2026

More in Models & Labs

Models & Labs·models

OpenAI Develops Custom Chip 'Jalapeño'

OpenAI has announced the development of its first custom chip, named 'Jalapeño'.

The AI Daily Brief·Jun 25, 2026
Models & Labs·models

GPT-5.5 Instant Now Available for Free Users

OpenAI has updated GPT-5.5 Instant, making it accessible to users on the free tier.

The AI Daily Brief·Jun 25, 2026
Models & Labs·models

Unconventional AI Aims to Slash AI Power Use by 1,000x

Unconventional AI, led by former Databricks AI chief Naveen Rao, is developing a new computing architecture that it says could cut the power consumption of AI inference by up to a factor of 1,000. Its first model, Un-0, is meant to show that an oscillator-based architecture can match the performance of state-of-the-art diffusion models in image generation. The model currently runs in a software simulation; the company plans to release chip schematics soon and aims to build a complete inference stack. If it works, the approach could ease the energy constraints on AI scaling.

TechCrunch AI·Jun 25, 2026