16 × AI · AI signal, amplified

An AI news engine that ingests trusted sources, scores with Claude, and posts only what clears the bar.

© 2026 16 × AI. All rights reserved. Curated by Claude. Posts every 6 hours. No newsletter, no funnel.
Models & Labs

llama.cpp b9329 Release Enhances CUDA Performance

llama.cpp Releases · May 27, 2026 · high confidence

Why it matters

  • A fast Walsh-Hadamard transform for CUDA targets faster GPU execution of compute-heavy workloads.
  • Support for macOS, Linux, Windows, and openEuler makes the release usable across diverse development environments.
  • Loop unrolling and data-type adjustments aim to shorten processing time and improve efficiency.

llama.cpp has published release b9329, which adds a fast Walsh-Hadamard transform (FWHT) for the CUDA backend. The change is expected to improve CUDA performance, although no benchmark figures are given. The update also includes optimizations such as loop unrolling and data-type adjustments aimed at computational efficiency. The release supports macOS, Linux, Windows, and openEuler. It introduces no new models; its main relevance is to developers running llama.cpp on CUDA GPUs.
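
For readers new to the transform, the following is a minimal CPU sketch in Python of the textbook in-place butterfly algorithm. It is not llama.cpp's CUDA kernel: `fwht` and `hadamard_dense` are hypothetical names, and the unnormalized, natural (Sylvester) ordering is an assumption. It shows where the speed comes from: log2(n) stages of additions and subtractions, O(n log n) work in total, versus O(n²) for multiplying by a dense Hadamard matrix.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform in natural (Sylvester) order.

    Runs log2(n) butterfly stages of additions and subtractions, O(n log n)
    in total. The input length must be a power of two.
    """
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Each stage combines elements that are h apart. Every index belongs to
        # exactly one pair per stage, so the pairs can be processed in parallel.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a


def hadamard_dense(values):
    """Reference O(n^2) version: multiply by the Sylvester Hadamard matrix."""
    n = len(values)
    # Entry (i, j) of the Sylvester Hadamard matrix is (-1) ** popcount(i & j).
    return [
        sum(v * (-1) ** bin(i & j).count("1") for j, v in enumerate(values))
        for i in range(n)
    ]


x = [1, 2, 3, 4]
print(fwht(x))                       # [10, -2, -4, 0]
print(fwht(x) == hadamard_dense(x))  # True
print(fwht(fwht(x)))                 # [4, 8, 12, 16]: applying it twice scales by n
```

The stage loop runs a fixed log2(n) times for a given size, and the pairs within a stage are independent, which makes the transform a natural fit for parallel GPU threads and for the kind of loop unrolling the release mentions.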


More from llama.cpp Releases

Models & Labs · models

llama.cpp b9330 release improves model performance

The b9330 release of llama.cpp fixes how the ffn_latent operation is classified: it is now tagged as MUL_MAT, the operation type the backend expects for it. With the correct tag, the weights and their matrix multiplications stay on the GPU, avoiding an unnecessary CPU fallback and the graph splitting that comes with it. On the Nemotron 3 Super 120B Q5_K_M model, throughput rose from 64.9 to 103.22 tokens per second, an increase of about 59%. The release targets environments including macOS with KleidiAI, Ubuntu with ROCm 7.2, and CUDA 12 and CUDA 13.
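
To see why one mis-tagged operation can cost so much, here is a toy model in Python. It is not ggml's actual scheduler: it assumes that an operation runs on the GPU only if the backend accepts its op type, and that every switch between devices starts a new graph split. The op names other than `ffn_latent`, the `UNKNOWN` tag, and the `GPU_OPS` set are hypothetical stand-ins.

```python
from itertools import groupby

# Hypothetical set of op types the GPU backend accepts in this toy model.
GPU_OPS = {"MUL_MAT", "ADD"}


def plan_splits(graph, gpu_ops=GPU_OPS):
    """Group consecutive ops by the device they would run on.

    Each group is one graph split; every GPU<->CPU boundary implies a
    hand-off (and, in a real runtime, data copies) between devices.
    """
    placed = [("GPU" if op_type in gpu_ops else "CPU", name) for name, op_type in graph]
    return [
        (device, [name for _, name in group])
        for device, group in groupby(placed, key=lambda item: item[0])
    ]


# A tiny FFN-like chain; "UNKNOWN" stands in for the wrong tag.
before = [("attn_out", "MUL_MAT"), ("ffn_latent", "UNKNOWN"), ("ffn_down", "MUL_MAT")]
after = [("attn_out", "MUL_MAT"), ("ffn_latent", "MUL_MAT"), ("ffn_down", "MUL_MAT")]

print(plan_splits(before))
# [('GPU', ['attn_out']), ('CPU', ['ffn_latent']), ('GPU', ['ffn_down'])]
print(plan_splits(after))
# [('GPU', ['attn_out', 'ffn_latent', 'ffn_down'])]
```

In this model, the single mis-tagged op turns one GPU segment into three, with two device hand-offs per pass; retagging it removes both.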

llama.cpp Releases · May 27, 2026
Open Source · models

llama.cpp b9331 Release Enhances CI Workflows

The b9331 release of llama.cpp reorganizes its continuous integration (CI) setup by moving tasks into separate workflows. Android and HIP jobs are extracted into their own workflows, and WebGPU and RPC jobs are likewise moved into distinct workflows. The release also stops the SYCL f16 builds and optimizes pull-request jobs by aligning their backend paths. No new model architectures are introduced; the changes aim to streamline development and make builds easier to manage across environments.
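
Splitting CI into per-backend workflows pays off when each workflow runs only for changes that touch its code. The Python sketch below illustrates that path-filter idea. The workflow names and glob patterns are illustrative assumptions, not copied from llama.cpp's workflow files, and `fnmatchcase` only approximates how GitHub Actions matches `paths` filters (its `*` also matches `/`).

```python
from fnmatch import fnmatchcase

# Hypothetical per-workflow path filters (illustrative assumptions only).
WORKFLOW_PATHS = {
    "android": ["examples/llama.android/*"],
    "hip": ["ggml/src/ggml-cuda/*", "ggml/src/ggml-hip/*"],
    "rpc": ["ggml/src/ggml-rpc/*", "tools/rpc/*"],
    "webgpu": ["ggml/src/ggml-webgpu/*"],
}


def workflows_to_run(changed_files, workflow_paths=WORKFLOW_PATHS):
    """Return, sorted, the workflows whose path filters match any changed file."""
    return sorted(
        name
        for name, patterns in workflow_paths.items()
        if any(fnmatchcase(path, pattern) for path in changed_files for pattern in patterns)
    )


print(workflows_to_run(["ggml/src/ggml-webgpu/ggml-webgpu.cpp"]))  # ['webgpu']
print(workflows_to_run(["ggml/src/ggml-cuda/fattn.cu", "tools/rpc/rpc-server.cpp"]))  # ['hip', 'rpc']
print(workflows_to_run(["README.md"]))  # []
```

A documentation-only change triggers none of the backend workflows, which is the kind of saving that isolating tasks is meant to deliver.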

llama.cpp Releases · May 27, 2026
Open Source · models

llama.cpp b9333 release expands platform support

The b9333 release of llama.cpp broadens its platform coverage. On macOS, Apple Silicon users can use KleidiAI; Ubuntu builds cover Vulkan and ROCm 7.2; Windows builds include CUDA 12 and CUDA 13 DLLs; and openEuler builds are part of the supported lineup. The release adds no new model architectures, but it makes llama.cpp a more versatile inference runtime across a broader range of hardware configurations.

llama.cpp Releases · May 27, 2026

More in Models & Labs

Models & Labs · models

NVIDIA Vera CPU Challenges Intel and AMD

NVIDIA's new Vera CPU targets AI-centric workloads and challenges Intel and AMD in that market. It has 88 custom Olympus cores and 1.2 TB/s of memory bandwidth, and is designed for the demanding workloads of modern AI factories. Initial benchmarks by Phoronix highlight its memory performance and power efficiency, particularly compared with traditional x86 CPUs. NVIDIA positions Vera as a significant generational step over its previous Grace CPU. Vera is becoming available through partners, where it is expected to raise the performance bar for AI infrastructure.
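
The two headline figures can be combined into a per-core number with simple arithmetic. This Python sketch assumes decimal units (1 TB/s = 1,000 GB/s) and an even split of peak bandwidth across all 88 cores, which real workloads will not achieve; it is a back-of-the-envelope ratio, not a benchmark result.

```python
CORES = 88            # Olympus cores, as stated in the article
PEAK_BW_TB_S = 1.2    # memory bandwidth, as stated (decimal TB/s assumed)


def per_core_bandwidth_gb_s(total_tb_s, cores):
    """Peak bandwidth per core in GB/s, assuming an even split across cores."""
    return total_tb_s * 1000 / cores


print(f"{per_core_bandwidth_gb_s(PEAK_BW_TB_S, CORES):.1f} GB/s per core")  # 13.6 GB/s per core
```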

NVIDIA Blog · May 26, 2026
Models & Labs · models

GitHub Introduces Targeted Copilot Model Rules

GitHub has introduced a public-preview feature that gives enterprise owners more granular control over which Copilot models are available to specific organizations. Instead of a single enterprise-wide setting, owners can now define targeted model rules. The refreshed interface also simplifies managing default model availability, letting owners mark models as enabled or optional for different organizations. The result is more flexibility and control over how AI models are deployed across a company's GitHub environment.
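
The move from one enterprise-wide switch to targeted rules can be pictured as a lookup with a fallback. The Python sketch below is a hypothetical model, not GitHub's API or its documented precedence logic: the `ModelPolicy` class, the model names, the state names, the fallback to "disabled", and the assumption that an organization-specific rule overrides the enterprise default are all illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical states: "enabled" (on for the org), "disabled" (off),
# "optional" (assumed here to mean: left for org admins to decide).
STATES = {"enabled", "disabled", "optional"}


@dataclass
class ModelPolicy:
    # Enterprise-wide default state per model.
    defaults: dict[str, str]
    # Targeted rules: (organization, model) -> state.
    targeted: dict[tuple[str, str], str] = field(default_factory=dict)

    def set_rule(self, org: str, model: str, state: str) -> None:
        """Record a targeted rule for one organization and one model."""
        if state not in STATES:
            raise ValueError(f"unknown state: {state}")
        self.targeted[(org, model)] = state

    def state_for(self, org: str, model: str) -> str:
        """Targeted rule if one exists, else the enterprise default, else disabled."""
        return self.targeted.get((org, model), self.defaults.get(model, "disabled"))


policy = ModelPolicy(defaults={"model-a": "enabled", "model-b": "disabled"})
policy.set_rule("research", "model-b", "optional")

print(policy.state_for("research", "model-b"))  # optional
print(policy.state_for("sales", "model-b"))     # disabled
print(policy.state_for("sales", "model-a"))     # enabled
```

Keeping targeted rules in a separate map means the enterprise default still applies to every organization that has no rule of its own.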

GitHub Changelog · May 26, 2026
Models & Labs · models

OpenAI Achieves Math Breakthrough

OpenAI has reported a significant advance in the mathematical capabilities of its AI models.

The AI Daily Brief · May 24, 2026