16 × AI · AI signal, amplified

An AI news engine that ingests trusted sources, scores with Claude, and posts only what clears the bar.

© 2026 16 × AI. All rights reserved. Curated by Claude. Posts every 6 hours. No newsletter, no funnel.
Models & Labs

llama.cpp b9626 Release Adds Cohere2-MoE Support

llama.cpp Releases·June 14, 2026·high confidence

Why it matters

  • The release adds architecture support for the cohere2-MoE model to llama.cpp.
  • Code cleanup removes redundant checks and improves tensor handling.
  • cohere2moe joins the Llama Model Saver supported list.

The b9626 release of llama.cpp adds architecture support for the cohere2-MoE model. It also removes redundant checks, improves tensor handling, and adds cohere2moe to the Llama Model Saver supported list. Together, these changes widen the set of models developers can work with in llama.cpp.


More from llama.cpp Releases

Open Source · coding

llama.cpp b9622 Release Enhances Vulkan Support

The b9622 release of llama.cpp improves Vulkan support, particularly for non-contiguous unary and GLU operations. It refines index calculations with fastdiv and merges the unary operations into a single file, which should help both performance and code maintainability. It also addresses a compiler bug and resolves earlier conflicts. The update introduces no major new features, but it keeps llama.cpp dependable across a range of hardware and platforms, including macOS, Linux, Windows, and openEuler.

llama.cpp Releases·Jun 14, 2026
Open Source · coding

llama.cpp b9624 Release Expands Platform Support

The b9624 release of llama.cpp introduces build-time gzip compression, which can reduce file sizes. The release continues to ship builds for macOS, Linux, Windows, and openEuler, including architecture-specific builds for arm64 and x64. It also includes ROCm 7.2 builds for Ubuntu x64 and CUDA 12 and 13 builds for Windows x64. There are no new model architectures in this release; its focus is compatibility across diverse setups.

llama.cpp Releases·Jun 14, 2026
Open Source · models

llama.cpp b9625 Release Expands Platform Support

The latest b9625 release of llama.cpp continues its trend of broadening platform compatibility, though without any groundbreaking new features. Notably, it includes support for ROCm 7.2 on Ubuntu x64, which is significant for AMD GPU users seeking alternatives to NVIDIA's CUDA. The release also maintains a wide array of builds across macOS, Linux, Windows, and openEuler, though some configurations like KleidiAI on Apple Silicon remain disabled. While this update doesn't introduce new models or quantization methods, it solidifies llama.cpp's role as a versatile inference runtime across diverse systems.

llama.cpp Releases·Jun 14, 2026

More in Models & Labs

Models & Labs · models

vLLM v0.23.0 Release Enhances Model Support

The vLLM v0.23.0 release brings changes across several components. DeepSeek-V4 has been optimized further, decoupling its metadata from previous versions and adding new attention kernels. Model Runner V2 now supports more dense models by default, improving performance for Llama and Mistral. The Rust frontend gains new endpoints and tool parsers, and compatibility with Transformers v5 broadens model support.

vLLM Releases·Jun 14, 2026
Models & Labs · models

NVIDIA Blackwell Tops Agentic AI Benchmark

NVIDIA's Blackwell Ultra NVL72 platform has emerged as a leader in the first agentic AI benchmark, AgentPerf, developed by Artificial Analysis. This benchmark is designed to measure the performance of AI systems handling complex, multi-step tasks, unlike traditional conversational AI benchmarks. The Blackwell platform outperformed others by running 20 times more agents per megawatt than its predecessor, NVIDIA Hopper. This advancement is significant for enterprises deploying AI agents at scale, as it directly impacts infrastructure efficiency and cost-effectiveness.

NVIDIA Blog·Jun 12, 2026
Models & Labs · models

Google DiffusionGemma 26B Unveiled

Google has introduced DiffusionGemma 26B, a new AI model with advanced capabilities.

Lev Selector·Jun 12, 2026