16 × AI · AI signal, amplified

An AI news engine that ingests trusted sources, scores with Claude, and posts only what clears the bar.

© 2026 16 × AI. All rights reserved. Curated by Claude. Posts every 6 hours. No newsletter, no funnel.
Models & Labs

llama.cpp adds EXAONE 4.5 implementations

llama.cpp Releases·June 2, 2026·high confidence

Why it matters

  • Adds EXAONE 4.5 implementations, including new vision markers and projector paths.
  • Aligns those paths with the Qwen2.5-VL-style encode path.
  • Focuses on model loading and tensor registration for EXAONE 4.5.

The b9453 release of llama.cpp adds EXAONE 4.5 implementations, including new vision markers and projector paths that align with the Qwen2.5-VL-style encode path. The update focuses on model loading and tensor registration for EXAONE 4.5. It introduces no new models; its main audience is developers already using EXAONE models.


More from llama.cpp Releases

Models & Labs · models

llama.cpp b9455 Release Adds Quantized KV Cache

The b9455 release of llama.cpp adds quantized KV cache support, which lowers the memory needed during inference. It also includes a partial view fix and removes an overly strict assert. The release ships the usual platform builds. Quantized KV cache support mainly benefits developers working with limited computational resources.

llama.cpp Releases·Jun 2, 2026
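
To put the memory effect of a quantized KV cache in numbers, here is a rough sketch. The model dimensions are hypothetical and not taken from the release; the per-block byte counts are those of ggml's f16, q8_0, and q4_0 storage formats. The summary does not say which cache types or backends b9455 covers, so treat the list of types as an assumption.

```python
# Rough KV cache size estimate for different cache element types.
# The model dimensions used below are hypothetical, for illustration only.

# Bytes needed to store 32 elements in each format:
#   f16  : 32 values * 2 bytes
#   q8_0 : 32 int8 values + one fp16 scale
#   q4_0 : 32 4-bit values (16 bytes) + one fp16 scale
BYTES_PER_32_ELEMS = {"f16": 64, "q8_0": 34, "q4_0": 18}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type):
    """Bytes needed to hold K and V for all layers at full context."""
    elems_per_token = n_layers * n_kv_heads * head_dim  # for K alone (or V alone)
    total_elems = 2 * elems_per_token * n_ctx           # K and V together
    return total_elems * BYTES_PER_32_ELEMS[cache_type] // 32


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 KV heads, head size 128, 8192-token context.
    for cache_type in ("f16", "q8_0", "q4_0"):
        size = kv_cache_bytes(32, 8, 128, 8192, cache_type)
        print(f"{cache_type}: {size / 2**20:.0f} MiB")
```

For this made-up configuration the cache shrinks from 1024 MiB at f16 to 544 MiB at q8_0 and 288 MiB at q4_0. Real savings depend on the model and on which cache types a given backend supports.
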
Models & Labs · models

llama.cpp b9457 release focuses on Vulkan improvements

The b9457 release of llama.cpp reduces host memory lock contention in the Vulkan backend, which can improve efficiency in some workloads. The change replaces unique_lock with lock_guard. The release adds no new models or major features and continues to ship builds for macOS, Linux, and Windows.

llama.cpp Releases·Jun 2, 2026
Models & Labs · models

llama.cpp b9458 Release Enhances Vulkan Pipeline Compilation

The b9458 release of llama.cpp no longer holds the device mutex while compiling Vulkan pipelines. Because pipelines are compiled on demand, releasing the mutex during compilation reduces potential bottlenecks in multi-threaded environments. The update introduces no new models or architectures; it is a targeted efficiency fix for developers using llama.cpp with Vulkan.

llama.cpp Releases·Jun 2, 2026
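
The general pattern behind the b9458 item, compiling outside a lock and only publishing the result under it, can be sketched as follows. This is not llama.cpp's code: the class name PipelineCache, the compile_fn callback, and the use of Python's threading.Lock in place of the device mutex are all assumptions made for illustration.

```python
import threading


class PipelineCache:
    """Hypothetical cache that compiles pipelines on demand."""

    def __init__(self, compile_fn):
        self._lock = threading.Lock()  # stands in for the "device mutex"
        self._pipelines = {}
        self._compile_fn = compile_fn  # slow function: key -> compiled pipeline

    def get(self, key):
        # Fast path: hold the lock only for the dictionary lookup.
        with self._lock:
            pipeline = self._pipelines.get(key)
        if pipeline is not None:
            return pipeline

        # Slow path: compile WITHOUT holding the lock, so other threads
        # can keep using the cache while this compilation runs.
        compiled = self._compile_fn(key)

        # Publish under the lock. If another thread finished first,
        # keep its result and discard ours.
        with self._lock:
            return self._pipelines.setdefault(key, compiled)
```

The trade-off in this sketch is that two threads may occasionally compile the same pipeline at the same time, with one result thrown away. In exchange, a slow compilation never blocks threads that only need a lookup.
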

More in Models & Labs

Models & Labs · models

NVIDIA Jetson Advances Agentic AI in Robotics

NVIDIA's JetPack 7.2 release brings agentic AI capabilities to Jetson devices for robotics and industrial automation. With the NemoClaw framework integrated, Jetson devices can deploy AI agents that automate complex tasks, from defect detection to autonomous decision-making. The update also improves performance and memory optimization on the Jetson platform. It reflects a move from server-based AI toward edge deployment.

NVIDIA Blog·Jun 2, 2026
Models & Labs · models

JetBrains Releases Mellum2: Efficient 12B MoE Model

JetBrains has released Mellum2, a 12-billion-parameter Mixture-of-Experts model for text and code processing. By activating only 2.5 billion parameters per token, it offers more than twice the inference speed of similar-sized models, which suits high-throughput, latency-sensitive tasks. It targets software engineering applications such as code generation and summarization, and its Apache 2.0 license allows deployment in private environments. Mellum2 is positioned as a specialized, efficient model that complements larger AI systems rather than replacing them.

Hugging Face Blog·Jun 1, 2026
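
As a back-of-the-envelope check on the Mellum2 figures above, the sketch below computes what share of the model is active per token. It uses only the two parameter counts quoted in the summary. The function name is made up for illustration, and parameter share is only a rough proxy for inference speed.

```python
def active_fraction(active_params_b, total_params_b):
    """Share of parameters used per token in a Mixture-of-Experts model."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    total_b, active_b = 12.0, 2.5  # billions of parameters, from the summary
    print(f"active per token: {active_fraction(active_b, total_b):.1%}")
    print(f"total-to-active ratio: {total_b / active_b:.1f}x")
```

Roughly a fifth of the parameters are used per token, which is consistent in direction with the speed claim but does not by itself confirm the "more than twice" figure.
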
Models & Labs · models

NVIDIA Unveils Cosmos 3 World Foundation Model

NVIDIA's Cosmos 3 is an omnimodal world foundation model that handles five input and output modalities. It is designed to integrate into NVIDIA's ecosystem and targets physical AI and open-world reasoning. The release gives developers new tools for building multi-modal AI applications.

Sam Witteveen·Jun 1, 2026