Models & Labs

Granite Embedding R2: New Multilingual Models Released

Hugging Face Blog · May 14, 2026 · high confidence

Why it matters

→ The 97M-parameter model posts the highest retrieval score of any open multilingual embedding model under 100M parameters.
→ Both models support over 200 languages.
→ Both offer a 32K-token context window for long-sequence inputs.


IBM has released two new multilingual embedding models under the Apache 2.0 license, announced on the Hugging Face Blog. The Granite Embedding Multilingual R2 family consists of a compact 97M-parameter model and a full-size 311M-parameter model, both supporting over 200 languages. The compact model achieves the highest retrieval score of any open multilingual model under 100M parameters, while the full-size model ranks second among models under 500M parameters. Both are built for broad language coverage and high retrieval quality, which makes them suitable for multilingual and code retrieval tasks.
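As a rough illustration of what "retrieval" with an embedding model involves: the query and each document are mapped to vectors, and documents are ranked by cosine similarity to the query. The sketch below uses made-up three-dimensional vectors rather than real model outputs, and `cosine` and `rank` are hypothetical helper names, not part of any Granite API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    """Return document names ordered from most to least similar."""
    scored = [(cosine(query_vec, vec), name) for name, vec in doc_vecs.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# Toy vectors (assumed values); a real model would emit hundreds of dimensions.
docs = {
    "doc_en": [0.9, 0.1, 0.0],
    "doc_de": [0.8, 0.2, 0.1],
    "doc_code": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
ranking = rank(query, docs)
```

In a multilingual setting the point is that a query in one language should land near relevant documents in another, so the same ranking step works across languages.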


More from Hugging Face Blog

Models & Labs · models

OlmoEarth Platform Enables Large-Scale Geospatial Inference

The OlmoEarth Platform is designed for geospatial inference at the scale of Earth observation data. It processes terabytes of satellite imagery and lets organizations generate continent-scale maps in a day at minimal cost. The platform covers data acquisition, processing, and inference, which makes it usable by organizations without extensive engineering resources. It can run large-scale inference jobs across thousands of CPUs and GPUs, with applications such as wildfire risk mapping and deforestation monitoring.

Hugging Face Blog · Jul 28, 2026

Models & Labs · models

LFM2.5-Encoders Boost Long-Context Inference on CPU

The LFM2.5-Encoders target long-context inference, particularly on CPU. They run faster than larger models such as ModernBERT-base while handling contexts of up to 8,192 tokens. That makes them a fit for high-volume tasks such as classification and routing, where speed and cost matter most. The models are open source and available now, and developers can fine-tune them for specific applications. The release reflects a push toward CPU-friendly NLP models that keep performance high without extensive hardware.

Hugging Face Blog · Jul 28, 2026

Models & Labs · models

NVIDIA Unveils Real-Time Surgical Simulator

NVIDIA's Cosmos-H-Dreams brings real-time, action-conditioned generative environments to surgical robotics simulation. Building on the Cosmos-H-Surgical-Simulator, the new model runs on a single NVIDIA RTX PRO 6000 GPU and produces interactive simulations that can be controlled in a closed loop. An integration with the Versius surgeon controller demonstrates real-time operation with existing hardware. Beyond faster simulation, this allows policy development and surgical training without physical robots.

Hugging Face Blog · Jul 27, 2026

More in Models & Labs

Models & Labs · models

llama.cpp Adds GLM-5.2 Speculative Decoding Support

The latest llama.cpp update adds speculative decoding support for GLM-5.2 through the model's NextN/MTP features. The change also improves tensor loading and context management for models that use the GLM_DSA architecture, and it adds options for exporting models with or without MTP. The release is mainly of interest to developers running GLM-5.2.

llama.cpp Releases · Jul 30, 2026
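For readers unfamiliar with speculative decoding, the general idea can be sketched as a draft-and-verify loop: a cheap drafter proposes several tokens, and the main model keeps the longest prefix it agrees with, plus one token of its own. The code below is a generic greedy version with toy stand-in models; `speculative_step`, `target_next`, and `draft_next` are hypothetical names, and this is not llama.cpp's implementation (which verifies the proposal in a batched pass rather than one token at a time).

```python
def speculative_step(target_next, draft_next, context, k):
    """One greedy draft-and-verify step; returns the tokens to append."""
    # 1. The drafter proposes k tokens autoregressively.
    proposal = []
    ctx = list(context)
    for _ in range(k):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed token in order.
    accepted = []
    ctx = list(context)
    for tok in proposal:
        expected = target_next(ctx)
        if tok != expected:
            # First disagreement: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. All k tokens accepted: the target contributes one extra token.
    accepted.append(target_next(ctx))
    return accepted

# Toy models (assumptions): the target counts upward modulo 10,
# and the drafter agrees except right after the token 3.
def target_next(ctx):
    return (ctx[-1] + 1) % 10

def draft_next(ctx):
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10

partial = speculative_step(target_next, draft_next, [1], k=4)
full = speculative_step(target_next, target_next, [1], k=2)
```

The output always matches what the target model alone would have produced under greedy decoding; the gain is that several tokens can be confirmed per target-model pass when the drafter guesses well.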

Models & Labs · models

llama.cpp b10178 Release Adds Trace Logging

The b10178 release of llama.cpp adds server-side trace logging for slot similarity checking, which shows developers how a prompt cache slot gets selected. The logs include skip reasons and similarity calculations, which can help with performance tuning. The release introduces no new model architectures. It continues to ship builds for a wide range of platforms, including macOS with KleidiAI, Ubuntu with ROCm 7.2, and Windows with CUDA 12 and 13.

llama.cpp Releases · Jul 30, 2026

Models & Labs · models

llama.cpp b10180 Release Enhances SYCL Performance

The b10180 release of llama.cpp improves SYCL performance for unary elementwise operations. It introduces a fast path for contiguous tensors, switches index calculations to 32-bit math, and uses fastdiv for elementwise index math. The release adds no new models. Developers using the SYCL backend can expect faster elementwise operations, and llama.cpp continues to ship for macOS, Linux, and Windows.

llama.cpp Releases · Jul 30, 2026
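To make the "fastdiv" idea concrete: when many 32-bit indices are divided by the same divisor (for example, turning a flat element index into row and column), the division can be replaced by a precomputed multiply and shift. The following is a minimal sketch of that general multiply-shift technique, emulated with Python integers; the function names are hypothetical and this is not taken from the SYCL backend's source.

```python
def fastdiv_params(d):
    """Precompute (magic, shift) for dividing 32-bit unsigned ints by d."""
    assert 1 <= d < 2**32
    shift = (d - 1).bit_length()  # ceil(log2(d)); 0 when d == 1
    magic = ((1 << 32) * ((1 << shift) - d)) // d + 1
    return magic, shift

def fastdiv(n, magic, shift):
    """Compute n // d using one multiply, one add, and two shifts."""
    hi = (n * magic) >> 32  # high 32 bits of the 64-bit product
    return (hi + n) >> shift

# Usage: split a flat index into (row, col) for a row width of 7.
width = 7
magic, shift = fastdiv_params(width)
flat = 123
row = fastdiv(flat, magic, shift)
col = flat - row * width
```

On a GPU the parameters are computed once on the host, so each thread pays only for the multiply and shift instead of an integer division.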