Models & Labs

NVIDIA Boosts Google DeepMind's DiffusionGemma Speed

NVIDIA Blog | June 10, 2026 | high confidence

Why it matters

→DiffusionGemma generates text in parallel blocks rather than one token at a time, a different approach to running AI workloads.
→NVIDIA's optimizations enable faster local AI applications that do not depend on the cloud.
→The model's open license and compatibility with popular platforms make it practical to adopt.


NVIDIA has optimized Google DeepMind's DiffusionGemma model for faster text generation on NVIDIA GPUs. Built on the Gemma 4 architecture, DiffusionGemma generates text in parallel blocks rather than sequentially, which allows for up to 4x faster performance. The model is tuned for NVIDIA hardware, including GeForce RTX and DGX systems, enabling local, low-latency AI applications. The open model is released under an Apache 2.0 license and is available through tools such as Hugging Face Transformers.
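To make the parallel-block idea concrete, here is a minimal sketch that only counts model forward passes. It is not DiffusionGemma's actual algorithm: `block_size` and `refine_steps` are hypothetical parameters, the numbers are illustrative and unrelated to the reported 4x figure, and pass counts do not translate directly into wall-clock speed.

```python
import math


def sequential_passes(num_tokens: int) -> int:
    """Autoregressive decoding: one forward pass per generated token."""
    return num_tokens


def block_parallel_passes(num_tokens: int, block_size: int, refine_steps: int) -> int:
    """Hypothetical block scheme (an assumption, not the real model):
    each block of `block_size` tokens is produced by `refine_steps` passes,
    and every pass updates all positions in the block at once."""
    if block_size <= 0 or refine_steps <= 0:
        raise ValueError("block_size and refine_steps must be positive")
    num_blocks = math.ceil(num_tokens / block_size)
    return num_blocks * refine_steps


if __name__ == "__main__":
    n = 256
    seq = sequential_passes(n)
    par = block_parallel_passes(n, block_size=16, refine_steps=8)
    print(f"sequential: {seq} passes, block-parallel: {par} passes")
```

Under these assumed settings the block scheme needs fewer passes whenever `refine_steps` is smaller than `block_size`; how much of that shows up as real speed depends on the hardware and on how expensive each pass is.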


More from NVIDIA Blog

Models & Labs | models

NVIDIA Jetson: Compact AI Power for Developers

NVIDIA's Jetson platform offers powerful AI capabilities in a compact form factor, aimed at developers working on edge AI and robotics. The Jetson Orin Nano Super, for instance, delivers 67 trillion operations per second, enough for first-time builders to explore computer vision and AI agent development. Beyond portability, the platform gives students, researchers, and developers a practical path to create and deploy AI solutions without relying on cloud services, which broadens what classrooms and labs can build.

NVIDIA Blog | Jul 28, 2026

Market & Regulation | research

Open Secure AI Alliance Formed for AI Safety

The Open Secure AI Alliance represents a pivotal move towards enhancing AI safety and security through open source collaboration. With industry giants like NVIDIA, Microsoft, and IBM participating, the alliance is set to develop open technologies and tools that enable defenders to effectively inspect, adapt, and deploy AI systems. This initiative highlights the critical role of transparency and community-driven defense in cybersecurity, challenging the assumption that closed systems are inherently safer. By fostering an open defense stack, the alliance aims to democratize AI safety, ensuring that critical industries can build robust security systems without being dependent on a few closed providers.

NVIDIA Blog | Jul 27, 2026

More in Models & Labs

Models & Labs | models

Llama.cpp adds GLM-5.2 speculative decoding support

Llama.cpp's latest update introduces speculative decoding support for GLM-5.2, built on the model's NextN/MTP features. The addition allows for more efficient tensor loading and context management, particularly for models using the GLM_DSA architecture. The update also includes options for exporting models with or without the MTP feature, giving developers flexibility. The release is a step forward in performance and adaptability, especially for those running GLM-5.2 models.
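For readers new to the technique, the general shape of speculative decoding can be sketched as follows. This is a toy illustration with greedy verification, not llama.cpp's implementation: `draft` and `target` are hypothetical stand-in functions, whereas in MTP-style setups the draft tokens typically come from extra prediction layers of the same model, and a real runtime verifies all proposals in one batched pass rather than a Python loop.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_step(context: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """Propose k tokens with the cheap draft, keep the prefix the target agrees with."""
    proposed: List[int] = []
    ctx = list(context)
    for _ in range(k):
        token = draft(ctx)
        proposed.append(token)
        ctx.append(token)

    accepted: List[int] = []
    ctx = list(context)
    for token in proposed:
        expected = target(ctx)
        if expected != token:
            accepted.append(expected)  # first mismatch: take the target's token and stop
            return accepted
        accepted.append(token)
        ctx.append(token)
    accepted.append(target(ctx))  # all proposals accepted: one extra token from the target
    return accepted


def generate(context: List[int], draft: NextToken, target: NextToken,
             k: int, max_new: int) -> List[int]:
    """Repeat speculative steps until max_new tokens have been produced."""
    out = list(context)
    while len(out) - len(context) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[:len(context) + max_new]


def toy_target(ctx: List[int]) -> int:
    """Toy 'large model': counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    """Toy 'small model': agrees with the target except after token 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(generate([0], toy_draft, toy_target, k=4, max_new=8))
```

With greedy verification the output matches what the target alone would produce; the draft only changes how many target evaluations are needed per accepted token.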

llama.cpp Releases | Jul 30, 2026

Models & Labs | models

Llama.cpp b10178 Release Adds Trace Logging

The b10178 release of llama.cpp enhances its server capabilities by adding trace logging for slot similarity checking, offering developers detailed insights into prompt cache slot selection processes. This update includes specifics on skip reasons and similarity calculations, which can aid in performance optimization. While no new model architectures are introduced, the release continues to support a wide array of platforms, such as macOS with KleidiAI, Ubuntu with ROCm 7.2, and Windows with CUDA 12 and 13. This makes llama.cpp a more versatile tool for developers working on different systems, reinforcing its position as a comprehensive inference runtime.

llama.cpp Releases | Jul 30, 2026

Models & Labs | models

llama.cpp b10180 Release Enhances SYCL Performance

The b10180 release of llama.cpp brings notable improvements to SYCL performance, focusing on unary elementwise operations. By introducing a contiguous fast path and employing 32-bit index math, the update aims to boost computational efficiency. The integration of fastdiv for elementwise index math further enhances processing speed. Although there are no new models in this release, llama.cpp continues to evolve as a flexible inference runtime, now more efficient on systems like macOS, Linux, and Windows. Developers working with SYCL can expect smoother and faster operations, reinforcing llama.cpp's adaptability across different computing environments.
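The ideas named above (a contiguous fast path, and replacing division in index math with a precomputed multiply and shift) can be illustrated with a small sketch. This is not the llama.cpp SYCL kernel: it is plain Python for readability, all function names are hypothetical, and the fast-division routine is the textbook multiply-and-shift method for 32-bit unsigned values, assumed here only as an example of the technique.

```python
from typing import Callable, List, Sequence, Tuple


def is_contiguous(shape: Sequence[int], strides: Sequence[int]) -> bool:
    """True when elements are laid out densely in row-major order."""
    expected = 1
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True


def fastdiv_magic(d: int) -> Tuple[int, int]:
    """Precompute (multiplier, shift) so that n // d needs no division later."""
    if not 1 <= d < 2**32:
        raise ValueError("divisor must fit in 32 bits and be positive")
    shift = (d - 1).bit_length()  # ceil(log2(d))
    mult = ((1 << 32) * ((1 << shift) - d)) // d + 1
    return mult, shift


def fastdiv(n: int, magic: Tuple[int, int]) -> int:
    """Compute n // d for 0 <= n < 2**32 using a multiply, an add, and shifts."""
    mult, shift = magic
    return (((n * mult) >> 32) + n) >> shift


def apply_unary(data: List[float], shape: Sequence[int], strides: Sequence[int],
                op: Callable[[float], float]) -> List[float]:
    """Apply op to every logical element of a (possibly strided) tensor view."""
    total = 1
    for dim in shape:
        total *= dim
    if is_contiguous(shape, strides):
        # Fast path: the linear index is already the memory offset.
        return [op(data[i]) for i in range(total)]
    # General path: split the linear index into coordinates, then apply strides.
    magics = [fastdiv_magic(dim) for dim in shape]
    out = []
    for i in range(total):
        rem, offset = i, 0
        for dim, stride, magic in zip(reversed(shape), reversed(strides), reversed(magics)):
            q = fastdiv(rem, magic)
            offset += (rem - q * dim) * stride  # rem % dim without a division
            rem = q
        out.append(op(data[offset]))
    return out
```

For example, a 2x3 row-major buffer viewed as its 3x2 transpose has strides `(1, 3)` and takes the general path, while the original `(3, 1)` layout takes the fast path and skips the per-element index decomposition entirely.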

llama.cpp Releases | Jul 30, 2026