Google DeepMind Launches Gemma 4 12B Multimodal Model

Google DeepMind · June 9, 2026 · high confidence

Why it matters

→ Gemma 4 12B's encoder-free architecture reduces latency and memory usage.
→ It enables advanced AI capabilities on consumer-grade hardware.
→ The open-source release fosters innovation and accessibility in AI development.

Google DeepMind has introduced Gemma 4 12B, a multimodal AI model that feeds vision and audio inputs directly into its language model backbone instead of routing them through separate encoders. This encoder-free design reduces latency and memory usage, allowing the model to run on consumer laptops with 16 GB of RAM. The model is released under the Apache 2.0 license, so developers can build their own applications on it. Its performance is reported to approach that of larger models on multimodal and agentic tasks.
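
To put the 16 GB figure in context, the following back-of-envelope sketch estimates how much memory the weights alone might need at a few common numeric precisions. It assumes roughly 12 billion parameters (inferred from the "12B" in the model name) and counts only weight storage, ignoring activations, caches, and operating system overhead. The announcement summary does not state which precision the laptop claim refers to, so the precisions listed here are illustrative assumptions, not the published configuration.

```python
# Rough weight-memory estimate. All inputs are assumptions for illustration,
# not figures published for Gemma 4 12B.

PARAMS = 12e9  # assumed parameter count, inferred from the "12B" name

# Assumed storage cost per parameter for a few common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def fits_in_ram(params: float, bytes_per_param: float, ram_gb: float = 16.0) -> bool:
    """True if the weights alone are smaller than the given RAM size."""
    return weight_memory_gb(params, bytes_per_param) < ram_gb


for name, size in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(PARAMS, size)
    print(f"{name}: {gb:.1f} GB, fits in 16 GB: {fits_in_ram(PARAMS, size)}")
```

Under these assumptions, 16-bit weights alone would exceed 16 GB, so running on such a laptop would likely depend on a lower-precision format. Actual memory use depends on the runtime and on details the summary does not provide.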
