The v0.19.1 patch release has been announced, building on v0.19.0. It upgrades to Transformers v5.5.3 and fixes several bugs in Gemma4 tool calling, including invalid JSON in streaming tool calls and tool call corruption. It also adds support for quantized MoE and Eagle3.
The latest version b8991 of llama.cpp has been released, featuring updates for various operating systems.
The latest update to llama-mmap improves compatibility with various platforms and model sizes. Key enhancements include support for 32-bit wasm and updates to gguf.cpp style.

The v0.19.0rc0 release introduces a feature for CPU key-value cache offloading, enhancing performance. This update was signed off by Yifan Qiao.