The v0.19.0rc0 release of vLLM has been announced, adding support for offloading the key-value (KV) cache to CPU memory. This update aims to improve performance and memory efficiency in cache management. The release was signed off by Yifan Qiao and reflects contributions from multiple sources, as part of ongoing efforts to improve the vLLM framework's functionality.
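The announcement does not describe the mechanism, but CPU KV-cache offloading generally works by evicting cold cache blocks from GPU memory to host RAM and fetching them back when a request reuses them. A minimal, illustrative sketch of that idea (pure Python with an LRU policy; the class and method names are hypothetical and this is not vLLM's actual implementation):

```python
from collections import OrderedDict

class KVCacheOffloader:
    """Toy model of CPU KV-cache offloading (hypothetical, not vLLM's API).

    Keeps at most `gpu_capacity` blocks "on GPU" (an OrderedDict used as
    an LRU); evicted blocks move to a CPU-side dict and are pulled back
    into GPU residency on access.
    """

    def __init__(self, gpu_capacity: int):
        self.gpu_capacity = gpu_capacity
        self.gpu_blocks: OrderedDict[int, bytes] = OrderedDict()
        self.cpu_blocks: dict[int, bytes] = {}

    def put(self, block_id: int, data: bytes) -> None:
        self.gpu_blocks[block_id] = data
        self.gpu_blocks.move_to_end(block_id)  # most recently used
        self._evict_if_needed()

    def get(self, block_id: int) -> bytes:
        if block_id in self.gpu_blocks:
            self.gpu_blocks.move_to_end(block_id)  # refresh recency
            return self.gpu_blocks[block_id]
        # GPU miss: fetch the block back from the CPU pool.
        data = self.cpu_blocks.pop(block_id)
        self.put(block_id, data)
        return data

    def _evict_if_needed(self) -> None:
        # Offload least-recently-used blocks to CPU memory.
        while len(self.gpu_blocks) > self.gpu_capacity:
            old_id, old_data = self.gpu_blocks.popitem(last=False)
            self.cpu_blocks[old_id] = old_data

cache = KVCacheOffloader(gpu_capacity=2)
cache.put(0, b"kv0")
cache.put(1, b"kv1")
cache.put(2, b"kv2")                 # evicts block 0 to CPU
print(0 in cache.cpu_blocks)         # True
print(cache.get(0) == b"kv0")        # True: fetched back, evicting block 1
```

In a real engine the "CPU pool" would be pinned host memory and transfers would be asynchronous copies overlapped with compute; the dictionary here only models the residency bookkeeping.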
The latest version b8991 of llama.cpp has been released, featuring updates for various operating systems.
The latest update to llama-mmap improves compatibility across platforms and model sizes. Key enhancements include support for 32-bit wasm builds and style updates to gguf.cpp.

The v0.19.0rc1 release includes a bug fix that restricts TRTLLM attention to SM100, addressing issues with GB300 (SM103).
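Restricting a kernel to a specific SM architecture typically means checking the device's compute capability before dispatch: SM100 corresponds to compute capability 10.0, while GB300's SM103 reports 10.3. A hedged sketch of such a gate (the helper name and policy are assumptions for illustration, not the project's actual code):

```python
def trtllm_attention_supported(major: int, minor: int) -> bool:
    """Hypothetical gate illustrating the fix: allow the TRTLLM attention
    path only on SM100 (compute capability 10.0), excluding SM103 (10.3)
    devices such as GB300."""
    return (major, minor) == (10, 0)

# On a real system the capability would come from a device query such as
# torch.cuda.get_device_capability(); values are hard-coded here.
print(trtllm_attention_supported(10, 0))  # SM100: TRTLLM attention enabled
print(trtllm_attention_supported(10, 3))  # SM103 (GB300): falls back
```

Gating on an exact (major, minor) pair rather than `major == 10` is the essence of the fix: before it, SM103 matched the broader check and hit the problematic kernel.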