The v0.19.0rc1 release of vLLM has been announced. It includes a bug fix that restricts the TRTLLM attention backend to SM100, addressing issues observed on GB300 (SM103) devices. Users are encouraged to update to this release to pick up the fix.
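The fix amounts to gating a backend on the exact GPU compute capability rather than a whole architecture family. A minimal sketch of that kind of check (illustrative only; the function name and the fallback behavior here are assumptions, not the actual vLLM code):

```python
def trtllm_attention_supported(compute_capability: tuple[int, int]) -> bool:
    """Hypothetical capability gate: allow the TRTLLM attention path only on
    SM100 (compute capability 10.0). SM103 devices such as GB300 report
    (10, 3) and would fall back to a different attention backend."""
    return compute_capability == (10, 0)


def select_attention_backend(compute_capability: tuple[int, int]) -> str:
    # Illustrative dispatch: prefer TRTLLM attention where supported,
    # otherwise use a generic fallback backend.
    if trtllm_attention_supported(compute_capability):
        return "trtllm"
    return "fallback"
```

In practice the capability tuple would come from something like `torch.cuda.get_device_capability()`; it is hard-coded as a parameter here to keep the sketch self-contained.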
The latest version b8991 of llama.cpp has been released, featuring updated builds for various operating systems.
The latest update to llama-mmap improves compatibility across platforms and model sizes. Key enhancements include support for 32-bit wasm builds and style updates to gguf.cpp.

The v0.19.0rc0 release introduces CPU key-value (KV) cache offloading, which spills cached KV blocks to host memory to extend effective cache capacity. The change was signed off by Yifan Qiao.
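The idea behind CPU KV cache offloading can be sketched as a bounded GPU-resident block pool that evicts least-recently-used blocks to host memory and promotes them back on access. This is a toy model under assumed semantics, not the release's actual implementation (class and method names are hypothetical):

```python
class CPUKVOffloader:
    """Toy sketch of CPU KV cache offloading: keep at most `gpu_capacity`
    KV blocks "on GPU" (a dict stands in for device memory); evict the
    least-recently-used block to a "CPU" dict when the pool is full, and
    copy blocks back on access."""

    def __init__(self, gpu_capacity: int):
        self.gpu_capacity = gpu_capacity
        self.gpu: dict = {}   # block_id -> KV data resident "on GPU"
        self.cpu: dict = {}   # block_id -> KV data offloaded to host
        self.order: list = []  # LRU order of GPU-resident block ids

    def put(self, block_id, data) -> None:
        self._make_room()
        self.gpu[block_id] = data
        self.order.append(block_id)

    def get(self, block_id):
        if block_id in self.gpu:
            # Refresh LRU position on a GPU hit.
            self.order.remove(block_id)
            self.order.append(block_id)
            return self.gpu[block_id]
        # Miss on GPU: promote the block back from CPU memory.
        data = self.cpu.pop(block_id)
        self.put(block_id, data)
        return data

    def _make_room(self) -> None:
        # Evict LRU blocks to CPU until a slot is free.
        while len(self.gpu) >= self.gpu_capacity:
            victim = self.order.pop(0)
            self.cpu[victim] = self.gpu.pop(victim)
```

The design choice sketched here is the usual trade-off: host memory is far larger but slower, so offloading trades transfer latency on a miss for a much bigger effective KV cache.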