
Ollama has introduced an improved model scheduling system that aims to reduce out-of-memory crashes and improve GPU utilization and performance, particularly on multi-GPU setups.
The latest release of llama.cpp, version b8991, features updates for various operating systems.
The latest update to llama-mmap improves compatibility across platforms and model sizes. Key enhancements include 32-bit wasm support and style updates to gguf.cpp.
