
Microsoft Research has developed mimalloc, a scalable memory allocator designed as a drop-in replacement for the standard malloc and free functions. With a compact codebase of around 12,000 lines, mimalloc handles allocation efficiently across multiple threads while keeping synchronization overhead low. It has been adopted in major services such as Bing and integrated into platforms such as Unreal Engine. The allocator is particularly effective for applications with large memory demands, offering improved performance and reduced contention. Mimalloc's open-source nature and widespread use underscore its impact on modern memory management.
The b9129 release of llama.cpp introduces an adaptive fallback feature for the ggml-zendnn backend, which improves performance by routing small batch sizes to the CPU. The feature is enabled by default, but developers can control it through a new runtime environment variable, allowing them to revert to the original fallback logic if desired. The release ships builds for platforms including macOS with KleidiAI, Windows with CUDA 12 and 13, and Ubuntu with ROCm 7.2, ensuring efficient processing across different systems. It highlights llama.cpp's focus on enhancing performance and flexibility for developers working with varied hardware configurations.
The latest b9134 release of llama.cpp continues its trend of broadening platform compatibility, making it a versatile tool for developers across various systems. This update includes support for macOS Apple Silicon with KleidiAI enabled, as well as expanded Vulkan and ROCm 7.2 support on Ubuntu. Windows users benefit from updated CUDA 12 and 13 DLLs, enhancing performance for GPU tasks. While no new models are introduced, the release solidifies llama.cpp's position as a flexible inference runtime across diverse hardware configurations.