The b9745 release of llama.cpp focuses on enhancing multi-threaded processing capabilities with the introduction of Step3.5/3.7 flash MTP3 support. This update brings new APIs and improvements in multi-head processing, aiming to optimize performance. The release also includes platform-specific builds for macOS, Linux, Windows, and openEuler, expanding compatibility across various systems. These changes make llama.cpp more versatile for developers, although no new models are introduced in this update.
Read originalThe b9747 release of llama.cpp brings a notable improvement with real-time model load progress tracking, enhancing user interaction by offering immediate insights during loading. This update includes server-side improvements such as the addition of a mutex for notify_to_router, which ensures more reliable operations. While there are no new model architectures introduced, the release broadens its reach by supporting platforms like macOS, Linux, and Windows. This makes llama.cpp a more flexible tool for developers working in different environments, although some features like KleidiAI on Apple Silicon are not yet active. The inclusion of ROCm 7.2 and CUDA 12 and 13 DLLs further solidifies its utility across diverse hardware setups.
The latest b9748 release of llama.cpp continues its trend of broadening platform compatibility, notably adding support for ROCm 7.2 on Ubuntu x64. This update ensures that AMD GPU users can leverage llama.cpp more effectively, narrowing the gap with NVIDIA's CUDA. The release also includes Vulkan support on several operating systems, enhancing performance options for developers. While there are no groundbreaking new features, this update solidifies llama.cpp's position as a versatile inference runtime across diverse hardware configurations.
The latest b9750 release of llama.cpp continues its trend of broadening platform compatibility, notably with the inclusion of ROCm 7.2 for Ubuntu x64, which enhances support for AMD GPUs. This update also refines the codebase by implementing a call statement and simplifying certain functions, which could improve performance and maintainability. While KleidiAI support for macOS Apple Silicon is disabled, the release still offers a wide array of builds across macOS, Linux, Windows, and openEuler. This iteration doesn't introduce new models but strengthens llama.cpp's position as a versatile inference runtime across diverse hardware configurations.
© NVIDIA BlogNVIDIA has introduced a groundbreaking AI server infrastructure that operates entirely on liquid cooling, setting a new standard for energy efficiency in data centers. By allowing cooling liquids to reach temperatures as high as 45 degrees Celsius, these servers significantly reduce energy and water consumption, addressing one of the largest operational costs in data centers. This innovation not only cuts down on the need for mechanical chillers and fans but also opens up possibilities for waste heat recovery. The Rubin generation servers promise to transform data center operations, especially in climates where traditional cooling methods are less efficient.
© Lev SelectorClaude Fable 5 was released and then withdrawn as Anthropic negotiates access with the administration.
© Lev SelectorOpenRouter Fusion uses model ensembles to reduce hallucinations and improve accuracy while lowering costs.