
Thinking Machines Lab, under the leadership of Mira Murati, has launched a research preview of its new interaction models. These AI systems are designed for real-time collaboration, allowing users to interact seamlessly across voice, video, and text. The models process inputs in 200ms chunks, maintaining a continuous dialogue without the pauses typical of other systems. This innovation highlights a shift towards more integrated human-AI collaboration, challenging the current trend of autonomous AI agents.
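The 200 ms chunking described above can be pictured as a simple streaming loop. This is an illustrative sketch only, not Thinking Machines' implementation; the function name, sample rate, and chunk arithmetic are all assumptions chosen for the example.

```python
# Hypothetical sketch: splitting a continuous input stream into fixed
# 200 ms chunks so each chunk can be processed without waiting for the
# full utterance to finish. Not the actual Thinking Machines pipeline.
from collections import deque

CHUNK_MS = 200  # chunk duration from the announcement

def chunk_stream(samples, sample_rate=16_000):
    """Yield successive 200 ms windows of an audio sample stream."""
    chunk_len = sample_rate * CHUNK_MS // 1000  # samples per 200 ms chunk
    buf = deque()
    for s in samples:
        buf.append(s)
        if len(buf) == chunk_len:
            yield list(buf)
            buf.clear()
    if buf:  # flush any trailing partial chunk
        yield list(buf)

# A 0.5 s stream at 16 kHz splits into two full 3200-sample chunks
# plus one 1600-sample remainder.
chunks = list(chunk_stream(range(8_000)))
```

Processing each chunk as it arrives, rather than buffering the whole input, is what lets a model keep the dialogue continuous instead of pausing between turns.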
The latest b9116 release of llama.cpp introduces MiMo v2.5, enhancing vision support with fused qkv for improved performance. The update also fixes earlier issues such as an f16 vision overflow and includes assorted cleanups for better code maintenance. With platform support spanning macOS, Linux, and Windows, the release keeps the project accessible to developers on diverse systems. The focus on vision capabilities marks a meaningful step toward making llama.cpp a more versatile tool for AI developers, particularly those integrating vision functionality.
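The "fused qkv" optimization mentioned above has a simple core idea: instead of running three separate projection matmuls for Q, K, and V, the three weight matrices are concatenated so one matrix multiply produces all three outputs, which are then split apart. The pure-Python toy below sketches that idea under assumed names; llama.cpp implements it with GGML tensor ops, not Python lists.

```python
# Toy sketch of a fused QKV projection (hypothetical names, not llama.cpp's
# actual code). One matmul over the concatenated [Wq | Wk | Wv] weight
# replaces three separate matmuls, reading the input x only once.

def matmul(x, w):
    """Multiply vector x (length d) by a d x n weight matrix w."""
    return [sum(x[i] * w[i][j] for i in range(len(x)))
            for j in range(len(w[0]))]

def fused_qkv(x, w_fused, d):
    """Single pass over x, then split the output into Q, K, V."""
    out = matmul(x, w_fused)
    return out[:d], out[d:2 * d], out[2 * d:]

d = 2
x = [1.0, 2.0]
# Fused weight: columns 0-1 are Wq, 2-3 are Wk, 4-5 are Wv (toy values).
w = [[1, 0, 0, 1, 1, 1],
     [0, 1, 1, 0, 2, 2]]
q, k, v = fused_qkv(x, w, d)
```

Fusing the projections cuts kernel-launch overhead and improves memory locality, which is where the performance gain in such a change typically comes from.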
The b9119 release of llama.cpp fixes a performance regression affecting Intel GPU BF16 workloads on Windows, specifically targeting Xe2 and newer architectures. The update restores expected performance for users on these platforms, particularly under the Vulkan backend. It also includes a refactor so that l_warptile is used for BF16 only when coopmat (Vulkan cooperative matrix support) is available, improving efficiency. While the release introduces no new models or headline features, it underscores llama.cpp's commitment to maintaining performance across diverse hardware configurations.