The b9827 release of llama.cpp improves CUDA performance by adding a cudaMemcpy2DAsync fast path for strided tensor copies. The fast path applies when a tensor is not fully contiguous, and it avoids the slower element-wise scalar copy kernels. The update addresses performance issues in specific scenarios, such as the GDN recurrent snapshot update. The new tests for this feature are currently unsupported in OpenVINO.
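To illustrate why a strided copy can be handed to a pitched 2D copy, here is a minimal host-side sketch in Python. It is not llama.cpp's code and does not call CUDA. The helpers `plan_2d_copy` and `memcpy_2d` are hypothetical names, and the eligibility rule (packed elements within a row, one constant row stride) is an assumption about when such a fast path could apply. `memcpy_2d` only emulates the row-by-row semantics that cudaMemcpy2DAsync exposes through its pitch, width, and height parameters.

```python
def plan_2d_copy(n_cols, n_rows, elem_size, stride_elem, stride_row):
    """Return (width_bytes, height, pitch_bytes) if a 2D tensor view can be
    copied as one pitched 2D copy, else None. Strides are in bytes.
    Hypothetical helper: the rule below is an assumption for illustration."""
    if stride_elem != elem_size:
        return None  # elements inside a row are not packed; per-element copy needed
    width = n_cols * elem_size
    if stride_row < width:
        return None  # rows would overlap, so this is not a valid pitched layout
    return width, n_rows, stride_row


def memcpy_2d(dst, dpitch, src, spitch, width, height):
    """Host emulation of a pitched 2D copy: copy `height` rows of `width`
    bytes, stepping `spitch` bytes in src and `dpitch` bytes in dst per row."""
    for r in range(height):
        s = r * spitch
        d = r * dpitch
        dst[d:d + width] = src[s:s + width]


# Source buffer: 4 rows of 8 four-byte elements (32-byte row stride).
src = bytes(range(128))

# View: the first 3 columns of every row, so the rows are not contiguous.
plan = plan_2d_copy(n_cols=3, n_rows=4, elem_size=4, stride_elem=4, stride_row=32)

width, height, spitch = plan
dst = bytearray(width * height)  # destination is packed: dpitch == width
memcpy_2d(dst, width, src, spitch, width, height)

# A view that skips every other element cannot use the pitched path.
rejected = plan_2d_copy(n_cols=3, n_rows=4, elem_size=4, stride_elem=8, stride_row=32)
```

In this sketch the whole view moves in four row-sized chunks rather than twelve single-element copies, which is the kind of saving a pitched copy offers over an element-wise kernel.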
The latest b9817 release of llama.cpp brings significant updates to its OpenVINO backend, including an upgrade to OV 2026.2.1 and the introduction of self-contained release packages. These changes streamline the deployment process and improve operator handling, making it easier for developers to integrate and utilize OpenVINO in their projects. Additionally, the update removes hardcoded compute operation types, enhancing flexibility and adaptability. This release marks a step forward in making llama.cpp a more versatile and developer-friendly platform, particularly for those leveraging OpenVINO's capabilities.
The b9820 release of llama.cpp improves CUDA performance by cutting down on unnecessary synchronizations, which can speed up token processing. The update introduces asynchronous copies between CPU and CUDA, so data transfers no longer need to block and computation can potentially proceed faster. Backend detection has been refined to avoid linking conflicts, and the synchronization adjustments have been made more general, allowing other backends such as Vulkan to benefit. These changes aim to improve performance across different hardware setups.
In a strategic move, Asian AI startups are stepping into the spotlight as the U.S. export ban on Anthropic's Mythos and Fable models continues. Chinese cybersecurity firm 360 has introduced Tulongfeng, an AI tool aimed at software vulnerability detection, while Tokyo-based Sakana AI has launched Fugu, a model designed for agent orchestration and optimized for Japanese language and culture. These launches highlight a growing trend of regional AI development, offering alternatives to U.S. models and addressing local needs. As the export ban persists, these startups are seizing the opportunity to fill the void left by restricted access to U.S. AI technologies.