
DeepSeek V4 Pro has been launched, boasting an impressive 1.6 trillion parameters. This model is part of China's aggressive push in the AI space, backed by significant government support. The rapid development of such models poses a strategic risk for US businesses, which may find themselves increasingly reliant on Chinese AI technologies.
The latest b9060 release of llama.cpp introduces several new SYCL operations, including FILL, CUMSUM, and DIAG, which expand the library's computational capabilities on SYCL backends. This update also fixes a critical issue that caused aborts during test-backend-ops, improving stability. With the addition of scope_dbg_print to both new and existing SYCL operations, developers gain enhanced debugging tools. This release continues to broaden llama.cpp's platform support, making it a more versatile tool for developers working across different environments.
The b9066 release of llama.cpp brings notable improvements for CUDA users by integrating cublasSgemmStridedBatched, optimizing the inner loop of batched matrix operations. This enhancement is designed to boost performance for developers leveraging CUDA technology. The update also extends platform coverage to macOS Apple Silicon, Ubuntu with ROCm, and Windows with CUDA 12 and 13, so developers can work seamlessly across different systems. While no new models are introduced, the release strengthens llama.cpp's role as a flexible tool for developers working with diverse hardware setups.