The b9334 release of llama.cpp expands platform support to include macOS on Apple Silicon, Ubuntu with ROCm 7.2, and Windows with CUDA 12 and 13. Vulkan and SYCL support are also added, broadening the options for both CPU and GPU applications. No new models are introduced; the release focuses on making llama.cpp usable on a wider range of systems.
The b9329 release of llama.cpp adds a fast Walsh-Hadamard transform for CUDA, intended to improve computational efficiency. The update also includes optimizations such as unrolling and a change from size_t to int, aimed at boosting processing speed. The release is compatible with macOS, Linux, Windows, and openEuler. No new models are introduced; the focus is on performance, which makes the update most relevant to those working with CUDA.
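For readers unfamiliar with the transform named in the b9329 notes, here is a minimal CPU reference sketch in Python. It is not the CUDA kernel from llama.cpp; the function name `fwht` is hypothetical, and the sketch assumes an unnormalized transform over an input whose length is a power of two. It shows why the transform is called "fast": each of the log2(n) passes combines pairs with one addition and one subtraction, instead of multiplying by a full n-by-n Hadamard matrix.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (illustrative sketch).

    Assumes len(values) is a power of two. Returns a new list and
    leaves the input unchanged.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Each pass combines elements that sit h positions apart (a butterfly).
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


if __name__ == "__main__":
    signal = [1, 0, 1, 0, 0, 1, 1, 0]
    spectrum = fwht(signal)
    print(spectrum)
    # Applying the unnormalized transform twice scales the input by n.
    print(fwht(spectrum) == [len(signal) * v for v in signal])
```

The inner loop has a fixed, simple body, which is the kind of code where unrolling and narrower index types can help on a GPU; how the actual release implements this is not described in the summary above.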
The b9330 release of llama.cpp fixes a tagging error: the ffn_latent operation is now correctly tagged as MUL_MAT, matching what the backend expects. With the correct tag, the weights and their matrix multiplications stay on the GPU, avoiding unnecessary CPU fallback and graph splitting. On the Nemotron 3 Super 120B Q5_K_M model, throughput rose from 64.9 to 103.22 tokens per second, a gain of roughly 59%. The release covers a range of computing environments, including macOS with KleidiAI, Ubuntu with ROCm 7.2, and CUDA 12 and CUDA 13.
The b9331 release of llama.cpp brings a strategic overhaul to its continuous integration workflows, focusing on efficiency by isolating tasks into separate workflows. This update includes the extraction of Android and HIP tasks, alongside the relocation of WebGPU and RPC tasks into distinct workflows. Additionally, the release halts SYCL f16 builds and optimizes pull request jobs by aligning backend paths. While there are no new model architectures introduced, this release aims to streamline development processes and enhance build management across diverse environments.
© NVIDIA Blog
NVIDIA's new Vera CPU is making waves with its impressive performance in AI-centric workloads, challenging the dominance of Intel and AMD. Featuring 88 custom Olympus cores and a remarkable 1.2TB/s memory bandwidth, Vera is designed to handle the demanding tasks of modern AI factories efficiently. Initial benchmarks by Phoronix highlight its superior memory performance and power efficiency, particularly in comparison to traditional x86 CPUs. This positions Vera as a formidable competitor in the CPU market, offering a significant generational leap over NVIDIA's previous Grace CPU. As Vera becomes available through partners, it promises to redefine performance standards in AI infrastructure.
© GitHub Changelog
GitHub has introduced a new feature for enterprise users that allows for more granular control over which Copilot models are available to specific organizations. This update, now in public preview, enables enterprise owners to set targeted model rules, moving beyond a single enterprise-wide setting. The refreshed interface simplifies managing default model availability, allowing users to enable or make models optional for different organizations. This development provides businesses with enhanced flexibility and control over AI model deployment within their GitHub environments.
© The AI Daily Brief
OpenAI has made a significant advancement in mathematical capabilities within its AI models.