The latest b9103 release of llama.cpp expands its platform support, catering to a wide range of operating systems and hardware configurations. Notably, it includes KleidiAI support for Apple Silicon, enhancing performance on M-series Macs. The update also adds ROCm 7.2 support for Ubuntu x64, providing more options for AMD GPU users. While no new models are introduced, this release strengthens llama.cpp's role as a versatile AI runtime across various platforms.
The b9105 release of llama.cpp brings a notable improvement by including cuda/iterator directly, rather than relying on a transitive include from cub/cub.cuh. Depending on a header that another header happens to pull in is fragile, so the explicit include makes CUDA builds more robust for developers using NVIDIA GPUs. The release continues to support a broad array of platforms, including macOS with KleidiAI enabled, Linux with ROCm 7.2, and Windows with CUDA 12 and 13. While no new model architectures are introduced, this update reinforces llama.cpp's role as a dependable tool for AI developers working across different hardware environments.
The b9109 release of llama.cpp brings notable advancements in parallel drafting, enhancing the efficiency of model processing. By refining speculative contexts and supporting multiple speculative types, the update improves token acceptance and streamlines the drafting process. This release ensures compatibility with macOS, Linux, and Windows, including specific support for Apple Silicon with KleidiAI, ROCm 7.2, and CUDA 12 and 13. While it doesn't introduce new model architectures, the focus on refining existing capabilities makes llama.cpp a more robust tool for developers. The improvements in speculative processing and platform-specific enhancements make it a valuable update for those working with AI models.
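The acceptance step at the heart of speculative drafting can be pictured with a toy sketch (hypothetical helper names, not llama.cpp's actual API): a draft model cheaply proposes a run of tokens, the target model verifies them, and the longest agreeing prefix is kept, plus the target's own token at the first disagreement.

```python
def speculative_accept(draft_tokens, target_sample):
    """Toy sketch of speculative-decoding acceptance (illustrative only).

    draft_tokens:  tokens proposed by the cheap draft model.
    target_sample: callable that, given the accepted prefix, returns
                   the token the target model would emit next.
    """
    accepted = []
    for tok in draft_tokens:
        verified = target_sample(accepted)
        if verified != tok:
            # First disagreement: keep the target's token and stop.
            accepted.append(verified)
            break
        accepted.append(tok)
    return accepted
```

When draft and target agree on every proposed token, the whole draft is accepted in a single target pass, which is where the speedup comes from.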
The b9112 release of llama.cpp fixes a CUDA limitation in its im2col operation, which previously failed for output widths exceeding 65535 (the hardware cap on a grid's y and z dimensions). By capping the grid dimensions and adding an in-kernel loop over the remaining work, the update lets models like SEANet process longer audio sequences without errors. The fix has been validated on T4 and Jetson Orin and retains compatibility with existing test cases, giving developers a more robust path for large-scale audio processing.
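The general shape of such a fix can be sketched in Python (illustrative arithmetic only, not the actual kernel code): when the element count exceeds the 65535 grid-dimension cap, the launch uses the capped block count and each block loops over the leftover elements in-kernel.

```python
MAX_GRID_DIM = 65535  # CUDA cap on a grid's y and z dimensions

def im2col_launch_dims(output_width: int):
    """Hypothetical helper: choose a block count within the CUDA grid
    limit, plus the number of in-kernel loop iterations each block
    needs so every output column is still covered."""
    blocks = min(output_width, MAX_GRID_DIM)
    iterations = -(-output_width // blocks)  # ceiling division
    return blocks, iterations
```

For an output width of 70000, this yields 65535 blocks each looping twice, covering a range that would previously have overflowed the launch configuration.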
© TechCrunch AI
Thinking Machines Lab, a startup founded by former OpenAI CTO Mira Murati, is pushing the boundaries of AI interaction with its new model, TML-Interaction-Small. This model aims to revolutionize AI communication by enabling simultaneous processing and response, akin to a natural conversation. The concept of 'full duplex' interaction could make AI feel more like a real-time dialogue rather than a series of exchanges. While the model's response time of 0.40 seconds is promising, it's still in the research phase, with a limited preview expected soon. The real test will be whether this innovation translates into a seamless user experience once publicly available.
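The 'full duplex' idea can be illustrated with a minimal asyncio sketch (entirely hypothetical; TML-Interaction-Small's implementation is not public): a listener coroutine keeps ingesting user input while a speaker coroutine streams the reply, instead of the two strictly alternating turns.

```python
import asyncio

async def listener(incoming: asyncio.Queue, heard: list):
    # Keep ingesting user input continuously, even mid-reply.
    while True:
        chunk = await incoming.get()
        if chunk is None:  # end-of-stream sentinel
            return
        heard.append(chunk)

async def speaker(reply: str, spoken: list):
    # Stream the reply word by word, yielding between words so the
    # listener coroutine can run concurrently.
    for word in reply.split():
        spoken.append(word)
        await asyncio.sleep(0)

async def duplex_turn(user_chunks, reply):
    incoming = asyncio.Queue()
    heard, spoken = [], []
    for c in user_chunks:
        incoming.put_nowait(c)
    incoming.put_nowait(None)
    # Both coroutines run at once: hearing and speaking overlap.
    await asyncio.gather(listener(incoming, heard), speaker(reply, spoken))
    return heard, spoken
```

The key property is that neither side blocks the other, which is what would let an interruption arrive while the model is still mid-response.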
© Hugging Face Blog
AWS is pushing forward its infrastructure capabilities to better accommodate the demands of foundation model training and inference, focusing on the seamless integration of open-source software frameworks. By utilizing multi-node accelerator compute, high-bandwidth networking, and distributed storage, AWS aims to overcome system bottlenecks and scaling challenges. The introduction of new EC2 instances equipped with NVIDIA GPUs, such as the P5 and P6 families, demonstrates AWS's dedication to providing substantial compute resources. These developments are crucial for machine learning engineers looking to optimize large-scale model training and inference workflows on AWS, offering enhanced efficiency and flexibility.
© The Verge AI
OpenAI has introduced Daybreak, a new AI initiative aimed at enhancing cybersecurity by detecting and patching vulnerabilities before they can be exploited. This initiative leverages the Codex Security AI agent and integrates specialized models like GPT-5.5-Cyber to create a comprehensive threat model. Daybreak is OpenAI's response to Anthropic's Claude Mythos, which was deemed too dangerous for public release. By collaborating with industry and government partners, OpenAI aims to deploy increasingly sophisticated cyber-capable models, marking a significant step in AI-driven security solutions.