The b9025 release of llama.cpp has been announced, featuring expanded support across multiple platforms. This update includes Vulkan support for both Ubuntu and Windows, as well as ROCm 7.2 for Ubuntu, enhancing GPU performance capabilities. While no new models are introduced, the release focuses on broadening compatibility, keeping llama.cpp a versatile option for developers deploying AI across varied hardware configurations.
The b9015 release of llama.cpp marks another step in expanding its reach across diverse systems, now including macOS Apple Silicon with KleidiAI enabled and Ubuntu with ROCm 7.2. This update also brings Vulkan support to both Linux and Windows, enhancing the software's versatility. Windows users benefit from CUDA 12 and 13 support, ensuring compatibility with the latest NVIDIA technologies. While the release doesn't introduce new model architectures, it strengthens llama.cpp's role as a flexible inference runtime for developers working with varied hardware configurations.
The b9018 release of llama.cpp continues its trend of broadening platform compatibility, supporting a wide array of systems including macOS, Linux, Windows, and Android. Notably, it introduces Vulkan support on Ubuntu and Windows, and adds ROCm 7.2 for AMD GPUs, a significant step for users seeking alternatives to NVIDIA's CUDA. This release doesn't bring new models or quantization methods, but it solidifies llama.cpp's position as a versatile inference runtime, letting users tune performance for their specific hardware setups.
The b9019 release of llama.cpp brings notable structural changes, relocating functions like load_hparams and load_tensors so they are defined per model, which gives developers more flexibility. This shift is complemented by the introduction of build_graph and cleaner switch-case dispatch, which together improve the codebase's modularity. These updates make it easier to adapt the runtime to various hardware setups across macOS, Linux, and Windows environments. Although no new model architectures are introduced, the release lays groundwork for more efficient development and deployment, particularly with configurations like KleidiAI on Apple Silicon and ROCm 7.2 on AMD GPUs.
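To make the restructuring concrete: the release notes describe moving model-loading logic out of centralized switch statements and into per-model definitions. Below is a loose illustration of that dispatch pattern, written in Python for brevity; llama.cpp itself is C++, and the names here simply echo load_hparams, load_tensors, and build_graph from the notes rather than any actual API.

```python
# Illustrative sketch only: not llama.cpp code. It shows the general shift
# from a monolithic switch over architectures to per-model definitions.

class LlamaModel:
    ARCH = "llama"

    def load_hparams(self, gguf):
        ...  # read this architecture's hyperparameters from the model file

    def load_tensors(self, gguf):
        ...  # map this architecture's tensor names to backend buffers

    def build_graph(self, ctx):
        ...  # construct this architecture's compute graph

# Each architecture registers itself; the loader only dispatches.
MODEL_REGISTRY = {cls.ARCH: cls for cls in (LlamaModel,)}

def load_model(arch: str, gguf):
    # Dispatch replaces a central switch: adding a new architecture means
    # adding one class, not editing every shared loading function.
    model = MODEL_REGISTRY[arch]()
    model.load_hparams(gguf)
    model.load_tensors(gguf)
    return model
```

The design payoff is locality: each model's loading and graph-building logic lives in one place, so hardware-specific or architecture-specific changes no longer ripple through shared switch statements.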
Google's Cloud Next '26 event showcased significant advancements in AI, emphasizing the 'agentic era' with the launch of the Gemini Enterprise Agent Platform and eighth-generation TPUs. These innovations aim to enhance business operations and energy efficiency in data centers. The introduction of Gemma 4, an open model for advanced reasoning, and Deep Research Max, which automates high-level research tasks, marks a leap in AI capabilities. Additionally, Google Vids now offers free video generation, democratizing access to professional-quality content creation. These developments highlight Google's commitment to integrating AI into diverse sectors, from education to enterprise solutions.
Google's Gemini API now supports event-driven Webhooks, significantly reducing friction and latency for long-running tasks. This new feature allows developers to receive real-time notifications when a job is completed, eliminating the need for continuous polling. The implementation adheres to the Standard Webhooks specification, ensuring secure and reliable communication with features like signed requests and automatic retries. This advancement makes it easier for developers to manage complex workflows, such as deep research or batch processing, with greater efficiency.
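Because the feature follows the Standard Webhooks specification, a receiver can verify signed requests with plain HMAC-SHA256. Here is a minimal verifier sketch assuming the spec's usual webhook-id, webhook-timestamp, and webhook-signature headers and a base64-encoded "whsec_"-prefixed secret; the exact header names and secret format for Gemini should be confirmed against the official docs.

```python
import base64
import hashlib
import hmac

def verify_webhook(secret: str, headers: dict, payload: bytes) -> bool:
    """Verify a Standard Webhooks signature (HMAC-SHA256).

    `headers` is assumed to use lowercased keys, as most frameworks provide.
    """
    # Secrets are typically distributed base64-encoded with a "whsec_" prefix.
    if secret.startswith("whsec_"):
        key = base64.b64decode(secret[len("whsec_"):])
    else:
        key = secret.encode()

    msg_id = headers["webhook-id"]
    timestamp = headers["webhook-timestamp"]
    # Per the spec, the signed content is "<id>.<timestamp>.<raw body>".
    signed_content = f"{msg_id}.{timestamp}.".encode() + payload

    expected = base64.b64encode(
        hmac.new(key, signed_content, hashlib.sha256).digest()
    ).decode()

    # The signature header may carry several space-separated "v1,<sig>" entries.
    for candidate in headers["webhook-signature"].split():
        version, _, sig = candidate.partition(",")
        if version == "v1" and hmac.compare_digest(sig, expected):
            return True
    return False
```

Two details matter in practice: verify against the raw request body (not a re-serialized copy, which can drift byte-for-byte), and compare with hmac.compare_digest to avoid timing side channels.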
The latest release of vLLM, version 0.20.2rc0, adds a new shutdown() method, giving developers explicit control over the lifecycle of their applications. This is a practical improvement for anyone managing resources and needing clean exits from AI systems: releasing engine resources deterministically reduces the potential for issues during teardown. While it may seem like a small update, it reflects a continued focus on robustness and reliability in AI infrastructure.
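Usage might look like the following sketch. It assumes the new shutdown() method is exposed on the LLM entry point, which the release notes don't confirm, and the model name is purely illustrative.

```python
from vllm import LLM, SamplingParams

# Illustrative model choice; substitute whatever checkpoint you serve.
llm = LLM(model="facebook/opt-125m")

try:
    outputs = llm.generate(
        ["Write a haiku about clean shutdowns."],
        SamplingParams(max_tokens=32),
    )
    for out in outputs:
        print(out.outputs[0].text)
finally:
    # Assumed placement of the method added in 0.20.2rc0: release engine
    # resources (GPU memory, worker processes) deterministically instead
    # of relying on interpreter teardown.
    llm.shutdown()
```

Wrapping the call in try/finally mirrors the intent of the release: cleanup runs even when generation raises, which is exactly where implicit teardown tends to misbehave.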