The b9031 release of llama.cpp focuses on optimizing backend loading. Developed by Adrien Gallouët from Hugging Face, the update ensures backends are loaded only when necessary, avoiding the overhead of loading backends that are never used. This change is implemented across multiple platforms, including macOS, Linux, and Windows. While no new models are introduced, the update improves the existing system's resource management, benefiting developers using llama.cpp.
The latest b9041 release of llama.cpp continues its trend of broadening platform compatibility, making it a versatile choice for developers across different environments. Notably, this update includes support for macOS Apple Silicon with KleidiAI enabled, as well as expanded Vulkan and ROCm 7.2 support on Ubuntu. This release doesn't introduce new models but focuses on enhancing the runtime's adaptability across various hardware configurations. By doing so, llama.cpp strengthens its position as a go-to inference runtime for developers seeking flexibility beyond NVIDIA's CUDA ecosystem.
Llama.cpp's latest update expands its functionality by integrating IBM's Granite-Speech, significantly enhancing its audio processing capabilities. The update features a Conformer encoder with Shaw relative position encoding and a QFormer projector, which efficiently compresses audio data into the LLM embedding space. This ensures precise token-for-token matching with HF transformers on audio clips, demonstrating its robustness. By incorporating these advanced audio processing techniques, llama.cpp becomes a more versatile tool for developers, extending its utility beyond text to include sophisticated audio data handling.
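The projector's role of compressing many encoder frames into far fewer LLM-space vectors can be sketched in miniature. This is a toy illustration with made-up sizes, not the real Granite-Speech architecture: simple block averaging stands in for the QFormer's learned cross-attention, and zero-padding stands in for its learned projection into the LLM embedding width:

```python
# Toy sketch of an audio projector (hypothetical sizes, not the actual
# Granite-Speech QFormer): collapse every `block` encoder frames into a
# single vector of the LLM's embedding width.

def compress_frames(frames, block=5, llm_dim=4):
    """Collapse every `block` encoder frames into one LLM-space vector."""
    out = []
    for i in range(0, len(frames), block):
        chunk = frames[i:i + block]
        # Average within the block (a real QFormer would attend over the
        # frames with learned queries instead of averaging).
        avg = [sum(f[d] for f in chunk) / len(chunk) for d in range(len(chunk[0]))]
        # Pad/truncate to the LLM width (standing in for a learned linear map).
        out.append((avg + [0.0] * llm_dim)[:llm_dim])
    return out

frames = [[1.0, 2.0]] * 10        # 10 encoder frames of width 2
tokens = compress_frames(frames)  # 2 vectors of width 4
```

The point the sketch makes is purely about shape: a long audio sequence enters, and a much shorter sequence of embedding-sized vectors comes out, which is what lets the LLM treat audio like a handful of extra tokens.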
The b9047 release of llama.cpp enhances how device memory is managed, particularly for GPUs with unknown configurations. By ensuring that memory fit for unknown GPUs is set to zero and maintaining a fallback for non-GPU devices, the update boosts stability and reliability. This release continues to support a broad array of operating systems, including macOS with KleidiAI enabled, Ubuntu with ROCm 7.2, and Windows with CUDA 12 and 13. While it doesn't introduce groundbreaking features, these refinements make llama.cpp a more dependable tool for developers working across different hardware environments.
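The conservative policy described above can be sketched as a small decision function. This is a hypothetical illustration of the logic, not llama.cpp's actual code or data structures:

```python
# Sketch of the fallback logic (hypothetical structure, not llama.cpp's
# real device API): report zero free memory for GPUs whose capacity
# cannot be queried, and fall back to a host-memory figure for non-GPU
# devices, so the scheduler never over-commits an unknown device.

def free_memory_estimate(device):
    """Return a conservative free-memory estimate in bytes."""
    if device.get("type") == "gpu":
        # Unknown GPU configuration: assume nothing fits rather than guess.
        return device.get("free_bytes", 0)
    # Non-GPU device (e.g. CPU backend): fall back to host memory.
    return device.get("host_free_bytes", 1 << 30)
```

Returning zero for an unqueryable GPU is the safe default: the worst case is leaving capacity unused, rather than crashing mid-load by assuming memory that isn't there.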
The transition from vLLM V0 to V1 represents a major backend overhaul, prioritizing parity before modifying reinforcement learning objectives. By resolving issues such as processed rollout logprobs and runtime defaults, the vLLM team ensured that V1's outputs meet the expectations set by V0. This approach demonstrates the critical role of backend accuracy in preserving training integrity. With these adjustments, V1 now mirrors V0's behavior, creating a stable foundation for future enhancements in RL objectives without the complications of backend discrepancies.
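The kind of parity verification this work relies on can be sketched as a token-for-token comparison of rollout logprobs from the two backends. This is a hypothetical harness, not vLLM's actual test suite:

```python
# Hedged sketch of a backend parity check (illustrative only, not
# vLLM's tests): compare per-token rollout logprobs from the old (V0)
# and new (V1) backend and flag any divergence beyond a tolerance.
import math

def logprobs_match(v0_logprobs, v1_logprobs, tol=1e-5):
    """True if both backends agree token-for-token within `tol`."""
    if len(v0_logprobs) != len(v1_logprobs):
        return False
    return all(math.isclose(a, b, abs_tol=tol)
               for a, b in zip(v0_logprobs, v1_logprobs))
```

Checks like this matter for RL training because the objective consumes these logprobs directly: a silent backend discrepancy would show up as a biased gradient rather than an obvious error.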
Genesis AI, a startup backed by Khosla Ventures, has unveiled its first full-stack robotics model, GENE-26.5, featuring human-like robotic hands. This development marks a significant step as the company aims to bridge the 'embodiment gap' in robotics by mimicking human hand functionality. The robotic hands are capable of performing complex tasks such as cooking and lab work, showcasing their potential for real-world applications. The startup's innovative approach includes a sensor-loaded glove for data collection, which could revolutionize how robots are trained. This move positions Genesis AI as a notable player in the robotics industry, with plans to expand further into general-purpose robotics.
NVIDIA's Spectrum-X Ethernet infrastructure is redefining AI networking with its new Multipath Reliable Connection (MRC) protocol. This innovation allows for efficient load balancing and high throughput by distributing traffic across multiple network paths, crucial for large-scale AI training. Industry leaders like OpenAI and Microsoft are already leveraging this technology to enhance their AI factories. By offering an open specification through the Open Compute Project, NVIDIA is setting a new benchmark for AI networking, ensuring resilience and efficiency at gigascale levels.
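The core idea of multipath traffic distribution can be shown in a toy sketch. This is only an illustration in the spirit of packet spraying; the actual MRC protocol is defined by NVIDIA's open specification, and a real NIC would also handle loss, retransmission, and congestion signals:

```python
# Toy sketch of multipath load balancing (illustrative only, not the
# MRC wire protocol): packets are sprayed round-robin across all
# available paths and reassembled by sequence number at the receiver.

def spray(packets, num_paths):
    """Round-robin packets across `num_paths` network paths."""
    paths = [[] for _ in range(num_paths)]
    for seq, pkt in enumerate(packets):
        paths[seq % num_paths].append((seq, pkt))
    return paths

def reassemble(paths):
    """Merge per-path streams back into order by sequence number."""
    return [pkt for seq, pkt in sorted(p for path in paths for p in path)]
```

Spraying over every path keeps all links busy instead of pinning a flow to one path, which is why per-packet sequencing and receiver-side reordering are the price of the extra throughput.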