The b9784 release of llama.cpp focuses on matrix multiplication in the Hexagon backend, which targets Qualcomm's Hexagon processors. The main change is a rework of the MUL_MAT and MUL_MAT_ID operations (the latter is the variant used for mixture-of-experts layers). The rework introduces a 32x32 tiled weight repack along with updated kernel parameters. These changes aim to improve performance and efficiency on Hexagon hardware specifically. The release adds no new models. Instead, it refines an existing backend for developers targeting Qualcomm devices.
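To show what a "32x32 tiled weight repack" means in general terms, the sketch below reorders a row-major weight matrix so that every 32x32 tile is stored contiguously. This is a generic, hypothetical illustration. It assumes a plain row-major layout and matrix dimensions that are multiples of the tile size, and it is not the Hexagon backend's actual data layout or code. The function name `repack_tiled` is invented for this example.

```python
TILE = 32  # tile edge length; 32x32 matches the release note, the rest is illustrative


def repack_tiled(weights, rows, cols, tile=TILE):
    """Hypothetical repack: reorder a row-major matrix (flat list) into
    tile-major order. Tiles are visited left to right, top to bottom, and
    the elements of each tile are stored contiguously in row-major order."""
    if len(weights) != rows * cols:
        raise ValueError("weights must contain rows * cols elements")
    if rows % tile or cols % tile:
        raise ValueError("rows and cols must be multiples of the tile size")
    packed = []
    for tr in range(0, rows, tile):          # top row of the current tile
        for tc in range(0, cols, tile):      # left column of the current tile
            for r in range(tr, tr + tile):   # copy one tile row at a time
                start = r * cols + tc
                packed.extend(weights[start:start + tile])
    return packed


if __name__ == "__main__":
    # A 4x4 matrix split into 2x2 tiles, to keep the output readable.
    print(repack_tiled(list(range(16)), 4, 4, tile=2))
    # [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
```

Because each tile occupies one contiguous run of memory, a kernel can load a whole tile with sequential reads instead of striding across rows. A real backend repack may also interleave, quantize, or align data in ways this sketch ignores.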
The b9781 release of llama.cpp continues the project's trend of broadening platform compatibility rather than adding major features. Notably, it includes support for ROCm 7.2 on Ubuntu x64, which matters for AMD GPU users looking for an alternative to NVIDIA's CUDA. KleidiAI support is disabled in the macOS Apple Silicon build. Even so, the release covers a wide range of platforms, including Windows and openEuler. The update reinforces llama.cpp's position as a versatile inference runtime, and its focus remains on platform coverage rather than new model architectures.
The b9782 release follows the same pattern as b9781. It includes support for ROCm 7.2 on Ubuntu x64, keeps KleidiAI disabled for Apple Silicon, and provides builds for platforms ranging from Windows to openEuler. It introduces no groundbreaking changes but further solidifies llama.cpp's position as a versatile inference runtime.
The b9785 release continues the same platform-focused pattern. It includes ROCm 7.2 support on Ubuntu x64, keeps KleidiAI disabled for Apple Silicon, and provides builds for macOS, Windows, and openEuler. Like the neighboring builds, it brings no major new features.
© The AI Daily Brief
OpenAI has announced the development of its first custom chip, named 'Jalapeño'.
© The AI Daily Brief
OpenAI has updated GPT-5.5 Instant, making it accessible to users on the free tier.
© TechCrunch AI
Unconventional AI, led by former Databricks AI chief Naveen Rao, is developing a new computing architecture that could cut the power consumption of AI inference by as much as 1,000 times. Its first model, Un-0, suggests that an oscillator-based architecture can match the performance of state-of-the-art diffusion models in image generation. Un-0 currently runs in a software simulation. The company plans to release chip schematics soon, with the goal of building a complete inference stack. If the approach holds up in hardware, it could ease the energy constraints that threaten to limit further AI scaling.