
OpenAI has unveiled its first custom application-specific integrated circuit (ASIC), dubbed 'Jalapeño'. This development marks a significant step for OpenAI as it seeks to optimize its hardware for AI workloads, potentially reducing reliance on third-party chip manufacturers. The move could enhance performance and efficiency for OpenAI's AI models.
© The AI Daily Brief
Anthropic has accused Alibaba of conducting a model distillation attack on its AI models.
A KPMG survey finds that CEO-led AI strategies yield three times the ROI of other approaches.
OpenAI has updated GPT-5.5 Instant and made it available to free-tier users.
Release b9784 of llama.cpp optimizes matrix multiplication on the Hexagon backend. It reworks the MUL_MAT and MUL_MAT_ID operations, introducing a 32x32 tiled weight repack and retuned kernel parameters. The changes aim to improve register usage and streamline activation processing, which mainly benefits users running on Hexagon hardware. The release adds no new models; it focuses on refining existing code paths.
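To illustrate what a tiled weight repack means in general, the sketch below reorders a row-major matrix so that each 32x32 tile is stored contiguously. It is not the llama.cpp Hexagon code: the function names `repack_tiled` and `tiled_index`, the tile ordering, and the row-major layout inside each tile are all assumptions made for illustration, and the real kernel layout may differ.

```python
TILE = 32  # tile edge taken from the release summary; the layout below is assumed


def repack_tiled(weights, rows, cols, tile=TILE):
    """Reorder a flat row-major matrix so each tile x tile block is contiguous.

    Hypothetical layout: tiles are visited left to right, top to bottom,
    and each tile is stored row-major.
    """
    if rows % tile or cols % tile:
        raise ValueError("this sketch assumes dimensions divisible by the tile size")
    packed = []
    for tile_row in range(0, rows, tile):
        for tile_col in range(0, cols, tile):
            for r in range(tile_row, tile_row + tile):
                start = r * cols + tile_col
                packed.extend(weights[start:start + tile])
    return packed


def tiled_index(r, c, cols, tile=TILE):
    """Map a (row, col) coordinate to its offset in the repacked buffer."""
    tiles_per_row = cols // tile
    tile_id = (r // tile) * tiles_per_row + (c // tile)
    return tile_id * tile * tile + (r % tile) * tile + (c % tile)


if __name__ == "__main__":
    rows, cols = 64, 96
    weights = list(range(rows * cols))
    packed = repack_tiled(weights, rows, cols)
    # Every element is still reachable through the tiled index.
    assert all(
        packed[tiled_index(r, c, cols)] == weights[r * cols + c]
        for r in range(rows)
        for c in range(cols)
    )
```

The usual motivation for such a layout is that a kernel can stream one tile with unit-stride loads instead of jumping a full matrix row between consecutive tile rows.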
Release b9788 of llama.cpp improves tensor parallelism on dual-GPU setups that use the SYCL backend. It implements a degenerate ring all-reduce for the two-GPU case, mirroring the allreduce pattern of CUDA's NCCL, and targets both small and large tensor operations. The Llama-3.3-70B and Qwen3-Coder-Next-80B-A3B models are reported to run substantially faster. The update makes llama.cpp more competitive in multi-GPU environments without adding new dependencies or altering build configurations.
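For readers unfamiliar with the pattern, here is a minimal single-process simulation of a ring all-reduce, written under the assumption that "degenerate" refers to the two-rank case, where each phase of the ring collapses to a single pairwise exchange. It is not the llama.cpp SYCL implementation; `ring_allreduce` and its list-based "devices" are hypothetical stand-ins.

```python
def ring_allreduce(buffers):
    """Simulate a sum all-reduce over a ring of len(buffers) ranks.

    Each rank's buffer is split into one chunk per rank. A reduce-scatter
    phase leaves every rank with one fully summed chunk, and an all-gather
    phase circulates those chunks. Returns (per-rank results, step count).
    """
    n = len(buffers)
    length = len(buffers[0])
    bounds = [(i * length // n, (i + 1) * length // n) for i in range(n)]
    data = [list(b) for b in buffers]
    steps = 0

    # Reduce-scatter: at step s, rank r sends chunk (r - s) mod n to rank r + 1.
    for s in range(n - 1):
        sends = []
        for r in range(n):
            chunk = (r - s) % n
            lo, hi = bounds[chunk]
            sends.append((chunk, data[r][lo:hi]))
        for r, (chunk, payload) in enumerate(sends):
            dst = (r + 1) % n
            lo, _ = bounds[chunk]
            for i, value in enumerate(payload):
                data[dst][lo + i] += value
        steps += 1

    # All-gather: at step s, rank r forwards chunk (r + 1 - s) mod n to rank r + 1.
    for s in range(n - 1):
        sends = []
        for r in range(n):
            chunk = (r + 1 - s) % n
            lo, hi = bounds[chunk]
            sends.append((chunk, data[r][lo:hi]))
        for r, (chunk, payload) in enumerate(sends):
            dst = (r + 1) % n
            lo, hi = bounds[chunk]
            data[dst][lo:hi] = payload
        steps += 1

    return data, steps


if __name__ == "__main__":
    # Two ranks: one exchange to reduce, one exchange to gather.
    result, steps = ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40]])
    assert result == [[11, 22, 33, 44], [11, 22, 33, 44]]
    assert steps == 2
```

With two ranks each device sends only half of its buffer in each phase, so the total traffic per device equals one full buffer, which is the property that makes the ring pattern attractive for large tensors.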
© TechCrunch AI
Unconventional AI, led by former Databricks AI chief Naveen Rao, is developing a new computing architecture that it says could cut the power consumption of AI inference by a factor of up to 1,000. Its first model, Un-0, demonstrates the potential of an oscillator-based architecture to match the performance of state-of-the-art diffusion models in image generation. The model currently runs in software simulation, but the company plans to release chip schematics soon and aims to build a complete inference stack. If the approach works in hardware, it could ease the energy constraints that threaten to limit AI scaling.