
Together AI will host MiniMax's new M3 model, offering it as an open-weights endpoint for developers. M3 has a 1M-token context window and accepts multimodal inputs, both of which make it demanding to serve efficiently. Together AI's optimizations have improved throughput by up to 125%. The partnership positions Together AI as a platform for deploying complex AI systems at scale.
The v0.22.1rc2 release fixes a compatibility issue with CUTLASS fmin that affected DeepSeek-V4 initialization. The update is narrow in scope, but it matters for developers working with DeepSeek-V4, who can now initialize the model without hitting this error.
The b9491 release of llama.cpp resolves PDL race conditions by removing 'restrict' from PDL kernel headers, where the qualifier had been causing compatibility issues. Preprocessor directives preserve performance on older architectures, and macros now simplify how 'restrict' is applied. The release also addresses the PDL restrict issue on Hopper architectures. For developers, these changes improve llama.cpp's compatibility and performance across operating systems and hardware configurations.