
Together AI Hosts MiniMax M3 for Efficient Inference

Together AI Blog | June 2, 2026 | high confidence

Why it matters

→Together AI's optimizations enable efficient deployment of complex AI models like MiniMax M3.
→The M3 model's 1M-token context window and multimodal capabilities push the boundaries of AI applications.
→This collaboration demonstrates the feasibility of serving advanced AI models in real-world scenarios.


Together AI will host MiniMax's new M3 model, offering the open-weights model to developers as a hosted endpoint. M3 has a 1M-token context window and supports multimodal inputs, both of which make it demanding to serve efficiently. Together AI says its optimizations improved throughput by up to 125%. The partnership positions Together AI as a platform for deploying complex AI systems at scale.
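A relative gain of "up to 125%" is easy to misread as 1.25x. Read as an increase over a baseline, it corresponds to at most 2.25x the original throughput. The short sketch below works through that arithmetic. The baseline of 1,000 tokens per second is a hypothetical figure chosen only for illustration; the source does not state the baseline or the units.

```python
def speedup_factor(gain_percent: float) -> float:
    """Convert a relative improvement in percent into a multiplier."""
    return 1 + gain_percent / 100


def throughput_after_gain(baseline: float, gain_percent: float) -> float:
    """Return the throughput after applying a relative improvement."""
    return baseline * speedup_factor(gain_percent)


# Hypothetical baseline (tokens per second), not a figure from the source.
baseline_tokens_per_s = 1000.0

print(speedup_factor(125))                                # 2.25
print(throughput_after_gain(baseline_tokens_per_s, 125))  # 2250.0
```

Because the claim is "up to" 125%, 2.25x is an upper bound on the reported gain, not a typical figure.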

