Models & Labs

New Runtime-Learning Accelerator for LLM Inference

Together AI Blog · October 10, 2025 · medium confidence

Why it matters

Faster inference that tunes itself to the workload could make LLM applications more efficient to serve and more responsive for users, with no manual tuning required.

The AdapTive-LeArning Speculator System (ATLAS) speeds up LLM inference by adapting its speculator to the workload at runtime, without manual tuning. On DeepSeek-V3.1 it reaches 500 tokens per second (TPS), a 4x improvement over baseline performance (implying a baseline of roughly 125 TPS).
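For readers unfamiliar with speculators, the toy sketch below shows generic greedy speculative decoding with a lookahead that adapts to the observed acceptance rate. It is not ATLAS's algorithm. The "models" `toy_target` and `toy_draft`, the function names, and the grow/shrink rule for the lookahead `k` are hypothetical choices made for illustration: a cheap draft proposes several tokens, the target verifies them, and only tokens the target agrees with are kept.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model: a deterministic function that maps a
# token prefix to its greedy next token. Real drafts and targets are LLMs.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    n_new: int,
    k_init: int = 2,
    k_min: int = 1,
    k_max: int = 8,
) -> Tuple[List[int], int]:
    """Greedy draft-and-verify decoding with an adaptive lookahead k.

    Returns the sequence and the number of verification rounds; each round
    stands in for one batched target forward pass in a real system.
    """
    out = list(prompt)
    end = len(prompt) + n_new
    k, rounds = k_init, 0
    while len(out) < end:
        # 1. The draft proposes up to k tokens, one after another.
        k_eff = min(k, end - len(out))
        proposal: List[int] = []
        for _ in range(k_eff):
            proposal.append(draft(out + proposal))
        # 2. The target checks every proposed position (batched in practice)
        #    and keeps the longest prefix that matches its own greedy choice.
        rounds += 1
        accepted = 0
        for i, tok in enumerate(proposal):
            if target(out + proposal[:i]) != tok:
                break
            accepted += 1
        out.extend(proposal[:accepted])
        # 3. The same pass yields one more target token: the correction at
        #    the first mismatch, or a bonus token after full acceptance.
        if len(out) < end:
            out.append(target(out))
        # 4. Illustrative adaptation rule: lengthen k after full acceptance,
        #    shorten it when fewer than half the proposals were accepted.
        if accepted == k_eff:
            k = min(k + 1, k_max)
        elif 2 * accepted < k_eff:
            k = max(k - 1, k_min)
    return out, rounds


# Hypothetical toy target: next token = sum of the last two tokens, mod 10.
def toy_target(prefix: List[int]) -> int:
    return (prefix[-1] + prefix[-2]) % 10


def toy_draft(prefix: List[int]) -> int:
    # Agrees with the toy target except at every fifth position.
    guess = toy_target(prefix)
    return (guess + 1) % 10 if len(prefix) % 5 == 0 else guess


if __name__ == "__main__":
    seq, rounds = speculative_decode(toy_target, toy_draft, [1, 1], 20)
    assert seq == greedy_decode(toy_target, [1, 1], 20)  # same output as baseline
    print(f"{len(seq) - 2} tokens in {rounds} verification rounds")
```

Because the target checks every drafted token, the output always matches plain greedy decoding; any speed-up comes from accepting several tokens per target pass. The fixed grow/shrink rule is only a stand-in for tuning the lookahead to the workload; ATLAS's actual adaptation mechanism is not covered in this summary.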

More in Models & Labs

vLLM v0.23.0 Release Enhances Model Support

The vLLM v0.23.0 release brings changes across several components. DeepSeek-V4 support is further optimized: its metadata is now decoupled from previous versions, and new attention kernels are added. Model Runner V2 now covers more dense models by default, improving performance for Llama and Mistral. The Rust frontend gains new endpoints and tool parsers, and compatibility with Transformers v5 broadens the set of supported models. Together, these changes make vLLM more robust across a wider range of models.

vLLM Releases · Jun 14, 2026

llama.cpp b9626 Release Adds Cohere2-MoE Support

The b9626 release of llama.cpp adds architecture support for Cohere2-MoE and adds cohere2moe to the Llama Model Saver's list of supported architectures. It also includes smaller technical changes, such as removing redundant checks and improving tensor handling. Each change is incremental on its own, but together they extend the set of models llama.cpp supports.

llama.cpp Releases · Jun 14, 2026

llama.cpp b9627 Release Expands Platform Support

The b9627 release of llama.cpp focuses on platform coverage rather than new features. It supports a wide range of systems, from macOS and iOS to various Linux distributions and Windows configurations, including CUDA and Vulkan backends. It introduces no new model architectures or quantization methods; the emphasis is on keeping llama.cpp usable across different hardware and operating systems.

llama.cpp Releases · Jun 14, 2026