
The AdapTive-LeArning Speculator System (ATLAS) enhances LLM inference speed by adapting to workloads, achieving 500 TPS on DeepSeek-V3.1, which is a 4x improvement over baseline performance without manual tuning.
Read original
© Together AI BlogTogether AI and Adaption have formed a partnership to integrate Together Fine-Tuning into Adaptive Data, enabling teams to optimize datasets and deploy stronger open models.
© Together AI BlogThe latest version b8991 of llama.cpp has been released, featuring updates for various operating systems.
The latest update to llama-mmap improves compatibility with various platforms and model sizes. Key enhancements include support for 32-bit wasm and updates to gguf.cpp style.

Together AI has shut down the vulnerable crypto socket interface Copy Fail across its infrastructure to mitigate risks associated with a logic bug in the Linux kernel.