
Together AI has introduced CPD, a cache-aware serving architecture that separates warm and cold inference workloads, improving throughput for long-context LLM serving by 40%.
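The core idea of warm/cold separation can be sketched as a router that sends requests whose prompt prefix is already resident in the KV cache to a dedicated "warm" pool, so they are not queued behind cache-miss requests that must run a full prefill. This is a minimal illustrative sketch under that assumption; the function and pool names are hypothetical and not Together AI's actual CPD implementation.

```python
def route(prompt: str, cached_prefixes: set[str]) -> str:
    """Route a request to a warm or cold pool based on KV-cache residency.

    Hypothetical sketch: a request is 'warm' if its prompt begins with a
    prefix whose KV entries are already cached, so prefill cost is low.
    """
    if any(prompt.startswith(p) for p in cached_prefixes):
        return "warm-pool"  # reuse cached KV entries
    return "cold-pool"      # full prefill; keeps warm-pool latency stable

cache = {"system: you are a helpful assistant"}
print(route("system: you are a helpful assistant\nuser: hi", cache))  # warm-pool
print(route("user: unrelated new prompt", cache))                     # cold-pool
```

Separating the pools prevents head-of-line blocking: long cold prefills no longer delay requests that could have been served almost immediately from cache.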
© Together AI Blog
Together AI and Adaption have formed a partnership to integrate Together Fine-Tuning into Adaptive Data, enabling teams to optimize datasets and deploy stronger open models.
The latest version b8991 of llama.cpp has been released, featuring updates for various operating systems.
The latest llama-mmap update improves compatibility across platforms and model sizes. Key enhancements include support for 32-bit wasm and gguf.cpp style updates.

Together AI has disabled the vulnerable crypto socket interface Copy Fail across its infrastructure to mitigate a logic bug in the Linux kernel.