
Together AI has introduced ReasonIF, a benchmark designed to assess how well large reasoning models (LRMs) follow user instructions throughout their reasoning processes. The study found that models like GPT-OSS-120B and Qwen3-235B fail to adhere to instructions more than 75% of the time, with performance degrading as task difficulty increases. ReasonIF consists of 300 math and science problems paired with specific reasoning instructions, aiming to improve controllability and transparency in model outputs. This research highlights the need for better instruction adherence in LRMs to enhance their usability and reliability.
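A benchmark like this boils down to pairing each problem with a programmatically checkable reasoning instruction and measuring the fraction of traces that comply. The sketch below illustrates that idea only; the instruction keys (`uppercase`, `max_words:`), function names, and scoring are assumptions for illustration, not ReasonIF's actual harness.

```python
# Minimal sketch of an adherence check in the spirit of ReasonIF.
# Instruction keys and scoring here are hypothetical, not the benchmark's own.

def adheres(trace: str, instruction: str) -> bool:
    """Check one reasoning trace against a simple, verifiable instruction."""
    if instruction == "uppercase":            # reason entirely in uppercase
        letters = [c for c in trace if c.isalpha()]
        return bool(letters) and all(c.isupper() for c in letters)
    if instruction.startswith("max_words:"):  # keep the reasoning short
        limit = int(instruction.split(":", 1)[1])
        return len(trace.split()) <= limit
    raise ValueError(f"unknown instruction: {instruction}")

def adherence_rate(samples):
    """Fraction of (trace, instruction) pairs that follow their instruction."""
    hits = sum(adheres(trace, inst) for trace, inst in samples)
    return hits / len(samples)

samples = [
    ("FIRST SOLVE FOR X.", "uppercase"),
    ("First solve for x.", "uppercase"),   # mixed case: violates instruction
    ("short answer", "max_words:5"),
]
print(adherence_rate(samples))  # 2 of 3 traces adhere
```

Checks of this form scale to a full benchmark because each instruction is machine-verifiable, so no human grading is needed to report an adherence rate.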
Together AI and Adaption have formed a partnership to integrate Together Fine-Tuning into Adaptive Data, enabling teams to optimize datasets and deploy stronger open models.
Together AI has shut down the vulnerable Copy Fail crypto socket interface across its infrastructure to mitigate risks from a logic bug in the Linux kernel.