
Hugging Face has redesigned its CLI to serve both human users and coding agents such as Codex and Claude Code. The CLI now detects when an agent is driving it and switches to a more compact output, cutting token usage by up to a factor of six compared with previous methods. This makes the CLI better suited to complex, multi-step tasks. The changes include an agent-mode output and improved logging methods, which let both humans and agents interact with the Hugging Face Hub more smoothly.
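The summary does not say how the CLI recognizes an agent or what its compact output looks like, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes two hypothetical signals (invented marker environment variables, and whether stdout is a terminal) and a hypothetical agent format of plain tab-separated values without the padding and rules that help human readers. None of these names come from the real `hf` CLI.

```python
import os
import sys
from typing import Mapping, Sequence

# Hypothetical marker variables an agent harness might set; NOT the real hf CLI's signals.
AGENT_ENV_VARS = ("CODING_AGENT", "AGENT_MODE")


def is_agent_context(env: Mapping[str, str], stdout_is_tty: bool) -> bool:
    """Guess whether output will be read by an agent rather than a person."""
    if any(env.get(name) for name in AGENT_ENV_VARS):
        return True
    # Assumption: output piped to another process is treated as machine-consumed.
    return not stdout_is_tty


def render(rows: Sequence[Mapping[str, str]], agent: bool) -> str:
    """Render rows as tab-separated text for agents or padded columns for humans."""
    if not rows:
        return ""
    headers = list(rows[0])
    if agent:
        # Agent mode: one header line plus tab-separated values, no padding or rules.
        lines = ["\t".join(headers)]
        lines += ["\t".join(str(r[h]) for h in headers) for r in rows]
        return "\n".join(lines)
    # Human mode: aligned columns with a dashed rule under the header.
    widths = [max(len(h), *(len(str(r[h])) for r in rows)) for h in headers]

    def fmt(cells):
        return "  ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    lines = [fmt(headers), fmt(["-" * w for w in widths])]
    lines += [fmt([str(r[h]) for h in headers]) for r in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical repo listing used only for illustration.
    repos = [
        {"id": "example-org/model-a", "downloads": "1200"},
        {"id": "example-org/model-b", "downloads": "87"},
    ]
    print(render(repos, is_agent_context(os.environ, sys.stdout.isatty())))
```

Keeping detection in a pure function that takes the environment and TTY flag as arguments makes the mode switch easy to test without touching the real process state.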
Nemotron 3.5 represents a major advance in AI safety by evaluating text, images, and model responses together in a single context. This tackles policy violations that arise only from the interaction between media types, for example an image and a caption that are each harmless alone but harmful together. Because the model can reason over its inputs, it can enforce custom policies in real time, which makes it adaptable to different industry requirements. With multilingual support and a comprehensive safety dataset, Nemotron 3.5 offers enterprises a robust, more accountable tool for nuanced content moderation, and the release underlines how much context and customization matter in AI safety systems.
EVA-Bench Data 2.0 broadens its scope from one to three enterprise domains: Airline Customer Service Management, Enterprise IT Service Management, and Healthcare HR Service Delivery. The update quadruples scenario coverage to 213 scenarios, providing a broader benchmark for evaluating voice agents across diverse workflows. The scenarios were validated against leading models such as OpenAI GPT-5.4 and Google Gemini 3.1 Pro to ensure they are both challenging and fair. Beyond adding realism and variety to the dataset, the release sets a new standard for reproducibility and authentication in voice agent evaluation.
Hugging Face's DharmaOCR demonstrates a novel application of Direct Preference Optimization (DPO) to reduce text degeneration in OCR tasks, such as output that falls into repetitive loops. Traditional supervised fine-tuning often fails to address degeneration directly; DPO instead uses the model's own degenerate outputs as negative (rejected) training examples. This approach reduced degeneration rates by an average of 59.4%, and by as much as 87.6% in some cases. By targeting the model's structural failure modes, DharmaOCR offers a way to improve performance on structured tasks without relying on subjective human judgments.
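To make the mechanism concrete, here is a minimal sketch of the two pieces the paragraph describes: collecting the model's own degenerate outputs as rejected examples, and scoring a preference pair with the standard DPO loss, -log σ(β · margin). The repetition heuristic, its thresholds, the choice of the reference transcription as the "chosen" answer, and β = 0.1 are all assumptions for illustration, not DharmaOCR's actual pipeline. Log-probabilities are plain floats here; in a real setup they would be summed token log-probs from the trained policy and a frozen reference model.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


def is_degenerate(text: str, n: int = 3, max_repeats: int = 4) -> bool:
    """Hypothetical heuristic: flag text in which any word n-gram repeats too often."""
    words = text.split()
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return any(count > max_repeats for count in grams.values())


@dataclass
class PreferencePair:
    prompt: str    # identifier of the page image being transcribed
    chosen: str    # assumption: the reference transcription is the preferred answer
    rejected: str  # the model's own degenerate output, used as the negative example


def build_pairs(samples: Iterable[Tuple[str, str, str]]) -> List[PreferencePair]:
    """Turn (prompt, reference, model_output) triples into pairs, keeping only degenerate outputs."""
    return [PreferencePair(p, ref, out) for p, ref, out in samples if is_degenerate(out)]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair from summed sequence log-probs under the policy and frozen reference."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) rewritten as softplus(-beta * margin) for stability
    return softplus(-beta * margin)
```

When the policy has not yet moved away from the reference, the margin is zero and the loss is log 2; it falls as the policy raises the likelihood of the chosen text relative to the degenerate one. For full training loops, libraries such as Hugging Face TRL provide a `DPOTrainer`, though the summary does not say which tooling DharmaOCR used.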
The latest Claude Code update, version 2.1.163, improves usability and functionality in several ways. Notably, it adds managed settings for controlling version compatibility, so that users stay on approved software versions. It also adds a command to list installed plugins and a shortcut for copying answers as Markdown, which streamlines common workflows. Bug fixes address issues such as terminal misalignment and session management, making the tool more stable and reliable for developers.
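The summary does not name the managed-setting keys, so the sketch below only illustrates the general idea of gating on an approved version range. The `minimum` and `maximum` parameters and the helper names are hypothetical, not Claude Code's real settings schema, and the parser assumes plain dotted numeric versions such as `2.1.163`.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version such as '2.1.163' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_version_allowed(current: str,
                       minimum: Optional[str] = None,
                       maximum: Optional[str] = None) -> bool:
    """Return True if `current` lies within the inclusive approved range (hypothetical policy)."""
    cur = parse_version(current)
    if minimum is not None and cur < parse_version(minimum):
        return False
    if maximum is not None and cur > parse_version(maximum):
        return False
    return True
```

Comparing integer tuples rather than raw strings matters here: as strings, `"2.1.163" < "2.1.99"` is true, which would wrongly treat the newer release as older.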
According to the GitHub Changelog, GitHub has extended Copilot for Pro, Pro+, and Max subscribers so it can fix failing GitHub Actions workflows automatically. With a single click, users can delegate a workflow failure to Copilot, which works on the fix in a cloud-based environment. Copilot handles routine issues such as test failures or linter errors, freeing developers to focus on more important work and reducing the time spent troubleshooting.
GitHub Copilot has introduced larger context windows and configurable reasoning levels, allowing developers to take on more complex projects. The one-million-token context window accommodates larger codebases and multi-file projects in VS Code, Copilot CLI, and the GitHub Copilot app. Configurable reasoning levels let developers trade speed for depth, which matters for architectural and debugging work. Both features are available now, although using them consumes more AI credits per interaction.