16 × AI · AI signal, amplified

An AI news engine that ingests trusted sources, scores with Claude, and posts only what clears the bar.

© 2026 16 × AI. All rights reserved. Curated by Claude. Posts every 6 hours. No newsletter, no funnel.
Research

EVA-Bench Data 2.0 Expands to 213 Scenarios

Hugging Face Blog·June 4, 2026·high confidence

Why it matters

  • Expands voice agent evaluation to cover more realistic enterprise scenarios.
  • Ensures scenarios are challenging and fair by validating against leading AI models.
  • Sets a new standard for reproducibility and authentication in AI benchmarks.

EVA-Bench Data 2.0 expands the benchmark from one enterprise domain to three: Airline Customer Service Management, Enterprise IT Service Management, and Healthcare HR Service Delivery. The update brings the total to 213 scenarios, a fourfold increase over the original release. Scenarios are validated against top models such as OpenAI GPT-5.4 to ensure they are challenging and fair. The added realism and variety make the dataset a more comprehensive tool for evaluating voice agents in enterprise settings.


More from Hugging Face Blog

Models & Labs

Nemotron 3.5 Enhances Multimodal AI Safety

Nemotron 3.5 evaluates text, images, and responses in a single context, which addresses policy violations that arise from interactions between different media types. Its reasoning capabilities let it enforce custom policies in real time, making it adaptable to differing industry requirements. With multilingual support and a comprehensive safety dataset, it gives enterprises a more adaptable and accountable tool for nuanced content moderation.

Hugging Face Blog·Jun 4, 2026
Coding Tools

Hugging Face CLI Optimized for Coding Agents

Hugging Face has revamped its command-line interface (CLI) to serve both human users and coding agents such as Codex and Claude Code. The updated CLI auto-detects when an agent is driving it and switches to a token-light output format, which significantly reduces token usage on complex tasks against the Hugging Face Hub. New features include agent-mode output and enhanced logging methods, which streamline multi-step tasks for humans and agents alike.

Hugging Face Blog·Jun 4, 2026
Research

DharmaOCR Uses DPO to Reduce Text Degeneration

Hugging Face's DharmaOCR demonstrates a novel application of Direct Preference Optimization (DPO) to reduce text degeneration in OCR tasks. Unlike traditional supervised fine-tuning, which often fails to address degeneration directly, DPO uses the model's own degenerate outputs as negative training signals. The approach reduced degeneration rates by 59.4% on average, and by as much as 87.6% in some cases. By targeting structural failure modes, DharmaOCR offers a methodology for improving performance on structured tasks without relying on subjective human judgments.

Hugging Face Blog·Jun 3, 2026
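
The DharmaOCR summary above says DPO treats the model's own degenerate outputs as negative training signals. A minimal sketch of that idea follows, under stated assumptions: the repeated n-gram check (looks_degenerate), the pair builder (build_pairs), and the beta value are hypothetical illustrations and are not taken from the DharmaOCR write-up. Only the loss itself is the standard DPO objective for a single preference pair.

```python
import math
from collections import Counter


def looks_degenerate(text: str, n: int = 3, max_repeats: int = 3) -> bool:
    """Hypothetical heuristic: flag text where any word n-gram repeats too often."""
    words = text.split()
    ngrams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return any(count > max_repeats for count in ngrams.values())


def build_pairs(prompt: str, samples: list[str]) -> list[dict]:
    """Pair each clean sample (chosen) with each degenerate sample (rejected)."""
    clean = [s for s in samples if not looks_degenerate(s)]
    bad = [s for s in samples if looks_degenerate(s)]
    return [{"prompt": prompt, "chosen": c, "rejected": r} for c in clean for r in bad]


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one pair, given sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Numerically stable softplus(-margin), equal to -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy and the reference model assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls as the policy raises the clean output's likelihood relative to the degenerate one.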

More in Research

Research

NSF Renews Funding for MIT-Led AI and Physics Institute

The National Science Foundation has renewed its support for the MIT-led Institute for Artificial Intelligence and Fundamental Interactions (IAIFI), increasing annual funding to nearly $5 million. IAIFI has pioneered a model in which AI and physics enhance each other, and its work has led to breakthroughs in particle physics, nuclear physics, and astrophysics. With the renewed funding, IAIFI aims to deepen its exploration of the 'physics of AI' and foster a community that bridges disciplines.

MIT News AI·Jun 4, 2026
Research

AI Action Plan for Biological Resilience

OpenAI has released an action plan for using artificial intelligence to strengthen biological resilience. The initiative aims to integrate AI into biodefense strategies, potentially transforming how biological threats are detected and managed. Drawing on AI's predictive capabilities, the plan seeks to improve early warning systems and response mechanisms against biological hazards. It applies AI to public health and safety, offering new tools for anticipating and mitigating biological risks.

OpenAI·Jun 4, 2026
Research

AI Agents Learn to Ask Better Questions with Games

MIT and Harvard researchers have devised a method to improve AI agents' questioning skills using the game 'Battleship'. By applying Monte Carlo inference strategies, they improved language models' ability to ask informative questions, which led to better performance in the game. The approach enabled smaller models such as Llama 4 Scout to surpass larger models such as GPT-5 on efficiency and cost-effectiveness. The work points to potential applications beyond games, including scientific research and coding challenges.

MIT News AI·Jun 3, 2026
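
The Battleship summary above mentions Monte Carlo inference for choosing better questions but gives no details. One common way to make that concrete is to sample boards that are still plausible, then score each candidate yes/no question by how evenly it splits them. The sketch below assumes noise-free answers and a toy one-ship board. The names sample_board, expected_information_gain, and best_question are hypothetical and do not come from the researchers' code.

```python
import math
import random


def sample_board(size: int = 4, rng=random) -> set:
    """Hypothetical toy world: one horizontal ship of length 2 on a size x size grid."""
    row = rng.randrange(size)
    col = rng.randrange(size - 1)
    return {(row, col), (row, col + 1)}


def binary_entropy(p: float) -> float:
    """Entropy in bits of a yes/no answer that is 'yes' with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_information_gain(question, boards: list) -> float:
    """Monte Carlo estimate: with noise-free answers, the gain is the answer's entropy."""
    p_yes = sum(question(board) for board in boards) / len(boards)
    return binary_entropy(p_yes)


def best_question(questions: dict, boards: list) -> str:
    """Return the name of the question with the highest estimated information gain."""
    return max(questions, key=lambda name: expected_information_gain(questions[name], boards))


if __name__ == "__main__":
    rng = random.Random(0)
    boards = [sample_board(rng=rng) for _ in range(1000)]
    questions = {
        "ship in top half?": lambda b: all(r < 2 for r, _ in b),
        "ship in row 0?": lambda b: all(r == 0 for r, _ in b),
    }
    print(best_question(questions, boards))
```

A question that every sampled board answers the same way scores zero, while a question that splits the samples in half scores a full bit. That is why a broad question can beat a narrow one early in a game.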