
Hugging Face has conducted a study comparing the performance of hybrid language models to traditional transformers, focusing on token-level predictions. The Olmo Hybrid model demonstrated superior performance in predicting meaningful tokens like nouns and verbs, while transformers excelled in handling repetitive tokens due to their attention mechanisms. This research suggests that evaluating models based on specific token types can reveal architectural strengths and guide the development of more effective hybrid models. The findings are expected to inform future hybrid modeling efforts.
Read originalHugging Face has streamlined the process of deploying a vLLM server with a single command, making it easier for developers to test and evaluate models. By using the official vllm/vllm-openai image and specifying a GPU flavor, users can quickly set up a server for model inference. This approach allows for flexible scaling, accommodating larger models by adjusting GPU resources and parallel processing settings. The integration with Hugging Face's infrastructure simplifies access and management, providing a practical solution for developers needing quick, temporary model deployments.
© Hugging Face BlogNVIDIA's NeMo AutoModel is making waves by significantly accelerating the fine-tuning of Transformers, particularly for Mixture of Experts (MoE) models. By integrating Expert Parallelism and DeepEP fused dispatch, it achieves up to 3.7x higher training throughput and reduces GPU memory usage by up to 32% compared to native Transformers v5. This is achieved without altering the existing from_pretrained() API, making it accessible for developers already familiar with Hugging Face models. The innovation lies in its ability to scale efficiently across multiple GPUs, offering a seamless transition for those looking to optimize large-scale AI models.
Hugging Face and Treble Technologies have unveiled the FFASR Leaderboard, a pioneering benchmark for assessing automatic speech recognition (ASR) models in realistic far-field acoustic settings. This initiative tackles the discrepancy between traditional benchmarks and actual performance, where elements like reverberation and ambient noise significantly affect model accuracy. By offering a community-driven platform, the leaderboard promotes the creation of models that can withstand these challenging conditions. This development is poised to redirect focus towards enhancing real-world acoustic robustness, providing a more precise evaluation of ASR model performance in complex acoustic scenarios.
© Matt WolfeAn MIT study finds that combining human skills with AI leads to better performance than relying on human skills alone.
© Microsoft ResearchMicrosoft Research, in collaboration with several universities, has developed a framework called generative causal testing (GCT) to make AI-driven brain prediction models more interpretable. GCT translates complex models into concise explanations of what specific brain regions respond to, such as 'food preparation' or 'location names.' This method not only predicts brain activity but also tests these predictions by generating stories that activate targeted brain areas. The approach has revealed new insights into brain function, including previously unknown prefrontal micro-regions. This advancement bridges the gap between predictive models and scientific understanding, offering a new way to explore the brain's response to language.
© MIT News AIMIT and Microsoft have developed a system called Murakkab that optimizes AI agent workflows, significantly reducing energy use and costs. By allowing developers to describe workflows in plain language, Murakkab automatically selects the best models and tools, dynamically adjusting configurations to meet user priorities like speed or cost. This innovation addresses inefficiencies in agentic workflows, which are crucial for cloud providers. The system's ability to adapt to new models and hardware without manual reconfiguration marks a significant advancement in AI deployment efficiency.