
Hugging Face has released Kimina-Prover-RL, an open-source training pipeline for formal theorem proving in Lean 4. This pipeline, inspired by DeepSeek-R1, uses a structured reasoning-then-generation approach to improve model performance and explainability. Two models, AI-MO/Kimina-Prover-RL-1.7B and AI-MO/Kimina-Prover-RL-0.6B, have been released, achieving state-of-the-art results on the MiniF2F benchmark. The pipeline is fully compatible with the Verl framework, allowing for reproducibility and adaptation in theorem proving research.
© Hugging Face Blog
Hugging Face has introduced olmo-eval, a new evaluation workbench designed to streamline the iterative process of developing large language models (LLMs). Building on the Open Language Model Evaluation Standard (OLMES), olmo-eval offers enhanced flexibility and modularity, allowing developers to easily configure and run benchmarks across model checkpoints. Unlike traditional evaluation tools, olmo-eval supports agentic and multi-turn evaluations, providing a more nuanced analysis of model improvements. This tool is particularly useful for developers who need to quickly assess the impact of changes in data, architecture, or hyperparameters during the model development cycle.
A Hugging Face blog post profiles PyTorch operations, moving from basic matrix operations to nn.Linear and then to a Multilayer Perceptron (MLP). The article shows that nn.Linear folds the bias addition into the matrix multiplication kernel, which avoids a separate kernel launch for the add. It also finds that torch.compile has limited impact on a single operation and is more likely to pay off in more complex scenarios. These findings help developers optimizing deep learning models on GPUs understand where the time in a layer actually goes.
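The comparison described above can be reproduced in a few lines. The following is a minimal sketch, not the article's own code: it assumes PyTorch is installed, runs on CPU for portability (the article's focus is GPU), and the helper names `manual` and `op_names` are made up for illustration. It lists the ATen operators recorded for an explicit matmul-plus-add and for `nn.Linear` so the two traces can be compared.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

torch.manual_seed(0)
x = torch.randn(64, 128)          # batch of 64 inputs, 128 features each
linear = nn.Linear(128, 256)      # weight: (256, 128), bias: (256,)


def manual(inp):
    # Explicit matrix multiply followed by a separate bias addition.
    return inp @ linear.weight.T + linear.bias


def op_names(fn):
    # Profile one forward call and collect the names of the ATen ops it ran.
    with profile(activities=[ProfilerActivity.CPU]) as prof:
        with torch.no_grad():
            fn(x)
    return sorted(
        {evt.key for evt in prof.key_averages() if evt.key.startswith("aten::")}
    )


manual_ops = op_names(manual)
linear_ops = op_names(linear)

if __name__ == "__main__":
    print("manual matmul + bias:", manual_ops)
    print("nn.Linear:           ", linear_ops)
```

Both paths compute the same result; what differs is the set of operators in the trace. On a GPU, adding `ProfilerActivity.CUDA` to `activities` and moving the tensors to the device would show the kernels launched for each path.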
The b9622 release of llama.cpp significantly boosts Vulkan capabilities, particularly for non-contiguous unary and glu operations. By refining index calculations with fastdiv and merging unary operations into a single file, the update enhances both performance and code efficiency. It also tackles a compiler bug and resolves earlier conflicts, ensuring smoother functionality across a broad spectrum of hardware setups. While this update doesn't introduce revolutionary features, it strengthens llama.cpp's role as a flexible tool for developers working with diverse hardware, including macOS, Linux, Windows, and openEuler.
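For readers unfamiliar with the term, "fastdiv" generally refers to replacing an integer division by a fixed divisor with a multiply and a shift, which matters when every thread has to turn a flat index into per-dimension indices. The sketch below illustrates one well-known formulation (the Granlund-Montgomery method for 32-bit unsigned values) in Python. It is an assumption about the general technique, not a transcription of the Vulkan shader code, and the names `make_fastdiv`, `fastdiv`, and `unravel_2d` are hypothetical.

```python
def make_fastdiv(d):
    """Precompute (multiplier, shift) for dividing 32-bit unsigned ints by d."""
    assert 1 <= d < 2**32
    shift = (d - 1).bit_length()                      # ceil(log2(d))
    multiplier = ((2**32 * (2**shift - d)) // d) + 1  # fits in 32 bits
    return multiplier, shift


def fastdiv(n, multiplier, shift):
    """Compute n // d using the values precomputed by make_fastdiv(d)."""
    hi = (n * multiplier) >> 32   # high 32 bits of the 64-bit product
    return (hi + n) >> shift


def unravel_2d(i, row_len, fd):
    """Split a flat index into (row, col) without a hardware divide."""
    row = fastdiv(i, *fd)
    col = i - row * row_len       # remainder via multiply-subtract
    return row, col


if __name__ == "__main__":
    row_len = 13
    fd = make_fastdiv(row_len)    # done once, outside the hot loop
    print(unravel_2d(100, row_len, fd))
```

The precomputation happens once per divisor on the host; the per-element work is a multiply, an add, and a shift. Python integers do not overflow, so the sum `hi + n` is safe here; a fixed-width implementation has to account for that sum exceeding 32 bits.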
The b9624 release of llama.cpp introduces build-time gzip compression, which reduces file sizes. This update continues to cater to developers working on various systems, including macOS, Linux, Windows, and openEuler, with specific builds for architectures like arm64 and x64. The inclusion of ROCm 7.2 for Ubuntu x64 and CUDA 12 and 13 for Windows x64 highlights its adaptability to different hardware environments. While there are no new model architectures, the release strengthens llama.cpp's role as a flexible tool for developers needing compatibility across diverse setups.
The latest b9625 release of llama.cpp continues its trend of broadening platform compatibility, though without any groundbreaking new features. Notably, it includes support for ROCm 7.2 on Ubuntu x64, which is significant for AMD GPU users seeking alternatives to NVIDIA's CUDA. The release also maintains a wide array of builds across macOS, Linux, Windows, and openEuler, though some configurations like KleidiAI on Apple Silicon remain disabled. While this update doesn't introduce new models or quantization methods, it solidifies llama.cpp's role as a versatile inference runtime across diverse systems.