
The move from vLLM V0 to V1, as described in a Hugging Face write-up, put backend correctness ahead of any change to reinforcement learning objectives. The team tracked down and fixed issues with processed rollout logprobs and with runtime defaults until V1's outputs matched the V0 reference. The point is that training stays consistent only if the inference backend is accurate. With these fixes, vLLM V1 reproduces V0's behavior, so future changes to the RL objective can be evaluated against a trustworthy baseline.
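To see why processed versus raw logprobs matters, here is a minimal Python sketch. It assumes that temperature scaling is the only processing step and that the trainer compares its own recomputed logprobs with the rollout logprobs through a per-token importance ratio, as PPO-style objectives do. The function names are hypothetical and are not vLLM's API. With identical weights the ratio should be exactly 1. A mismatch in logprob conventions alone pushes it away from 1, which the RL objective then treats as off-policy drift.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def raw_logprob(logits, token_id):
    """Log-probability taken directly from the model's logits (hypothetical helper)."""
    return log_softmax(logits)[token_id]

def processed_logprob(logits, token_id, temperature):
    """Log-probability after temperature scaling, i.e. the distribution actually sampled from."""
    return log_softmax([x / temperature for x in logits])[token_id]

def importance_ratio(trainer_logprob, rollout_logprob):
    """pi_trainer / pi_rollout for one token; exactly 1.0 when both sides agree."""
    return math.exp(trainer_logprob - rollout_logprob)

logits = [2.0, 1.0, 0.1]  # toy logits for a 3-token vocabulary
token = 0                 # the sampled token
temperature = 0.7

rollout_lp = processed_logprob(logits, token, temperature)  # backend reports processed values
trainer_lp = raw_logprob(logits, token)                     # trainer recomputes raw values
print("ratio with mismatched conventions:", importance_ratio(trainer_lp, rollout_lp))
print("ratio with matched conventions:", importance_ratio(rollout_lp, rollout_lp))
```

At temperature 1.0 the two conventions coincide, which is one reason such a mismatch can go unnoticed until a run uses a different sampling temperature.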
The b9041 release of llama.cpp continues to broaden platform compatibility. It adds support for macOS on Apple Silicon with KleidiAI enabled and expands Vulkan and ROCm 7.2 support on Ubuntu. No new models are introduced; the release focuses on running the existing runtime across more hardware configurations. This strengthens llama.cpp's position as an inference runtime for developers who want options beyond NVIDIA's CUDA ecosystem.
A recent llama.cpp update adds IBM's Granite-Speech, extending the runtime from text to audio input. The model pairs a Conformer encoder, which uses Shaw relative position encoding, with a QFormer projector that compresses the encoded audio into the LLM's embedding space. The port's output matches Hugging Face transformers token for token on audio clips, which indicates the implementation is faithful to the reference. With these components, developers can use llama.cpp for speech input as well as text.
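The two audio components can be illustrated with a toy, pure-Python sketch. It is not llama.cpp's or IBM's code. It assumes a single attention head and omits the query/key/value projection matrices, the Conformer's convolution and feed-forward modules, and the value-side relative term from Shaw et al. (2018). The QFormer is replaced by a single cross-attention step in which a few learned query vectors attend over all encoded frames. All sizes, weights, and function names are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

def shaw_self_attention(q, k, v, rel_emb, max_dist):
    """Single-head self-attention with Shaw-style relative position keys.

    rel_emb[r + max_dist] is the learned embedding for the clipped offset r = j - i;
    it is added to key j when key j is scored against query i.
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            r = max(-max_dist, min(max_dist, j - i))  # clip the relative distance
            key = [kj + a for kj, a in zip(k[j], rel_emb[r + max_dist])]
            scores.append(dot(q[i], key) / math.sqrt(d))
        out.append(weighted_sum(softmax(scores), v))
    return out

def query_compress(frames, learned_queries):
    """Cross-attention pooling: M learned queries summarize N frames into M vectors."""
    d = len(frames[0])
    out = []
    for query in learned_queries:
        scores = [dot(query, f) / math.sqrt(d) for f in frames]
        out.append(weighted_sum(softmax(scores), frames))
    return out

random.seed(0)
N, M, D, MAX_DIST = 12, 3, 4, 2  # 12 audio frames -> 3 output embeddings of size 4
frames = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
rel_emb = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(2 * MAX_DIST + 1)]
queries = [[random.gauss(0, 1) for _ in range(D)] for _ in range(M)]

encoded = shaw_self_attention(frames, frames, frames, rel_emb, MAX_DIST)
compressed = query_compress(encoded, queries)
print(len(encoded), "frames ->", len(compressed), "embeddings of dim", len(compressed[0]))
```

In the Shaw term the bias depends only on the clipped offset j - i, so every pair of frames at the same distance shares one learned embedding. The query step is what shortens the sequence, here from 12 frames to 3 vectors, before it is handed to the LLM.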
The llama.cpp b9049 release adds support for MiniCPM-V 4.6. It also fixes several bugs, implements build_attn so the model can use flash attention, and cleans up code style and type checks. Builds cover macOS, Linux, and Windows, with variants tailored for Apple Silicon and Vulkan. Together, these changes make llama.cpp faster and more reliable for developers working across a range of models and hardware.