May 22, 2026

Hugging Face Hub v1.16.0 Released: Powerful Support for Multimodal AI

The latest version of huggingface_hub officially adds Together Compute as a new Inference Provider, supporting five task types that range from text-to-speech (TTS) to text-to-video.



Source: x.com

Hugging Face has released v1.16.0 of huggingface_hub, its Python client library for the Hub. The release focuses heavily on multimodal inference, powered by its partner Together Compute.

Key Developments

This update establishes Together Compute as an official Inference Provider, supporting five new task types: feature extraction, text-to-speech (TTS), automatic speech recognition (ASR), image-to-image, and notably, text-to-video generation.
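To show how the five task types map onto client code, here is a minimal sketch. It assumes two things the announcement does not confirm. First, the tasks are reached through the existing `huggingface_hub.InferenceClient` methods `feature_extraction`, `text_to_speech`, `automatic_speech_recognition`, `image_to_image` and `text_to_video`. Second, Together Compute is selected with the provider identifier `"together"`. The names `TASK_METHODS`, `make_together_client` and `run_task` are hypothetical helpers written for this illustration and are not part of the library.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical table: the five task types named in the release, mapped to
# method names on huggingface_hub.InferenceClient (assumed entry points).
TASK_METHODS: Dict[str, str] = {
    "feature-extraction": "feature_extraction",
    "text-to-speech": "text_to_speech",
    "automatic-speech-recognition": "automatic_speech_recognition",
    "image-to-image": "image_to_image",
    "text-to-video": "text_to_video",
}


def make_together_client(token: Optional[str] = None) -> Any:
    """Build an InferenceClient that routes requests through Together Compute.

    Assumption: "together" is the provider identifier for Together Compute.
    The import is deferred so run_task() can be used and tested without
    huggingface_hub installed.
    """
    from huggingface_hub import InferenceClient

    return InferenceClient(provider="together", token=token)


def run_task(client: Any, task: str, inputs: Any, model: str, **kwargs: Any) -> Any:
    """Call the client method that matches `task`.

    `inputs` is passed as the first positional argument (text, audio, image
    or prompt, depending on the task). `model` and any extra keyword
    arguments are forwarded unchanged.
    """
    if task not in TASK_METHODS:
        raise ValueError(f"unsupported task: {task!r}")
    method: Callable[..., Any] = getattr(client, TASK_METHODS[task])
    return method(inputs, model=model, **kwargs)
```

As a usage example, `run_task(make_together_client(), "text-to-video", "A red fox running through snow", model="<model-id>")` would send a text-to-video request. Here `<model-id>` is a placeholder for a model that Together Compute actually serves, so check the provider's model list before running it. The table-based dispatch keeps each task's input type and extra options in the underlying client method. The helper only picks the method and forwards arguments.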

Why It Matters

Integrating high-performance providers like Together Compute reduces the Hugging Face ecosystem's dependence on any single inference provider. It also lowers the barrier to running compute-heavy models such as video generators. This is great news for Vietnamese developers building multimodal AI applications that need fast inference at an optimized cost.