
Hugging Face launches Delta Weight Sync to optimize RL training ⚡

Hugging Face's new Delta Weight Sync cuts the per-step weight transfer in RL training by up to 98% by sending only the weights that changed, via cloud storage.

Sources huggingface.co

Hugging Face has announced Delta Weight Sync, a feature integrated into its TRL (Transformer Reinforcement Learning) library that addresses the weight-transfer bottleneck in asynchronous reinforcement learning (RL) training. It lets a system transmit only the weights that changed between training steps, which shortens the time inference GPUs spend idle.

Background

In traditional asynchronous RL training, the trainer must send the full set of model weights to the inference engine after each optimization step. For a 7-billion-parameter (7B) model stored in bf16 (2 bytes per parameter), this payload is about 14 GB, and it can scale up to 1 TB for ultra-large models. According to Hugging Face, this network bottleneck forces inference GPUs to stall while they wait for new weights, wasting compute resources.
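The payload figures follow from simple arithmetic, assuming weights are stored in bf16 at 2 bytes per parameter and that 1 GB means 10^9 bytes. The helper name `bf16_payload_gb` below is purely illustrative and is not part of TRL:

```python
def bf16_payload_gb(num_params: float) -> float:
    """Size of a full bf16 weight transfer in GB (10^9 bytes).

    Assumption: every parameter is stored as bf16, i.e. 2 bytes each.
    """
    bytes_per_bf16 = 2
    return num_params * bytes_per_bf16 / 1e9


print(bf16_payload_gb(7e9))    # 7B-parameter model   -> 14.0
print(bf16_payload_gb(0.6e9))  # 0.6B-parameter model -> 1.2
```

The same calculation gives the 1.2 GB full-sync size quoted for Qwen3-0.6B.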

Developments

Drawing on empirical research by PULSE and Fireworks AI, Hugging Face notes that approximately 99% of bf16 weights do not change at all between two consecutive RL optimization steps. The reason is rounding: bf16 keeps only about 8 bits of precision, so an update that is small relative to the weight's magnitude rounds back to the same stored value. Delta Weight Sync exploits this by encoding only the changed elements into a very small "sparse safetensors" file, which is then uploaded to a Hugging Face Bucket. In real-world tests with the Qwen3-0.6B model, the transfer volume per step fell from 1.2 GB to about 20 to 35 MB, a reduction of roughly 97 to 98%.
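The rounding effect and the delta idea can be illustrated with a minimal pure-Python sketch. This is not the TRL implementation and does not reproduce the actual "sparse safetensors" layout; the function names, the index/value encoding, and the toy weights are all assumptions made for illustration. bf16 is emulated by rounding a float32 to its top 16 bits (round-to-nearest-even), ignoring NaN and infinity handling:

```python
import struct


def to_bf16_bits(x: float) -> int:
    """Round a finite float to bfloat16 and return its 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bits += 0x7FFF + ((bits >> 16) & 1)                  # round to nearest, ties to even
    return (bits >> 16) & 0xFFFF


def encode_delta(old_bits, new_bits):
    """Return (indices, values) for positions whose bf16 pattern changed."""
    idx = [i for i, (a, b) in enumerate(zip(old_bits, new_bits)) if a != b]
    return idx, [new_bits[i] for i in idx]


def apply_delta(bits, idx, values):
    """Rebuild the new weights on the receiver from the old weights plus the delta."""
    out = list(bits)
    for i, v in zip(idx, values):
        out[i] = v
    return out


# Toy example: three tiny updates that bf16 rounds away, one large update.
weights = [0.5, -1.25, 3.0, 0.02]
updates = [1e-6, 1e-6, 1e-6, 0.01]

old_bits = [to_bf16_bits(w) for w in weights]
new_bits = [to_bf16_bits(w + u) for w, u in zip(weights, updates)]

idx, vals = encode_delta(old_bits, new_bits)
print(idx)  # [3]: only the last weight changed after rounding
assert apply_delta(old_bits, idx, vals) == new_bits
```

Only the changed positions and their new values need to be shipped; the receiver patches its existing copy of the weights instead of reloading the whole model.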

Why it matters

The new method makes it possible to train large AI models in a distributed fashion without expensive supercomputer infrastructure or dedicated RDMA networks. For the first time, developers in Vietnam can run the trainer on a personal computer and deploy inference replicas (rollout servers) on affordable Hugging Face Spaces, with a single cloud storage system connecting the two.