AI Jun 7, 2026 1 min read

AI Training Solutions in the Face of Labeled Data Scarcity

Lilian Weng shares how Semi-Supervised Learning and Active Learning can optimize AI model performance when data labeling budgets are limited.


Aggregated from 2 sources, both from the Lilian Weng Blog

As collecting high-quality labeled data becomes increasingly expensive, researcher Lilian Weng has shared an in-depth series of articles on training supervised learning models in the face of data scarcity. These methods open up new avenues to optimize costs and resources for AI development projects.

Background

According to Lilian Weng's series, published in late 2021 and early 2022, the performance of supervised learning relies heavily on a large volume of high-quality labels. However, large-scale manual labeling often exceeds the budget of many research teams. The two parts summarized here address this challenge with two approaches: Semi-Supervised Learning and Active Learning.

Progression

In the first part of the series, the author introduces Semi-Supervised Learning as a way to combine a large amount of unlabeled data with a small set of labeled data. Because the model learns from the structure of the unlabeled data as well as from the labeled examples, it can improve without additional human annotation.
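To make the idea concrete, here is a minimal sketch of pseudo-labeling (self-training), one common semi-supervised technique. It is an illustrative choice and not necessarily the method the series emphasizes. The nearest-centroid classifier, the distance-based confidence score, the function names, the threshold of 0.8, and the toy 2D points are all assumptions made for this example.

```python
import math

def centroids(points, labels):
    """Compute the mean 2D point of each class."""
    sums, counts = {}, {}
    for (x, y), c in zip(points, labels):
        sx, sy = sums.get(c, (0.0, 0.0))
        sums[c] = (sx + x, sy + y)
        counts[c] = counts.get(c, 0) + 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c]) for c in sums}

def predict(cents, point):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {c: math.exp(-math.dist(point, m)) for c, m in cents.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())

def self_train(labeled, labels, unlabeled, threshold=0.8, rounds=5):
    """Repeatedly add confidently predicted unlabeled points as pseudo-labels."""
    points, ys, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        cents = centroids(points, ys)
        confident, rest = [], []
        for p in pool:
            label, conf = predict(cents, p)
            (confident if conf >= threshold else rest).append((p, label))
        if not confident:
            break  # nothing new to learn from; stop early
        for p, label in confident:
            points.append(p)
            ys.append(label)
        pool = [p for p, _ in rest]
    # Return the final model and the points that were never pseudo-labeled.
    return centroids(points, ys), pool

# Toy data (hypothetical): two labeled points, five unlabeled ones.
labeled = [(0.0, 0.0), (5.0, 5.0)]
labels = [0, 1]
unlabeled = [(0.5, 0.2), (4.6, 5.1), (1.0, 0.8), (4.0, 4.2), (2.5, 2.5)]

model, still_unlabeled = self_train(labeled, labels, unlabeled)
print("Class centroids:", model)
print("Left unlabeled (too ambiguous):", still_unlabeled)
```

The confidence threshold is the key design choice in this sketch: setting it too low lets wrong pseudo-labels reinforce themselves, while setting it too high leaves most of the unlabeled pool unused.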

In the second part, Lilian Weng turns to Active Learning. This method still requires human effort for additional labeling, but it lets the algorithm choose the most informative samples to label, so a limited budget is spent where it helps the model most.
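The selection step can be sketched with uncertainty sampling, one widely used active learning strategy. This is a simplified illustration under stated assumptions: the sample IDs, the predicted probabilities, and the `select_for_labeling` helper are hypothetical, and entropy is only one of several possible uncertainty scores.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(predictions, budget):
    """Pick the `budget` samples whose predictions the model is least sure about.

    predictions: dict mapping a sample ID to its list of class probabilities.
    """
    ranked = sorted(predictions, key=lambda sid: entropy(predictions[sid]), reverse=True)
    return ranked[:budget]

# Hypothetical model outputs on an unlabeled pool (binary classification).
predictions = {
    "img_001": [0.98, 0.02],
    "img_002": [0.55, 0.45],
    "img_003": [0.70, 0.30],
    "img_004": [0.50, 0.50],
}

# With a budget of two labels, send the two most ambiguous samples to annotators.
to_label = select_for_labeling(predictions, budget=2)
print("Send to annotators:", to_label)
```

In a full loop, the newly labeled samples would be added to the training set, the model retrained, and the selection repeated until the labeling budget is used up.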

Why It Matters

For AI engineers and developers in Vietnam, optimizing the labeling process is a critical factor in reducing operational costs. Flexibly applying these two techniques not only accelerates product deployment but also reduces dependency on expensive, third-party manual labeling services.