
EvolutionaryScale Launches ESMC: An AI "Map" Trained on 2.8 Billion Protein Sequences 🧬

EvolutionaryScale has announced ESMC, an open-source protein language model trained on 2.8 billion protein sequences drawn from across the diversity of life on Earth.

Tier 1 · sources 90% confidence Reviewed
Sources x.com

EvolutionaryScale, an AI startup founded by former Meta AI researchers (the team behind the ESM-1 and ESM-2 model families), has officially announced ESMC, an open-source protein language model. Part of the ESM3 ecosystem, it is pitched as a step toward a new era of "AI Biology," where scientists can work with the molecules of life much as we work with text.

Key Developments

According to the development team, ESMC was trained on 2.8 billion protein sequences spanning the known diversity of life on Earth. Where ordinary language models learn from written text, ESMC learns the statistical patterns that billions of years of evolutionary selection have left in protein sequences. Training at this scale is meant to let the model pick up the physical and biological regularities that govern how proteins form, fold, and function in cells.
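To make the "proteins as text" idea concrete, here is a minimal sketch of how a protein sequence can be turned into tokens and partially hidden for a model to fill in. It assumes a masked-token training objective, which the announcement summarized here does not specify, and every name in it (tokenize, mask_tokens, encode, the 15% masking fraction, the example sequence) is hypothetical and for illustration only. It is not EvolutionaryScale's code or API.

```python
import random

# The 20 standard amino acids, one letter per residue.
# Real models also use special tokens; only a mask token is modeled here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"
TOKEN_TO_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
TOKEN_TO_ID[MASK] = len(AMINO_ACIDS)


def tokenize(sequence):
    """Split a protein sequence into one token per residue."""
    tokens = list(sequence.upper())
    unknown = [t for t in tokens if t not in AMINO_ACIDS]
    if unknown:
        raise ValueError(f"non-standard residues: {unknown}")
    return tokens


def mask_tokens(tokens, fraction=0.15, seed=0):
    """Hide a random subset of residues.

    Returns the masked token list and a dict mapping each hidden
    position to the residue the model would be asked to recover.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)  # seeded so the example is reproducible
    n_masked = max(1, round(len(tokens) * fraction))
    positions = sorted(rng.sample(range(len(tokens)), n_masked))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


def encode(tokens):
    """Map tokens to the integer ids a model would consume."""
    return [TOKEN_TO_ID[t] for t in tokens]


if __name__ == "__main__":
    tokens = tokenize("MKTAYIAKQR")  # made-up sequence, not a real protein
    masked, targets = mask_tokens(tokens)
    print(masked)
    print(targets)
    print(encode(masked))
```

Under this assumption, training amounts to asking the model to recover the hidden residues from their context, repeated across billions of sequences; the patterns it must learn to do so are the evolutionary regularities described above.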

Notably, ESMC is described as a streamlined model that still retains much of the capability of the flagship ESM3 model. It focuses on predicting protein structure and function, reportedly outperforming previous generations, and is optimized so that the open research community can deploy it on more common hardware.

Why It Matters

This is a major step forward in the "AI for Science" era, shifting from passive observation to active creation. For researchers and biotech startups in Vietnam, models like ESMC provide a foundation for designing novel proteins (de novo design) and for predicting biological function with high accuracy.

The public release (specifically, the 300M-parameter version of ESMC) should accelerate the open research community's use of AI in biomedicine and drug discovery, and it is also relevant to fields such as green industrial enzyme production and environmental bioremediation using microorganisms. Instead of spending years on trial and error in the laboratory, scientists can first test hypotheses computationally, shortening the time it takes to bring medical innovations from the lab to patients.