Kalera News (English)

AI, Robotics and Tech news in English, source-backed.
https://news.kalera.ai/en-US
© 2026 Kalera News

Sapient Claims to Train Foundation AI Model from Scratch for Just $1,500: Breaking the Million-Dollar Barrier? 💰🤯

https://news.kalera.ai/en/articles/train-foundation-model-from-scratch-1500/

Sun, 14 Jun 2026 11:35:23 GMT

Sapient claims to have successfully trained a foundation model from scratch for just $1,500, leveraging a breakthrough HRM-Text architecture to drastically reduce financial barriers and empower enterprises to build specialized AI. 💡

Training a foundation Large Language Model (LLM) from scratch typically costs millions of dollars and requires massive datasets, deterring the vast majority of enterprises. However, Sapient, a tech company, claims to have found a much more economical path, with training costs totaling just **$1,500**. 💰

### Expensive Training Bottleneck: The Root Cause 📉

The current paradigm for training LLMs involves scraping the entire internet and predicting trillions of tokens, hoping the model develops a deep understanding of language and reasoning. However, researchers describe this approach as "prohibitively expensive."

* **Guan Wang, CEO of Sapient Intelligence**, emphasizes that this is a "problem of recurring economics": "Enterprises today face three overlapping problems: expensive training, heavy infrastructure, and slow experimental cycles. The industry's 'addictive' reaction to scaling has been: 'When the model fails, make it bigger. Add more data. Add more GPUs.' This has worked, but we are reaching points of diminishing returns. Scaling often means more memorization, higher latency, bulkier footprint, and vendor lock-in. It does not necessarily yield a better reasoning machine for a business."

Fine-tuning existing Transformer models is not always the optimal solution either, as it still requires significant generalized data, making it costly and difficult to control.

### HRM-Text: Rethinking AI Architecture From Scratch 🧠

To break this "dogma" of brute-force scaling, researchers at Sapient developed **HRM-Text**. This architecture replaces traditional Transformers with a highly sample-efficient **Hierarchical Reasoning Model (HRM)**, which was first introduced last year. The HRM architecture decouples computation into two distinct layers:

* A slowly evolving **strategic layer**: Maintains a stable semantic context.
* A rapidly evolving **execution layer**: Performs localized, iterative refinement.

Crucially, instead of predicting the next token on raw web text, HRM-Text is trained **exclusively on instruction-response pairs**. This closely mirrors the real-world enterprise setting, where users expect a specific answer to a specific task.

To address the mathematical instability that arises when HRMs are applied to the complexity of natural language, the researchers introduced two key architectural innovations in HRM-Text:

1. **MagicNorm:** A specialized normalization technique designed to keep internal signals stable, no matter how many times the model repeats its reasoning cycle.
2. **Warm-up method:** During the initial phase of training, the model is run through short, shallow reasoning loops. As training progresses, the system gradually feeds the model deeper and longer reasoning chains.

The training objective is also shifted from token prediction to **task completion**, where the model is only rewarded for the final response as a whole, rather than on a per-token basis.

### HRM-Text in Practice: Astonishing Results 📊

The researchers built a compact HRM-Text model with **1 billion parameters**. Instead of digesting trillions of words of raw internet text, they trained this model from scratch on a highly curated dataset of just **40 billion tokens**.
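
As a rough illustration of the two-layer recurrence and the warm-up method described in the architecture section, here is a toy Python sketch. It is not Sapient's implementation: the function names (`warmup_depth`, `rms_normalize`, `run_hierarchy`), the linear schedule, the update rules, and every default value are assumptions made for illustration, and plain RMS normalization stands in for MagicNorm, whose details the article does not give.

```python
import math


def warmup_depth(step, total_steps, min_cycles=1, max_cycles=8, warmup_frac=0.5):
    """Hypothetical schedule: how many reasoning cycles to run at a training step.

    Grows linearly from min_cycles to max_cycles over the first warmup_frac
    of training, then stays at max_cycles.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    progress = min(1.0, step / warmup_steps)
    return min_cycles + round(progress * (max_cycles - min_cycles))


def rms_normalize(vec, eps=1e-6):
    """Plain RMS normalization, used only as a stand-in (this is NOT MagicNorm)."""
    rms = math.sqrt(sum(v * v for v in vec) / len(vec) + eps)
    return [v / rms for v in vec]


def run_hierarchy(inputs, cycles, fast_steps=4):
    """Toy two-timescale recurrence over a small vector.

    slow: the "strategic" state, updated once per cycle.
    fast: the "execution" state, updated fast_steps times per cycle.
    """
    slow = [0.0] * len(inputs)
    fast = list(inputs)
    for _ in range(cycles):
        for _ in range(fast_steps):
            # The fast state refines itself, conditioned on the slow state and the input.
            fast = rms_normalize(
                [math.tanh(f + s + x) for f, s, x in zip(fast, slow, inputs)]
            )
        # The slow state absorbs the outcome of the fast loop once per cycle.
        slow = rms_normalize([0.5 * s + 0.5 * f for s, f in zip(slow, fast)])
    return slow


if __name__ == "__main__":
    total_steps = 1000
    for step in (0, 250, 500, 1000):
        cycles = warmup_depth(step, total_steps)
        state = run_hierarchy([0.5, -1.0, 2.0], cycles)
        print(step, cycles, [round(v, 3) for v in state])
```

Because both states are normalized after every update, their scale stays roughly constant however many cycles are run, which is the kind of stability the article attributes to MagicNorm. The schedule, meanwhile, only controls how deep the loop goes at each point in training.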
The training data consisted entirely of instruction-response pairs across multiple domains: general instruction, mathematics, symbolic logic, textbook exercises, and rewritten knowledge.

* The model was trained in **under 2 days (1.9 days)** on a cluster of 16 GPUs.
* The total computational cost was estimated at just **$1,500**. 💰
* It achieved highly competitive scores on major industry benchmarks: **60.7% on MMLU, 84.5% on GSM8K, and 56.2% on MATH**.
* Notably, HRM-Text achieved these scores using **100x to 900x fewer training tokens** and an estimated **96x to 432x lower compute cost** than models like Qwen, Gemma, and Llama.

The results suggest that a model does not need to "memorize the entire internet" to become an intelligent reasoning engine. HRM-Text performed well on reasoning-heavy tasks despite being trained on only 40 billion tokens.

### Business Implications: Strategy Over Infrastructure 🚀

For real-world AI applications, this would mean that pre-training a foundation model is no longer restricted to resource-rich organizations. With HRM-Text, enterprises can:

* **Affordably pre-train** highly capable reasoning models from scratch.
* **Decouple** reasoning from knowledge, keeping company data in external knowledge bases instead of cramming it all into the model weights.
* Build a compact, specialized "reasoning core" for their business logic, running in a controlled environment.

Wang rebuts arguments that comparing models trained on instruction-response pairs to those trained on raw text is "unfair." He argues that every modern LLM processes instruction-response data during training or alignment. "So, the comparison is not apples to oranges. It is closer to apple cores to apples. We start directly from the core task format because that is how people actually use the model: they give an instruction and expect a helpful response," he said.

### Critiques and the Future of Enterprise AI 💡

While the benchmarks and cost-efficiency are impressive, Sapient made the model's current limits clear.
The initial release is primarily a proof-of-concept, similar to early GPT releases, aimed at showcasing the unique architectural advantages.

* "To be completely frank, HRM-Text is not a plug-and-play ChatGPT replacement yet," Wang shared. "It is a compact foundation language-reasoning model. For an enterprise engineering team, the operational work is mostly around templates, mode selection, attention masking, and alignment."

He concluded: "When the cost of training a capable reasoning model drops to $1,500, AI stops being an infrastructure question and becomes a strategic one. A Fortune 500 company no longer has to ask, 'Can we afford to train a foundation model?' Instead, they will ask, 'What should our model know about our business, and what kind of reasoning should it be optimized for?'"

This could open a new era for enterprise-grade AI, where innovation is no longer gatekept by astronomical costs.
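
The reported figures can be cross-checked with back-of-the-envelope arithmetic. The short Python sketch below uses only numbers quoted in the article (1.9 days, 16 GPUs, $1,500, 40 billion tokens, and the 100x to 900x and 96x to 432x ratios). The per-GPU-hour rate it derives is not stated by Sapient; it assumes the entire $1,500 is GPU rental, and the helper names are hypothetical.

```python
# Figures reported in the article.
TRAINING_DAYS = 1.9
NUM_GPUS = 16
TOTAL_COST_USD = 1_500
TRAINING_TOKENS = 40_000_000_000


def gpu_hours(days, gpus):
    """Total GPU-hours for a run of `days` days on `gpus` GPUs."""
    return days * 24 * gpus


def implied_hourly_rate(total_cost, days, gpus):
    """Derived, not reported: assumes the whole cost is GPU rental."""
    return total_cost / gpu_hours(days, gpus)


def implied_baseline_range(value, low_factor, high_factor):
    """Scale a reported HRM-Text figure by the quoted 'Nx fewer/lower' ratios."""
    return value * low_factor, value * high_factor


if __name__ == "__main__":
    hours = gpu_hours(TRAINING_DAYS, NUM_GPUS)
    rate = implied_hourly_rate(TOTAL_COST_USD, TRAINING_DAYS, NUM_GPUS)
    tokens_lo, tokens_hi = implied_baseline_range(TRAINING_TOKENS, 100, 900)
    cost_lo, cost_hi = implied_baseline_range(TOTAL_COST_USD, 96, 432)
    print(f"GPU-hours: {hours:.1f}")
    print(f"Implied rate: ${rate:.2f} per GPU-hour")
    print(f"Implied baseline tokens: {tokens_lo:,} to {tokens_hi:,}")
    print(f"Implied baseline cost: ${cost_lo:,} to ${cost_hi:,}")
```

Under these assumptions the run comes to roughly 730 GPU-hours at about $2.06 per GPU-hour. The quoted ratios imply comparison models trained on 4 trillion to 36 trillion tokens, at an estimated compute cost of $144,000 to $648,000.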