DeepSeek Unveils New AI Architecture to Cut Training Costs

Chinese artificial intelligence startup DeepSeek has released a technical paper proposing a fundamental rethink of how foundational AI models are trained. The Hangzhou-based company introduced Manifold-Constrained Hyper-Connections, a method designed to make training more cost-effective as DeepSeek competes against better-funded American rivals with greater access to computing resources.

The research paper, released on Thursday and co-authored by founder Liang Wenfeng, builds upon existing neural network architecture concepts to address critical scalability challenges in large-scale AI training. A team of 19 DeepSeek researchers tested the new method on models ranging from 3 billion to 27 billion parameters, demonstrating that it scales effectively without adding significant computational burden.

The development marks an evolution of hyper-connections, originally proposed by ByteDance researchers in September 2024 as an improvement on the residual connections popularized by ResNet, the landmark deep learning architecture introduced in 2015 by Microsoft Research Asia. Residual connections have since become integral to major AI systems, including OpenAI’s GPT models and Google DeepMind’s AlphaFold. Hyper-connections widen that design by carrying several parallel residual streams, but existing approaches struggled to manage the resulting memory costs during large-model training.
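To make the distinction concrete, the sketch below contrasts a classic residual update with a hyper-connection-style update over several parallel streams. It is a loose Python illustration of the general idea, not DeepSeek’s or ByteDance’s actual formulation: the stand-in layer function `layer_fn`, the stream count `n`, the mixing matrix `A` and the output weights `b` are all placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 4          # hidden width; number of parallel residual streams (illustrative)
W = rng.standard_normal((d, d)) / np.sqrt(d)

def layer_fn(x):
    """Stand-in for a transformer block's computation F(x)."""
    return np.tanh(x @ W)

def residual_step(x):
    """Classic ResNet-style update: x <- x + F(x), one residual stream."""
    return x + layer_fn(x)

def hyper_connection_step(H, A, b):
    """Hyper-connection-style update over n parallel streams (sketch).

    H: (n, d) stack of residual streams
    A: (n, n) learnable matrix mixing streams across the layer
    b: (n,)  learnable weights distributing the layer output to each stream
    """
    layer_input = A[0] @ H               # weighted combination of streams fed to the layer
    out = layer_fn(layer_input)          # (d,) layer output
    return A @ H + np.outer(b, out)      # remixed streams plus the broadcast output

x = rng.standard_normal(d)
x = residual_step(x)                     # single-stream baseline
H = np.tile(x, (n, 1))                   # n copies of the embedding seed the streams
A = np.eye(n) + 0.01 * rng.standard_normal((n, n))
b = np.full(n, 1.0 / n)
H = hyper_connection_step(H, A, b)
print(H.shape)                           # (n, d): the residual stream is n times wider
```

The widened (n, d) stream is what gives the approach its extra expressiveness, and it is also why a naive implementation multiplies activation-memory costs during training.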

DeepSeek’s solution adds explicit constraints to the connection structure so that the expanded residual streams remain computationally and cost efficient to train while retaining their benefits of greater network expressiveness. The researchers stated that, combined with infrastructure-level optimizations, the method delivers performance gains with negligible computational overhead.
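The article does not spell out what those constraints look like. Purely as an illustration of how a structural constraint can tame stream mixing, the hypothetical sketch below projects a mixing matrix toward the set of doubly stochastic matrices with Sinkhorn-style normalization, so that repeated mixing neither inflates nor shrinks the streams. The helper `sinkhorn_project` is invented for this sketch, and the doubly stochastic choice is an assumption, not a detail confirmed by the coverage here.

```python
import numpy as np

def sinkhorn_project(M, n_iters=50):
    """Alternately normalize rows and columns so M approaches a doubly
    stochastic matrix (rows and columns each summing to ~1). Such a mix
    averages the residual streams without changing their overall scale,
    one way a structural constraint can keep very deep stacks stable."""
    M = np.abs(M) + 1e-9                   # keep entries positive
    for _ in range(n_iters):
        M /= M.sum(axis=1, keepdims=True)  # rows sum to 1
        M /= M.sum(axis=0, keepdims=True)  # columns sum to 1
    return M

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))            # unconstrained stream-mixing matrix
A_c = sinkhorn_project(A)
print(A_c.sum(axis=0), A_c.sum(axis=1))    # both approximately [1, 1, 1, 1]
```

Whatever form DeepSeek’s actual constraints take, the stated goal is the same as in this toy example: keep the richer multi-stream connectivity without paying for it in training instability, memory or compute.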

The paper was uploaded to the open-access repository arXiv by Liang himself, following his pattern of personally posting DeepSeek’s most significant technical papers. The move suggests that Liang remains deeply involved in core research despite keeping a low public profile as DeepSeek draws international attention. Industry observers note that DeepSeek’s technical papers often signal engineering choices that will appear in the company’s next major model release.

Speculation is building that DeepSeek may release its next major model before the Spring Festival holiday in mid-February, potentially repeating the strategy used when it released its groundbreaking R1 model on the eve of last year’s national holiday.
