DeepSeek introduces self-learning AI models

DeepSeek is collaborating with Tsinghua University to reduce the amount of training its artificial intelligence (AI) models need, with the aim of lowering operational costs.

According to The Edge, the Chinese start-up that shook up the market in January with its cheap reasoning model is now working with researchers from the institution in Beijing on a paper that describes a new approach to reinforcement learning to make artificial intelligence models more efficient.

Reward for accurate answers

The researchers wrote that the new method is intended to make AI models more in line with human preferences by rewarding more accurate and comprehensible answers. Reinforcement learning has proven effective in accelerating AI tasks within limited application areas.

However, expanding it to more general applications appears to be a challenge in practice. That is the problem that the DeepSeek team is trying to solve with what they call self-principled critique tuning. According to the paper, this strategy performed better than existing methods and models on various benchmarks and led to better performance with less computing power.

DeepSeek calls these new models DeepSeek-GRM, which stands for generalist reward modeling. DeepSeek says it will make the models available on an open-source basis. Other AI developers, including the Chinese technology company Alibaba Group Holding and the San Francisco-based OpenAI, are also focusing on this new frontier of reasoning ability and self-improvement of models while performing tasks in real time.

Mixture of Experts architecture

Last weekend, Meta Platforms released its newest family of AI models, Llama 4. The company noted that these are its first models to use a mixture of experts (MoE) architecture. DeepSeek’s models also rely heavily on MoE to use resources more efficiently. Meta compared its new release with the models of the Hangzhou start-up. DeepSeek has not yet indicated when it will release its next flagship model.
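For readers unfamiliar with the idea, the following is a minimal sketch of how a mixture of experts layer can save compute: a small router scores every expert, but only the top-scoring few actually run for a given input. It is an illustration only and not the implementation used by Meta or DeepSeek. All names in it (`moe_forward`, `make_expert`, `gate_weights`, `top_k`) are hypothetical, and the "experts" are trivial scaling functions standing in for full sub-networks.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical expert: scales its input (stand-in for a sub-network)."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route x to the top_k highest-scoring experts and mix their outputs."""
    # Router: one score per expert (dot product of x with that expert's gate row).
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    # Only the top_k experts are evaluated; the others cost nothing here.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in chosen])
    output = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)
        output = [o + w * v for o, v in zip(output, y)]
    return output, chosen


random.seed(0)
dim, num_experts = 4, 8
experts = [make_expert(scale=i + 1) for i in range(num_experts)]
gate_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]

x = [0.5, -1.0, 2.0, 0.1]
output, chosen = moe_forward(x, gate_weights, experts, top_k=2)
print(f"active experts: {chosen} of {num_experts}")
```

In this toy setup only two of the eight experts are evaluated per input, which is the basic reason an MoE model can hold many parameters while spending far fewer of them on any single request.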
