
Chinese AI start-up DeepSeek's R1 has set a new benchmark for large language models (LLMs) by becoming the first of its kind to undergo formal peer review, with the company emphasizing that its approach succeeded without relying on competitors' outputs, according to a study published Wednesday in Nature.
Released in January, R1 was purpose-built to excel in reasoning-intensive tasks, including mathematics and programming, positioning itself as a cost-effective alternative to similar tools developed by U.S. technology firms.
As an open-weight model, R1 is freely accessible for download and has become the most popular model of its kind on the AI community platform Hugging Face, with over 10.9 million downloads to date.
Nature highlighted R1 as the first major LLM to undergo formal peer review, building upon a preprint released earlier this year that detailed how DeepSeek enhanced a standard LLM to tackle complex reasoning challenges.
"This is a very welcome precedent," Nature quoted Lewis Tunstall, a machine-learning engineer at Hugging Face who reviewed the study, as saying. "If we don't have this norm of sharing a large part of this process publicly, it becomes very hard to evaluate whether these systems pose risks or not."
For the first time, supplementary material disclosed R1's training cost, totaling approximately $294,000, significantly lower than the tens of millions reportedly spent on competing models. This is in addition to roughly $6 million invested in constructing the foundational model underpinning R1.
The research describes DeepSeek's novel use of an automated trial-and-error approach known as pure reinforcement learning to create R1: the model was rewarded for reaching accurate answers rather than taught to imitate human-selected reasoning examples.
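To make the idea concrete, the sketch below shows what an outcome-based reward of this kind can look like in Python. It is a simplified illustration of the general principle, not DeepSeek's actual implementation: the function name, the regex-based answer extraction and the exact-match scoring are all assumptions made for the example.

```python
import re

def math_answer_reward(model_output: str, ground_truth: str) -> float:
    """Illustrative outcome-based reward: score only whether the final answer is correct.

    The model is rewarded for reaching the right result, not for how closely its
    reasoning resembles any human-written example.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    return 1.0 if numbers and numbers[-1] == ground_truth else 0.0

print(math_answer_reward("Adding the terms gives 15 + 27 = 42", "42"))  # 1.0
print(math_answer_reward("I believe the total is 41", "42"))            # 0.0
```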
To further improve efficiency, R1 assessed its own outputs using estimates rather than relying on a separate algorithm to judge them, a technique called group relative policy optimization (GRPO).
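The following Python sketch illustrates the group-relative idea in general terms: rewards for a batch of sampled answers to the same prompt are normalized against that group's own mean and standard deviation, so no separate value-estimating model is needed. The function names and the toy example are illustrative assumptions, not DeepSeek's code.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each sampled answer's reward against the group's own statistics.

    The baseline comes from the group of samples itself, which is why no
    separate critic or value model is required.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # avoid division by zero when all rewards are equal
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled answers to one prompt, scored 1.0 if correct, 0.0 otherwise
rewards = [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))  # correct answers receive positive advantages
```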
Other researchers are now trying to apply the methods used to create R1 to improve the reasoning-like abilities of existing LLMs, as well as extend them to domains beyond mathematics and coding, says Tunstall.
In that way, he adds, R1 has "kick-started a revolution."