
Chinese AI start-up DeepSeek's R1 has set a new benchmark for large language models (LLMs) by becoming the first of its kind to undergo formal peer review, with the company emphasizing that its approach succeeded without relying on competitors' outputs, according to a study published Wednesday in Nature.
Released in January, R1 was purpose-built to excel in reasoning-intensive tasks, including mathematics and programming, positioning itself as a cost-effective alternative to similar tools developed by U.S. technology firms.
As an open-weight model, R1 is freely accessible for download and has become the most popular model of its kind on the AI community platform Hugging Face, with over 10.9 million downloads to date.
Nature highlighted R1 as the first major LLM to undergo formal peer review, building upon a preprint released earlier this year that detailed how DeepSeek enhanced a standard LLM to tackle complex reasoning challenges.
"This is a very welcome precedent," Nature quoted Lewis Tunstall, a machine-learning engineer at Hugging Face who reviewed the study, as saying. "If we don't have this norm of sharing a large part of this process publicly, it becomes very hard to evaluate whether these systems pose risks or not."
For the first time, supplementary material disclosed R1's training cost, totaling approximately $294,000, significantly lower than the tens of millions reportedly spent on competing models. This is in addition to roughly $6 million invested in constructing the foundational model underpinning R1.
The research describes DeepSeek's novel use of an automated trial-and-error approach known as pure reinforcement learning to create R1: the model was rewarded for reaching accurate answers rather than taught to imitate human-selected reasoning examples.
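To make the idea concrete, the sketch below shows what an outcome-based reward of this kind can look like in Python. It is a simplified illustration of the general principle, not DeepSeek's actual implementation: the function name, the regex-based answer extraction and the exact-match scoring are all assumptions made for the example.

```python
import re

def math_answer_reward(model_output: str, ground_truth: str) -> float:
    """Illustrative outcome-based reward: score only whether the final answer is correct.

    The model is rewarded for reaching the right result, not for how closely its
    reasoning resembles any human-written example.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    return 1.0 if numbers and numbers[-1] == ground_truth else 0.0

print(math_answer_reward("Adding the terms gives 15 + 27 = 42", "42"))  # 1.0
print(math_answer_reward("I believe the total is 41", "42"))            # 0.0
```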
To further improve efficiency, R1 assessed its own outputs using estimates rather than relying on a separate algorithm to judge them, a technique called group relative policy optimization (GRPO).
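The following Python sketch illustrates the group-relative idea in general terms: rewards for a batch of sampled answers to the same prompt are normalized against that group's own mean and standard deviation, so no separate value-estimating model is needed. The function names and the toy example are illustrative assumptions, not DeepSeek's code.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each sampled answer's reward against the group's own statistics.

    The baseline comes from the group of samples itself, which is why no
    separate critic or value model is required.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # avoid division by zero when all rewards are equal
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled answers to one prompt, scored 1.0 if correct, 0.0 otherwise
rewards = [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))  # correct answers receive positive advantages
```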
Other researchers are now trying to apply the methods used to create R1 to improve the reasoning-like abilities of existing LLMs, as well as extend them to domains beyond mathematics and coding, says Tunstall.
In that way, he adds, R1 has "kick-started a revolution."