2022 International Big Data Competition launched, teaching AI to be a good translator for "The Belt and Road"

Date: 2022-09-13

On August 10, the 4th UNESCO International Knowledge Center for Engineering Science and Technology (IKCEST) "The Belt and Road" International Big Data Competition and the 8th Baidu & Xi'an Jiaotong University Big Data Competition (hereinafter referred to as the "International Big Data Competition") officially launched. This year's competition focuses on the industry challenge of low-resource language machine translation and invites developers from all over the world to take part.


Baidu releases low-resource language tasks, a challenging machine learning problem


As of May 2022, "The Belt and Road" cooperation agreements signed by China involved more than 110 languages. Cooperation on economic development and the improvement of people's livelihoods between countries and regions along the route is deepening. As a result, the demand for multilingual translation is growing rapidly.


The task of this year's competition, "The Belt and Road" low-resource language translation, focuses on translation in both directions between Chinese and each of French, Russian, Thai, and Arabic. It breaks with the English-centric convention of international machine translation evaluation, and aims to encourage young people around the world to take on low-resource machine translation tasks and contribute to the construction of "The Belt and Road".


"Help people cross the language gap and communicate freely with the world" has always been the vision of Baidu Translate. Since Baidu started to develop machine translation in 2010, it has continued to innovate in core technologies and products such as multilingual translation and multimodal machine translation. Today, Baidu Translate supports translation among more than 200 languages and handles text, voice, image, and document translation. It has a daily translation volume of over 100 billion words and serves users worldwide. Wu Hua, Chairman of Baidu's Technology Committee, said, "Baidu Translate is committed to helping people communicate freely with the world," a commitment that forms the technical DNA of this year's competition.


Top Competition invites young dreamers to drive global interconnection


The International Big Data Competition was co-founded in 2015 by Baidu and Xi'an Jiaotong University, with the aim of encouraging contestants to use AI technology to solve real-world problems. In 2019, Baidu collaborated with IKCEST, Xi'an Jiaotong University, and the University Alliance of the Silk Road to upgrade the competition into an international event.


At the launch ceremony, Tian Qi, Director General of the International Cooperation Department of the Chinese Academy of Engineering and Executive Deputy Director of IKCEST, pointed out that this year's competition is dedicated to improving the quality of machine translation for the important languages of "The Belt and Road", which is of great significance. He expects more young dreamers to join in driving global interconnectivity and innovative development.


"The competition will bring the world closer to the goal of 'human-like machine translation'," said Narayanaswamy Balakrishnan, Fellow of the Indian Academy Institute of Science and IKCEST Governing Board Member.


Looking back on the 8-year history of the competition, Prof. Zheng Qinghua, Executive Vice President of Xi'an Jiaotong University, summarized it with "three leaps": the level leap from domestic competition to international competition, the scale leap from dozens of universities to tens of thousands of teams, and the content leap from a single AI big data algorithm to comprehensive, innovative, and design-oriented content.

Fostering creativity through competition, industry-academia-research cooperation to improve AI talent training ecology


This year's competition adopts BLEU, the mainstream evaluation metric for machine translation. In the preliminary round, contestants will be given 100,000 sentence pairs for each of Chinese-French, Chinese-Russian, and Chinese-Thai translation as training data. In the semi-final round, 50,000 Chinese-Arabic sentence pairs are used as training data. The top 16 teams from the semi-final round will enter the final round and defend their projects on site.
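For readers unfamiliar with BLEU, the following is a minimal sketch of the standard corpus-level formula: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty. It is not the competition's official scoring script, whose tokenization and settings are not described here. The function names `ngrams` and `corpus_bleu` are illustrative, and the sketch assumes one reference per sentence, pre-tokenized input, and no smoothing. Chinese or Thai text would first need word segmentation or character-level splitting.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU in [0, 1], with one reference per hypothesis."""
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Counter intersection clips each count by the reference count.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, a single order with zero matches gives BLEU = 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize system output that is shorter than the reference overall.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


# Toy example with whitespace tokenization (an assumption of this sketch).
hypothesis = ["the cat sat on the mat".split()]
reference = ["the cat sat on a mat".split()]
print(f"BLEU = {corpus_bleu(hypothesis, reference):.4f}")
```

Scores are accumulated over the whole test set before the precisions are computed, which is why BLEU is usually reported at corpus level rather than averaged per sentence.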


Baidu will provide the contestants with a benchmark model based on "PaddlePaddle" and free, high-performance computing support. A well-known language service solution provider will provide part of the corpus data. All the datasets of this year's competition will remain open on the "Qianyan Open Source Dataset" platform after the competition, with the aim of encouraging more AI talent to devote themselves to industrial R&D and promote technological progress.


The registration deadline for the preliminary round of the International Big Data Competition is September 30, 2022. Please visit the competition website for details.