Information Sciences | 2021-01-31

swFLOW: A Large-Scale Distributed Framework for Deep Learning on Sunway TaihuLight Supercomputer

Li, Mingfan; Lin, Han; Lin, Rongfen; Wang, Fei; Xiao, Qian; Gao, Guang R.; An, Hong; Chen, Junshi; Diaz, Jose Monsalve
Abstract

Deep learning is widely used in many modern fields, and a number of models and software frameworks have been proposed. However, it remains difficult to process deep learning tasks efficiently on traditional high-performance computing (HPC) systems. In this paper, we propose swFLOW, a large-scale distributed framework for deep learning on the Sunway TaihuLight supercomputer. Based on performance analysis of convolutional neural networks (CNNs), we optimize the convolutional layer and obtain a 10.42x speedup over the original version. For distributed training, we use the elastic averaging stochastic gradient descent (EASGD) algorithm to reduce communication. On 512 processes, we achieve a parallel efficiency of 81.01% with communication period τ = 8. In particular, a decentralized implementation of the distributed swFLOW system is presented to alleviate the bottleneck at the central server. Using the distributed swFLOW system, we can scale the batch size up to 4096 across 1024 concurrent processes for a cancerous region detection algorithm. This successful application of swFLOW reveals a great opportunity for combining deep learning with HPC systems.
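The abstract's distributed-training scheme, EASGD with a communication period τ, can be illustrated with a minimal sketch. The sketch below is not swFLOW's implementation: it uses a toy one-dimensional quadratic loss and plain Python lists in place of real model replicas, and all names (`workers`, `center`, `tau`, `alpha`) are illustrative. The idea it shows is the published EASGD update: each worker runs τ local SGD steps, then exchanges an elastic force with the central variable, so communication happens only once per τ steps.

```python
import numpy as np

rng = np.random.default_rng(0)
grad = lambda x: 2.0 * x          # gradient of the toy loss f(x) = x^2

n_workers, tau, lr, alpha = 4, 8, 0.05, 0.3
workers = list(rng.normal(size=n_workers))   # local model replicas
center = 0.5                                 # central (server) variable

for step in range(1, 201):
    # local SGD step on every worker (no communication)
    workers = [x - lr * grad(x) for x in workers]
    # every tau steps, exchange the elastic force with the center:
    # workers are pulled toward the center, the center toward the workers
    if step % tau == 0:
        diffs = [x - center for x in workers]
        workers = [x - alpha * d for x, d in zip(workers, diffs)]
        center = center + alpha * sum(diffs) / n_workers

print(abs(center))   # the center drifts toward the minimizer x* = 0
```

Raising τ cuts communication rounds proportionally, which is what makes the 81.01% parallel efficiency at τ = 8 plausible; the trade-off is that workers explore further from the center between synchronizations.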


Cite this article

Li, M., Lin, H., Lin, R., Wang, F., Xiao, Q., Gao, G. R., An, H., Chen, J., & Diaz, J. M. (2021). swFLOW: A Large-Scale Distributed Framework for Deep Learning on Sunway TaihuLight Supercomputer. Information Sciences.
