IKCEST

Abstract

The decoding algorithm is critical to the success of any statistical machine translation system. Exploring an enormous search space leads to unacceptably slow decoding, so there is a trade-off between translation accuracy and decoding speed. Pruning algorithms such as histogram pruning and threshold pruning try to balance this trade-off. Each pruning algorithm places a pre-defined limit on a supplemental parameter (i.e., stack size or beam threshold), which helps to improve translation quality and speed up the decoder. However, a single fixed parameter value cannot deliver high-quality translation in optimal time for every input; the stack size and beam threshold should change with the structure of the text. In this paper, we identify the best stack size and beam threshold values at runtime, based on the structure and characteristics of the text, using a machine learning-based approach. These parameter values are then applied in the beam search algorithm for decoding. Our experiments on low-resource Asian languages show significant improvements in both translation accuracy and decoding time. The HindEnCorp and ILCI datasets serve as benchmark datasets, with the English-Hindi, Hindi-Marathi, Hindi-Konkani, and Bengali-Hindi language pairs. Moreover, we incorporate the proposed technique into the cube pruning algorithm for faster decoding and observe further improvement with this approach.
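To make the two pruning parameters concrete, the following is a minimal sketch of how threshold pruning and histogram pruning might be applied to a single hypothesis stack during beam search. It is not the paper's implementation: the `Hypothesis` class and the `prune_stack` function are hypothetical names, scores are assumed to be log-probabilities (higher is better), and `beam_threshold` is assumed to be a margin in log-score below the best hypothesis.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """Hypothetical partial translation with a model score."""
    translation: str
    score: float  # assumed log-probability; higher is better


def prune_stack(stack, stack_size, beam_threshold):
    """Apply threshold pruning, then histogram pruning, to one stack.

    stack_size: maximum number of hypotheses kept (histogram pruning).
    beam_threshold: maximum allowed log-score gap to the best hypothesis
        (threshold pruning); this log-space form is an assumption.
    """
    if not stack:
        return []
    # Rank hypotheses from best to worst score.
    ranked = sorted(stack, key=lambda h: h.score, reverse=True)
    best_score = ranked[0].score
    # Threshold pruning: drop hypotheses too far below the best one.
    within_beam = [h for h in ranked if h.score >= best_score - beam_threshold]
    # Histogram pruning: keep at most stack_size hypotheses.
    return within_beam[:stack_size]


if __name__ == "__main__":
    stack = [
        Hypothesis("a", -1.0),
        Hypothesis("b", -1.5),
        Hypothesis("c", -2.0),
        Hypothesis("d", -5.0),
    ]
    kept = prune_stack(stack, stack_size=2, beam_threshold=2.0)
    print([h.translation for h in kept])
```

In a fixed-parameter decoder, `stack_size` and `beam_threshold` would be constants; under the approach described in the abstract, they would instead be predicted per input text before decoding starts.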

Machine Learning Based Optimized Pruning Approach for Decoding in Statistical Machine Translation

Keywords

statistical machine translation, beam search algorithm, histogram pruning, threshold pruning, stack size, machine learning-based approach, text structure, characteristics, experiments, low-resourced Asian languages, English-Hindi, Hindi-Marathi, Hindi-Konkani, Bengali-Hindi, language, benchmark datasets, HindEnCorp and ILCI datasets

Cite this article

Debajyoty Banik, Asif Ekbal, Pushpak Bhattacharyya. Machine Learning Based Optimized Pruning Approach for Decoding in Statistical Machine Translation. 1736-1751.