arXiv | abs/2005.01107
Transformer-based End-to-End Question Generation
Question Generation (QG) is an important task in Natural Language Processing (NLP) that involves automatically generating questions from a given context paragraph. While many techniques exist for QG, they employ complex model architectures, extensive features, and additional mechanisms to boost model performance. In this work, we show that transformer-based fine-tuning techniques can be used to create robust question generation systems using only a single pretrained language model, without additional mechanisms, answer metadata, or extensive features. Our best model outperforms previous, more complex RNN-based Seq2Seq models, with increases of 8.62 and 14.27 in METEOR and ROUGE-L scores, respectively. We show that it also performs on par with Seq2Seq models that employ answer-awareness and other special mechanisms, despite being a single-model system. We analyze how various factors affect the model's performance, such as input data formatting, the length of the context paragraphs, and the use of answer-awareness. Lastly, we examine the model's failure modes and identify possible reasons why the model fails.
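The single-model, end-to-end setup described in the abstract can be sketched as a simple input-formatting scheme: a pretrained causal language model is fine-tuned on context–question pairs concatenated into one sequence, and at inference it generates the question as a continuation of the context. This is a minimal illustration, not the paper's exact scheme; the separator and end tokens below are assumptions for the sketch.

```python
# Hypothetical input formatting for end-to-end QG fine-tuning.
# A causal LM is trained on "context <sep> question <eos>" strings;
# at inference, it continues from "context <sep>" to produce a question.
# The token names here are illustrative assumptions, not from the paper.

SEP = "<question>"  # assumed separator marking where the question begins
EOS = "<end>"       # assumed end-of-sequence token

def format_training_example(context: str, question: str) -> str:
    """Concatenate a context paragraph and its target question
    into a single training string for language-model fine-tuning."""
    return f"{context.strip()} {SEP} {question.strip()} {EOS}"

def format_inference_prompt(context: str) -> str:
    """At generation time the model sees only the context and the
    separator, and decodes the question up to the end token."""
    return f"{context.strip()} {SEP}"

example = format_training_example(
    "The Eiffel Tower was completed in 1889.",
    "When was the Eiffel Tower completed?",
)
```

Because the abstract reports that input data formatting measurably affects performance, keeping this formatting in one place (rather than scattered across preprocessing and decoding code) makes such ablations straightforward.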