Iterative Mask Filling: An Effective Text Augmentation Method Using Masked Language Modeling

KESGİN H. T., AMASYALI M. F.

2nd International Conference on Advanced Engineering, Technology and Applications, ICAETA 2023, İstanbul, Türkiye, 10-11 March 2023, vol. 1983 CCIS, pp. 450-463 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
Volume: 1983 CCIS
DOI: 10.1007/978-3-031-50920-9_35
City of Publication: İstanbul
Country of Publication: Türkiye
Pages: pp. 450-463
Keywords: data augmentation, language modeling, mask filling, text augmentation
Affiliated with Yıldız Teknik Üniversitesi: Yes

Abstract

Data augmentation is an effective technique for improving the performance of machine learning models. However, it has been explored less extensively in natural language processing (NLP) than in computer vision. In this paper, we propose a novel text augmentation method that leverages the Fill-Mask capability of the transformer-based BERT model. Our method iteratively masks words in a sentence and replaces them with the language model's predictions. We tested the proposed method on various NLP tasks, found it to be effective in many cases, and compared our results with those of existing augmentation methods. Experimental results show that the proposed method significantly improves performance, especially on topic classification datasets.
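The abstract describes the core loop only at a high level. The following is a minimal sketch of iterative mask filling under stated assumptions: words are visited once, left to right, after naive whitespace tokenization, and each masked position is filled with one of the model's top-k predictions. The paper's actual visiting order, candidate selection, and filtering may differ. `iterative_mask_fill`, `Predictor`, and `toy_predictor` are hypothetical names; the toy predictor stands in for BERT so the sketch runs offline.

```python
import random
from typing import Callable, List, Optional

MASK = "[MASK]"  # BERT-style mask token

# Hypothetical predictor interface: given tokens containing exactly one MASK,
# return replacement candidates ordered from most to least likely.
Predictor = Callable[[List[str]], List[str]]


def iterative_mask_fill(
    sentence: str,
    predict: Predictor,
    top_k: int = 1,
    seed: Optional[int] = None,
) -> str:
    """Mask each word left to right and replace it with a predicted word.

    Each step sees the sentence as already modified by earlier steps,
    which is what makes the filling iterative rather than independent.
    """
    rng = random.Random(seed)
    tokens = sentence.split()  # naive whitespace tokenization (assumption)
    for i in range(len(tokens)):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        candidates = predict(masked)[:top_k]
        if candidates:  # keep the original word if nothing is predicted
            tokens[i] = rng.choice(candidates)
    return " ".join(tokens)


def toy_predictor(tokens: List[str]) -> List[str]:
    """Stand-in for a masked LM: echoes the left neighbour plus '!'."""
    i = tokens.index(MASK)
    return [tokens[i - 1] + "!"] if i > 0 else ["<s>"]


if __name__ == "__main__":
    print(iterative_mask_fill("a b c", toy_predictor))
```

Here `toy_predictor` only echoes its left neighbour, so the output `<s> <s>! <s>!!` shows that each step conditions on words filled in at earlier steps. To use a real model, one could wrap the Hugging Face `transformers` `pipeline("fill-mask", ...)`, pass it `" ".join(masked)`, and return the `token_str` field of each result; BERT's `[MASK]` token matches the constant above. With `top_k=1`, a word whose top prediction is itself stays unchanged, so sampling from a larger `top_k` is one way to obtain more varied augmentations.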