Model Agnostic Knowledge Transfer Methods for Sentence Embedding Models


Gunel K., AMASYALI M. F.

2nd International Congress of Electrical and Computer Engineering, ICECENG 2023, Bandirma, Türkiye, 22-25 November 2023, pp. 3-16

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1007/978-3-031-52760-9_1
  • City of Publication: Bandirma
  • Country of Publication: Türkiye
  • Pages: pp. 3-16
  • Keywords: FastText, Knowledge distillation, Knowledge transfer, Neural networks, sBert, Sentence embeddings
  • Affiliated with Yıldız Teknik Üniversitesi: Yes

Abstract

This chapter explores information transfer between two distinct sentence embedding models that share a common language but not a common structure. The goal is to enhance the representational power of a weaker model by transferring knowledge from a more robust one. Despite its superior representational power, the robust model generates sentence vectors more slowly than the weaker one. This research aims to develop model-agnostic approaches that increase the weaker model’s representational power without compromising its vector generation speed. To that end, new models are constructed atop the existing weaker model. Two strategies are proposed, both based on a representation distillation technique: Distance Minimization and Perplexity Minimization. These strategies are first applied to transfer knowledge from the robust model to the weaker one using the WMT EN-ES dataset. The resulting models are then evaluated on the SentEval datasets. This chapter also discusses the relationship between the two sentence embedding spaces based on their alignment. Our findings reveal that alignment between different embedding spaces has a significant impact on the efficiency of information transfer.
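
The Distance Minimization idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the model built atop the weaker encoder is a single linear map fitted by ridge regression, and it uses random arrays as stand-ins for the vectors of the fast and robust models. The function names, dimensions, and the `ridge` parameter are all hypothetical.

```python
import numpy as np

def fit_distance_minimizing_map(weak_vecs, strong_vecs, ridge=1e-3):
    """Fit W minimizing ||X W - Y||^2 + ridge * ||W||^2 in closed form.

    Assumption: a linear layer on top of the weaker model is enough for
    illustration; the chapter does not specify this architecture here.
    """
    X = np.asarray(weak_vecs, dtype=float)
    Y = np.asarray(strong_vecs, dtype=float)
    d = X.shape[1]
    # Normal equations of ridge regression, solved without an explicit inverse.
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ Y)

def mean_squared_distance(A, B):
    """Average squared Euclidean distance between paired rows of A and B."""
    return float(np.mean(np.sum((A - B) ** 2, axis=1)))

# Synthetic stand-ins (hypothetical): 200 sentences, 8-dim weak vectors,
# 16-dim strong vectors that are a noisy linear function of the weak ones.
rng = np.random.default_rng(0)
weak = rng.normal(size=(200, 8))
true_map = rng.normal(size=(8, 16))
strong = weak @ true_map + 0.01 * rng.normal(size=(200, 16))

W = fit_distance_minimizing_map(weak, strong)
projected = weak @ W  # weak-model vectors moved into the strong model's space

baseline = mean_squared_distance(np.zeros_like(strong), strong)
fitted = mean_squared_distance(projected, strong)
print(f"distance before: {baseline:.4f}, after: {fitted:.4f}")
```

At inference time only the weaker encoder and the extra map are evaluated, so vector generation stays close to the weaker model's speed. In practice the synthetic arrays would be replaced by vectors that both models produce for the same sentences.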