Model Agnostic Knowledge Transfer Methods for Sentence Embedding Models

Gunel K., AMASYALI M. F.

2nd International Congress of Electrical and Computer Engineering, ICECENG 2023, Bandirma, Turkey, 22 - 25 November 2023, pp.3-16

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1007/978-3-031-52760-9_1
  • City: Bandirma
  • Country: Turkey
  • Page Numbers: pp.3-16
  • Keywords: FastText, Knowledge distillation, Knowledge transfer, Neural networks, sBert, Sentence embeddings
  • Yıldız Technical University Affiliated: Yes


This chapter explores information transfer between two distinct sentence embedding models that share a common language but not a common structure. The goal is to enhance the representational power of a weaker model by transferring knowledge from a more robust one. Although the robust model has superior representational power, it generates sentence vectors more slowly than the weaker one. This research therefore aims to develop model-agnostic approaches that increase the weaker model’s representational power without compromising its vector generation speed. To that end, new models are constructed atop the existing weaker model. Two strategies are proposed: Distance Minimization and Perplexity Minimization through a representation distillation technique. These strategies are first applied to transfer knowledge from the robust model to the weaker one using the WMT EN-ES dataset. The resulting models are then evaluated on the SentEval datasets. This chapter also discusses the relationship between the two sentence embedding spaces based on their alignment. Our findings reveal that alignment between different embedding spaces has a significant impact on the efficiency of information transfer.
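
As a rough illustration of the distance-minimization idea, and not the chapter's actual implementation, the sketch below fits a small model on top of a weaker model's sentence vectors so that its outputs land close to the robust model's vectors for the same sentences. Everything specific here is an assumption made for illustration: the synthetic random vectors stand in for real embeddings, the student is a plain linear map fitted by least squares, and the helper names `fit_linear_student` and `mean_squared_distance` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): in practice these would be sentence vectors
# produced by the weaker (fast) model and the robust (slow) model for the same
# set of sentences.
n_sentences, weak_dim, strong_dim = 500, 30, 40
weak_vecs = rng.normal(size=(n_sentences, weak_dim))
hidden_map = rng.normal(size=(weak_dim, strong_dim))
strong_vecs = weak_vecs @ hidden_map + 0.01 * rng.normal(size=(n_sentences, strong_dim))


def fit_linear_student(weak, strong):
    """Return the matrix W that minimizes ||weak @ W - strong||^2 (least squares)."""
    W, *_ = np.linalg.lstsq(weak, strong, rcond=None)
    return W


def mean_squared_distance(pred, target):
    """Average squared Euclidean distance between paired rows."""
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))


W = fit_linear_student(weak_vecs, strong_vecs)

# Distance to the robust model's vectors before and after fitting the student.
baseline = mean_squared_distance(np.zeros_like(strong_vecs), strong_vecs)
fitted = mean_squared_distance(weak_vecs @ W, strong_vecs)
print(f"baseline distance: {baseline:.4f}, fitted distance: {fitted:.4f}")
```

Because the student only post-processes vectors the weaker model already produces, inference cost stays close to that of the weaker model, which is the property the abstract emphasizes. A nonlinear student trained by gradient descent would follow the same pattern with a different fitting step.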