Intelligent Word Embedding Methods to Support Project Proposal Grouping for Project Selection

Aksoy M. Y., Amasyalı M. F., Yanık Özbay S.

4th International Conference on Intelligent and Fuzzy Systems (INFUS), Bornova, Turkey, 19 - 21 July 2022, vol.504, pp.990-998

  • Publication Type: Conference Paper / Full Text
  • Volume: 504
  • Doi Number: 10.1007/978-3-031-09173-5_113
  • City: Bornova
  • Country: Turkey
  • Page Numbers: pp.990-998
  • Keywords: Project proposal grouping, Word embedding, FastText, BERT, TF-IDF
  • Yıldız Technical University Affiliated: Yes


Selecting project proposals for funding is a critical decision-making process in government and private funding agencies, universities, and research institutes. Grouping proposals by similarity is an essential step in project selection because it simplifies subsequent work such as reviewer assignment and project evaluation. Current approaches to grouping proposals rely primarily on manually matching similar topics, discipline areas, and keywords declared by applicants. As the number of proposals grows, this task becomes complex and time-consuming. Furthermore, because of subjective viewpoints and potential misinterpretations, applicants frequently fail to select the correct research field or keywords for their proposals. Time constraints, a limited understanding of a proposal's content, divergent perspectives, and incomplete information lead to misclassified proposals, which lowers evaluation quality. This article discusses how to make effective use of the rich information in the abstracts and titles of Turkish proposals by means of word embedding models. In the proposed method, texts are vectorized using FastText, BERT, and TF-IDF. The method is validated on proposals submitted to the Istanbul Development Agency. Experiments indicate that the generated word embeddings can effectively represent proposal texts as vectors and can serve as input for clustering or classification algorithms. In this way, proposal grouping can be conducted more efficiently, accurately, and without any loss of meaning.
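As a rough illustration of the "vectorize, then group" idea described in the abstract, the sketch below builds TF-IDF vectors using only the Python standard library and groups texts by cosine similarity. It is not the authors' implementation: it covers TF-IDF only (not FastText or BERT), the greedy threshold grouping is a simple stand-in for the clustering or classification algorithms the abstract mentions, and the function names, the 0.2 threshold, and the English toy proposals are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption: real Turkish text would need
    # proper normalization and morphological handling).
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one L2-normalized sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    # Vectors are already unit length, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def group_by_similarity(vectors, threshold=0.2):
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups = []
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Hypothetical toy proposals, invented for illustration only.
proposals = [
    "solar energy storage for rural villages",
    "low cost solar energy panels for villages",
    "vocational training program for young women",
    "digital skills training program for women",
]
vectors = tfidf_vectors(proposals)
groups = group_by_similarity(vectors)
print(groups)
```

On these toy texts the two energy proposals fall into one group and the two training proposals into another. Swapping `tfidf_vectors` for a function that returns dense FastText or BERT embeddings would leave the grouping step conceptually unchanged, although `cosine` would then need to operate on dense vectors.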