LLM Prompting Versus Fine-Tuning PLMs: A Comparative Study on Keyword Generation from Customer Feedback


Er A., DİRİ B., Yöndem M. T.

20th IFIP WG 12.5 International Conference on Artificial Intelligence Applications and Innovations, AIAI 2024, Corfu, Greece, 27-30 June 2024, vol. 712, pp. 88-99

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume: 712
  • DOI: 10.1007/978-3-031-63215-0_7
  • City of Publication: Corfu
  • Country of Publication: Greece
  • Pages: pp. 88-99
  • Keywords: Customer Feedback, Keyword Generation, Large Language Models, Natural Language Processing, Pre-trained Language Models, Tourism
  • Affiliated with Yıldız Teknik Üniversitesi: Yes

Abstract

This study focuses on keyword generation for customer feedback analysis in the tourism sector, covering tour and hotel services provided by Setur. A dataset of 1000 customer surveys from 2020–2022 was built by annotating keywords drawn from responses to open-ended questions. The research evaluates the efficacy of fine-tuning Pre-trained Language Models (PLMs) against prompting Large Language Models (LLMs) for keyword generation. The study reviews traditional statistical methods, such as TF-IDF and KP-Miner, and unsupervised approaches, such as YAKE, PageRank, and TextRank. It also explores supervised methodologies, including KEA and sequence-to-sequence models, contrasting them with increasingly popular pre-trained and large language models such as T5, BART, GPT-3.5, GPT-4, and Gemini. In our experiments, multilingual versions of the T5 and BART models, namely mT5 and mBART, were fine-tuned following prior studies. The comparison was extended to the GPT-3.5, GPT-4, and Gemini models, using diverse prompt styles and introducing few-shot examples to measure changes in performance. We report the semantic similarity between generated keywords and the source text, text-length metrics, and inter-keyword semantic similarity. Along with mBART and mT5, the Turkish language models TRBART and TRT5 were also employed. The study aims not only to contribute insights to customer feedback analysis but also to serve as a benchmark for comparing the efficiency of PLM fine-tuning and LLM prompting for keyword generation from textual data.
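To make the compared setups concrete, the two sketches below illustrate the LLM-prompting side and the kind of semantic-similarity evaluation described above. Both are minimal, hypothetical sketches: the model names, prompt wording, and example feedback are illustrative assumptions, not the paper's actual prompts or data from the annotated Setur dataset.

```python
# Hypothetical sketch: few-shot keyword generation with an OpenAI chat model.
# The system prompt, example pairs, and model name are assumptions for
# illustration; the paper's actual prompt styles are not reproduced here.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder (feedback, keywords) pairs -- NOT from the annotated Setur surveys
FEW_SHOT = [
    ("The hotel room was clean, but breakfast was served cold.",
     "room cleanliness, breakfast quality"),
    ("Our tour guide was knowledgeable and very friendly.",
     "tour guide, staff friendliness"),
]

def generate_keywords(feedback: str) -> str:
    """Ask the model for comma-separated keywords, conditioned on few-shot examples."""
    messages = [{"role": "system",
                 "content": "Extract concise keywords from customer feedback, "
                            "separated by commas."}]
    for text, keywords in FEW_SHOT:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": keywords})
    messages.append({"role": "user", "content": feedback})
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content

print(generate_keywords("The pool was crowded and towels were never restocked."))
```

The second sketch mirrors the reported evaluation: cosine similarity between generated keywords and the source text, plus inter-keyword similarity as a redundancy check. The multilingual encoder named here is an assumption, not necessarily the model used in the study.

```python
# Hypothetical sketch: semantic-similarity evaluation of generated keywords.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed encoder

source = "The hotel room was clean, but breakfast was served cold."
keywords = ["room cleanliness", "breakfast quality"]

src_emb = model.encode(source, convert_to_tensor=True)
kw_embs = model.encode(keywords, convert_to_tensor=True)

# Keyword-to-source similarity: higher values mean the keywords better reflect the text.
print(util.cos_sim(kw_embs, src_emb))

# Inter-keyword similarity: high off-diagonal values signal redundant keywords.
print(util.cos_sim(kw_embs, kw_embs))
```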