Cyclical Curriculum Learning

KESGİN H. T., AMASYALI M. F.

IEEE Transactions on Neural Networks and Learning Systems, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Publication Date: 2023
  • DOI: 10.1109/tnnls.2023.3265331
  • Journal Name: IEEE Transactions on Neural Networks and Learning Systems
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, Biotechnology Research Abstracts, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, EMBASE, INSPEC, MEDLINE, Metadex, Civil Engineering Abstracts
  • Keywords: Artificial neural networks, Curriculum learning (CL), Data models, deep learning, optimization, Spirals, Task analysis, Text categorization, Training, Training data
  • Affiliated with Yıldız Technical University: Yes

Abstract

Artificial neural networks (ANNs) are inspired by human learning. However, unlike human education, classical ANN training does not use a curriculum. Curriculum learning (CL) refers to training an ANN on samples in a meaningful order. Under CL, training either begins with a subset of the dataset and adds new samples as training progresses, or begins with the entire dataset and gradually reduces the number of samples used. With these changes in training-set size, curriculum, anti-curriculum, or random-curriculum methods can outperform the vanilla method. However, no single CL method has been shown to be generally effective across architectures and datasets. In this article, we propose cyclical CL (CCL), in which the amount of data used during training changes cyclically rather than simply increasing or decreasing. Instead of using only the vanilla method or only the curriculum method, alternating between the two cyclically, as in CCL, yields more successful results. We tested the method on 18 datasets and 15 architectures across image and text classification tasks and obtained better results than no-CL and existing CL methods. We have also shown theoretically that applying CL and vanilla training cyclically is less erroneous than using only CL or only the vanilla method. The code of the cyclical curriculum is available at https://github.com/CyclicalCurriculum/Cyclical-Curriculum.
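The core idea, a training-set size that oscillates between a curriculum subset and the full dataset, can be sketched as follows. This is an illustrative example, not the authors' exact schedule: the triangular cycle shape, the `cycle_len` and `min_frac` parameters, and the difficulty-based selection are assumptions made for the sketch.

```python
import numpy as np

def cyclical_subset_sizes(n_samples, n_epochs, cycle_len=10, min_frac=0.5):
    """Per-epoch training-set sizes that oscillate between
    min_frac * n_samples and the full dataset (triangular cycle)."""
    sizes = []
    for epoch in range(n_epochs):
        pos = (epoch % cycle_len) / cycle_len   # position within the cycle, in [0, 1)
        tri = 1.0 - abs(2.0 * pos - 1.0)        # triangle wave rising 0 -> 1 -> 0
        frac = min_frac + (1.0 - min_frac) * tri
        sizes.append(int(round(frac * n_samples)))
    return sizes

def select_subset(difficulty_scores, size):
    """Pick the `size` easiest samples (lowest difficulty first),
    in curriculum-style easy-to-hard order."""
    order = np.argsort(difficulty_scores)
    return order[:size]

# Example: 1000 samples, 20 epochs, cycling between 50% and 100% of the data
sizes = cyclical_subset_sizes(1000, 20, cycle_len=10, min_frac=0.5)
scores = np.random.rand(1000)            # placeholder difficulty scores
subset = select_subset(scores, sizes[0])  # indices to train on in epoch 0
```

At the start of each cycle the model trains on the easiest half of the data (curriculum phase); mid-cycle it sees the full dataset (vanilla phase), then the size shrinks again, alternating the two regimes as the abstract describes.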