Satellite fault tolerant attitude control based on expert guided exploration of reinforcement learning agent

Henna, Hicham; Toubakh, Houari; Kafi, Mohamed; Gürsoy, Ömer; Sayed-Mouchaweh, Moamar; Djemai, Mohamed

doi:10.1080/0952813x.2024.2321152

Henna H., Toubakh H., Kafi M. R., Gürsoy Ö., Sayed-Mouchaweh M., Djemai M.

Journal of Experimental and Theoretical Artificial Intelligence, vol. 37, no. 6, pp. 987-1011, 2025 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 37 Issue: 6
Publication Date: 2025
DOI Number: 10.1080/0952813x.2024.2321152
Journal Name: Journal of Experimental and Theoretical Artificial Intelligence
Indexes Covering the Journal: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Compendex, Computer & Applied Sciences, INSPEC, Psycinfo, zbMATH
Pages: pp. 987-1011
Keywords: Fault-tolerant control, machine learning, policy gradient algorithms, reinforcement learning, reward shaping
Affiliated with Yıldız Teknik Üniversitesi: Yes

Abstract

This research provides a method that accelerates learning and avoids local minima to improve the policy gradient algorithm’s learning process. Reinforcement learning has the advantage of not requiring a model. Consequently, it can improve control performance, mainly when a model is generally unavailable, such as when an error occurs. The proposed method efficiently and expeditiously investigates the action space. First, it quantifies the resemblance between agents’ and traditional controllers’ actions. Then, the principal reward function is modified to reflect this similarity. This reward-shaping mechanism guides the agent to maximize its return via an attractive force during the gradient ascent. To validate our concept, we establish a satellite attitude control environment with a similarity subsystem. The outcomes demonstrate the effectiveness and robustness of our method.
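
The abstract does not state the similarity measure or how it enters the reward, so the following Python sketch is only an illustration of the general idea under stated assumptions: a single-axis PD law stands in for the traditional controller, similarity is a Gaussian kernel on the difference between the two actions, and the shaped reward is the principal reward plus a weighted similarity bonus. The function names, gains, `weight`, and `scale` are all hypothetical and are not taken from the paper.

```python
import math


def pd_controller(angle_error, rate, kp=2.0, kd=1.0):
    """Hypothetical expert: single-axis PD torque command (assumed gains)."""
    return -kp * angle_error - kd * rate


def similarity(agent_action, expert_action, scale=1.0):
    """Assumed similarity: Gaussian kernel in (0, 1], equal to 1 when actions match."""
    diff = agent_action - expert_action
    return math.exp(-((diff / scale) ** 2))


def shaped_reward(base_reward, agent_action, expert_action, weight=0.5, scale=1.0):
    """Principal reward plus a bonus that grows as the agent nears the expert action."""
    return base_reward + weight * similarity(agent_action, expert_action, scale)


# Example state: small pointing error and a slow body rate (illustrative values).
angle_error, rate = 0.2, -0.1
expert = pd_controller(angle_error, rate)
base = -abs(angle_error)  # assumed principal reward: penalize pointing error

near = shaped_reward(base, -0.25, expert)  # agent torque close to the expert's
far = shaped_reward(base, 1.0, expert)     # agent torque far from the expert's
print(f"expert={expert:.3f} near={near:.3f} far={far:.3f}")
```

In this sketch the bonus is largest when the agent's action coincides with the expert's and decays smoothly as they diverge, which is one way to obtain the "attractive force" the abstract describes during gradient ascent. A small `weight` keeps the principal reward dominant, so the agent can still depart from the expert when that yields a higher return, for example after a fault changes the dynamics.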