Dynamic Programming vs Q-learning for Feedback Motion Planning of Manipulators


YILDIRAN U.

5th International Congress on Human-Computer Interaction, Optimization and Robotic Applications, HORA 2023, İstanbul, Türkiye, 8 - 10 June 2023

  • Publication Type: Conference Paper / Full-Text Conference Paper
  • DOI Number: 10.1109/hora58378.2023.10155782
  • City of Publication: İstanbul
  • Country of Publication: Türkiye
  • Keywords: Dynamic Programming, Manipulators, Motion Planning, Q-Learning
  • Affiliated with Yıldız Teknik Üniversitesi: Yes

Abstract

Reinforcement Learning (RL) based methods have recently become popular for the control and motion planning of robots. Unlike sampling-based motion planners, the optimal policies they compute provide feedback motion plans, which eliminates the need to re-compute (optimal) trajectories each time a robot starts from a different initial configuration. In related studies, an optimal policy (actor) and the associated value function (critic) are usually obtained by performing training in a simulation environment. During training, RL allows learning through interactions with the environment in a physically realistic manner. In a simulation system, however, it is also possible to make moves that would be physically unimplementable on real hardware. Thus, instead of RL, one can use Dynamic Programming approaches such as Value Iteration to compute optimal policies; these do not require an exploration component and are known to have better convergence properties. In addition, the dimension of a value function is smaller than that of a Q-function, which lessens the severity of the curse of dimensionality. Motivated by these facts, the aim of this paper is to employ the Value Iteration algorithm for the motion planning of robot manipulators and to evaluate its effectiveness against a popular RL method, Q-learning.
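
To make the contrast concrete, below is a minimal tabular sketch in Python. It is not the paper's implementation: the toy 1-D discretized state space, the `step` model, and all hyperparameters are illustrative assumptions. It shows the two points the abstract makes: Value Iteration sweeps a known model without any exploration component and stores a value table with |S| entries, while Q-learning learns from sampled transitions, needs an epsilon-greedy exploration component, and stores a table with |S| x |A| entries.

```python
import numpy as np

# Toy 1-D discretized configuration space (illustrative assumption):
# the robot moves one cell left/right toward a goal cell.
n_states, n_actions = 21, 2          # actions: 0 = left, 1 = right
GOAL, gamma = n_states - 1, 0.95

def step(s, a):
    """Known deterministic model; in the paper's setting this role is
    played by the simulation environment."""
    s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
    reward = 0.0 if s_next == GOAL else -1.0
    return s_next, reward

def backup(s, a, V):
    """One-step Bellman backup using the known model."""
    s_next, r = step(s, a)
    return r + gamma * V[s_next]

# --- Value Iteration: model-based sweeps, stores V with |S| entries ---
V = np.zeros(n_states)
for _ in range(200):                 # in-place (Gauss-Seidel) sweeps
    for s in range(n_states):
        V[s] = max(backup(s, a, V) for a in range(n_actions))
policy_vi = [max(range(n_actions), key=lambda a: backup(s, a, V))
             for s in range(n_states)]

# --- Q-learning: model-free, needs exploration, stores |S| x |A| table ---
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))
alpha, eps = 0.1, 0.2                # learning rate and exploration rate
for _ in range(20000):
    s = int(rng.integers(n_states))  # exploring start
    # epsilon-greedy exploration component, not needed by Value Iteration
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = step(s, a)           # sampled transition from the simulator
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
policy_ql = Q.argmax(axis=1)
```

For a manipulator, the states would instead be discretized joint configurations, so the extra |A| factor in the Q-table is precisely where the curse of dimensionality bites harder, as the abstract notes; both resulting policies are feedback plans, giving an action for every state rather than a single trajectory.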