Dynamic Programming vs Q-learning for Feedback Motion Planning of Manipulators


YILDIRAN U.

5th International Congress on Human-Computer Interaction, Optimization and Robotic Applications, HORA 2023, İstanbul, Turkey, 8 - 10 June 2023

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1109/hora58378.2023.10155782
  • City: İstanbul
  • Country: Turkey
  • Keywords: Dynamic Programming, Manipulators, Motion Planning, Q-Learning
  • Yıldız Technical University Affiliated: Yes

Abstract

Reinforcement Learning (RL) based methods have recently become popular for the control and motion planning of robots. Unlike sampling-based motion planners, the optimal policies computed by these methods provide feedback motion plans, which eliminate the need to recompute (optimal) trajectories each time a robot starts from a different initial configuration. In related studies, an optimal policy (actor) and the associated value function (critic) are usually computed by training in a simulation environment. During training, RL learns through interactions with the environment in a physically realistic manner. However, in simulation it is also possible to make physically unimplementable moves. Thus, instead of RL, one can use Dynamic Programming approaches such as Value Iteration to compute optimal policies; these do not require an exploration component and are known to have better convergence properties. In addition, a value function is defined over states alone, whereas a Q-function is defined over state-action pairs, so the value function has a smaller dimension, which lessens the severity of the curse of dimensionality. Motivated by these facts, this paper employs the Value Iteration algorithm for motion planning of robot manipulators and evaluates its effectiveness against a popular RL method, Q-learning.
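To make the abstract's two technical claims concrete (one optimal policy serves as a feedback plan from any start, and a value table is smaller than a Q-table), here is a minimal sketch of tabular Value Iteration. It is not the paper's setup: it assumes a hypothetical 2-DOF manipulator whose joint space is discretized into an N x N grid, deterministic moves of one joint by one cell, unit cost per move, and a made-up configuration-space obstacle. All names (`N`, `ACTIONS`, `GOAL`, `OBSTACLES`, `step`, `value_iteration`, `greedy_policy`, `rollout`) are hypothetical and chosen for illustration.

```python
import numpy as np

# Hypothetical setup (not from the paper): joint space of a 2-DOF manipulator
# discretized into an N x N grid of configuration cells.
N = 10
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # move one joint by one cell
GOAL = (8, 8)
# Hypothetical configuration-space obstacle: a wall of blocked cells in row 4.
OBSTACLES = {(4, j) for j in range(0, 7)}


def step(s, a):
    """Deterministic transition; moves off the grid or into an obstacle keep the state."""
    nxt = (s[0] + a[0], s[1] + a[1])
    if not (0 <= nxt[0] < N and 0 <= nxt[1] < N) or nxt in OBSTACLES:
        return s
    return nxt


def value_iteration(max_iters=1000, tol=1e-9):
    """Sweep all states with V(s) = min_a [1 + V(step(s, a))], V(GOAL) = 0.

    No exploration is needed: every state is updated on every sweep.
    """
    V = np.zeros((N, N))  # one entry per state: |S| values
    for _ in range(max_iters):
        V_new = V.copy()
        for i in range(N):
            for j in range(N):
                s = (i, j)
                if s == GOAL or s in OBSTACLES:
                    continue
                V_new[s] = min(1.0 + V[step(s, a)] for a in ACTIONS)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


def greedy_policy(V):
    """Feedback plan: in every free state pick the action minimizing 1 + V(next)."""
    return {
        (i, j): min(ACTIONS, key=lambda a: 1.0 + V[step((i, j), a)])
        for i in range(N)
        for j in range(N)
        if (i, j) not in OBSTACLES and (i, j) != GOAL
    }


def rollout(policy, start, max_steps=4 * N * N):
    """Follow the policy from an arbitrary start; no re-planning is required."""
    path = [start]
    s = start
    while s != GOAL and len(path) <= max_steps:
        s = step(s, policy[s])
        path.append(s)
    return path


if __name__ == "__main__":
    V = value_iteration()
    policy = greedy_policy(V)
    # The same policy is reused for several different initial configurations.
    for start in [(0, 0), (9, 0), (2, 5)]:
        print(start, "->", len(rollout(policy, start)) - 1, "steps")
    # A Q-table would need one entry per (state, action) pair instead.
    print("V table size:", V.size, "vs Q table size:", N * N * len(ACTIONS))
```

In this toy grid the value table has N*N entries while a tabular Q-function would need N*N*4, which illustrates (in a very small setting) the dimension argument above; the sketch does not implement Q-learning itself.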