Q-Learning with probability based action policy

Ugurlu E. S., Biricik G.

IEEE 14th Signal Processing and Communications Applications, Antalya, Turkey, 16 - 19 April 2006, pp.210-211

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/siu.2006.1659880
  • City: Antalya
  • Country: Turkey
  • Page Numbers: pp.210-211


In Q-learning, the aim is to reach the goal by using state-action pairs. When the goal is set as a large reward, the optimal path is found once the accumulated reward reaches its maximum. When the start and goal points are changed, the learned information about how to reach the goal becomes useless even if the environment itself does not change. In this study, Q-learning is improved by enabling the reuse of past data. To achieve this, action probabilities are computed for certain start and goal points, and a neural network is trained on those values to estimate the action probabilities for other start and goal points. A radial basis function network is used as the neural network because it supports local representation and learns quickly with a small number of inputs. When Q-learning is run with the estimated action probabilities, the goal is reached faster.
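The idea described in the abstract can be illustrated with a minimal sketch: tabular Q-learning on a small grid, where actions are sampled in proportion to a per-state probability function standing in for the RBF network's estimates. The grid size, goal reward of 1, learning rate, and discount factor below are illustrative assumptions, not values from the paper:

```python
import random

def q_learning(start, goal, action_probs, size=5,
               alpha=0.5, gamma=0.9, episodes=300, max_steps=100):
    """Tabular Q-learning on a size x size grid.

    action_probs(state) returns a weight per action; in the paper's
    scheme these weights would come from the trained RBF network,
    here any callable works (a uniform prior reduces to random
    exploration).
    """
    actions = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
    Q = {}
    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            if s == goal:
                break
            # sample an action according to the supplied probabilities
            a = random.choices(range(len(actions)),
                               weights=action_probs(s))[0]
            dx, dy = actions[a]
            nx, ny = s[0] + dx, s[1] + dy
            ns = (nx, ny) if 0 <= nx < size and 0 <= ny < size else s
            r = 1.0 if ns == goal else 0.0  # large reward only at the goal
            best_next = max(Q.get((ns, b), 0.0) for b in range(len(actions)))
            # standard off-policy Q-learning update
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
                r + gamma * best_next - Q.get((s, a), 0.0))
            s = ns
    return Q

random.seed(0)
# uniform weights: plain random exploration, the baseline the paper improves on
Q = q_learning((0, 0), (4, 4), lambda s: [1, 1, 1, 1])
```

Biasing `action_probs` toward actions that previously led to the goal is what lets past experience speed up learning for new start and goal points.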