In the training phase of the Deep Deterministic Policy Gradient (DDPG) algorithm, action selection is simply action = actor(state), where state is the current state of the environment and actor is a deep neural network.
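For concreteness, here is a minimal sketch of the setup I mean (PyTorch, the layer sizes, and the class name are my own assumptions for illustration):

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    # Hypothetical actor network: state vector in, action vector out.
    def __init__(self, state_dim=4, action_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, action_dim),  # plain linear output: unbounded
        )

    def forward(self, state):
        return self.net(state)

actor = Actor()
state = torch.randn(4)   # current state of the environment
action = actor(state)    # nothing here confines action to [-1, 1]
```

As written, the final linear layer can output any real number, which is exactly what puzzles me.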
I do not understand how to guarantee that the returned action belongs to the action space of the environment in question.
For example, a state could be a vector of size 4, and the action space could be the real interval [-1,1] or the Cartesian product [-1,1] x [-2,2]. Why, after computing action = actor(state), would the returned action belong to [-1,1] or [-1,1] x [-2,2], depending on the environment?
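If the bound is supposed to come from the network itself, the only mechanism I can imagine is squashing the output with tanh and rescaling it to the action box, roughly like this sketch (again, PyTorch and the layer sizes are assumptions on my part, not code from any particular repository):

```python
import torch
import torch.nn as nn

class BoundedActor(nn.Module):
    # Hypothetical actor whose output lies in the box [low, high] per dimension.
    def __init__(self, state_dim, action_low, action_high):
        super().__init__()
        low = torch.as_tensor(action_low, dtype=torch.float32)
        high = torch.as_tensor(action_high, dtype=torch.float32)
        self.register_buffer("scale", (high - low) / 2)   # e.g. [1., 2.]
        self.register_buffer("offset", (high + low) / 2)  # e.g. [0., 0.]
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, low.numel()),
        )

    def forward(self, state):
        # tanh maps into (-1, 1); the affine rescale maps into (low, high)
        return self.offset + self.scale * torch.tanh(self.net(state))

actor = BoundedActor(state_dim=4, action_low=[-1.0, -2.0], action_high=[1.0, 2.0])
action = actor(torch.randn(4))   # guaranteed inside [-1,1] x [-2,2]
```

Is this (or clipping the action before passing it to the environment) what the implementations are actually doing, and is it the standard way to enforce the action space?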
I have been reading the source code of several DDPG implementations on GitHub, but I am missing something and cannot figure out the answer.
https://stackoverflow.com/questions/65556692/how-to-guarantee-that-the-actor-would-select-a-correct-action