Sunday, January 3, 2021

How to guarantee that the actor would select a correct action?

In the training phase of the Deep Deterministic Policy Gradient (DDPG) algorithm, the action is selected simply as

action = actor(state)  

where `state` is the current state of the environment and `actor` is a deep neural network.

I do not understand how to guarantee that the returned action belongs to the action space of the considered environment.

For example, the state could be a vector of size 4, and the action space could be the interval [-1,1] of real numbers, or the Cartesian product [-1,1]x[-2,2]. Why, after doing `action = actor(state)`, would the returned action belong to [-1,1] or [-1,1]x[-2,2], depending on the environment?
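For context on what `actor(state)` typically looks like: many public DDPG implementations bound the output by ending the actor network with a `tanh` activation and scaling it per-dimension by the action-space limits. The sketch below is a minimal, hypothetical actor in NumPy (not taken from any specific repository) illustrating that construction for the [-1,1]x[-2,2] example:

```python
import numpy as np

# Illustrative single-layer "actor": names and weights are made up,
# only the tanh-scaling pattern is the point.
rng = np.random.default_rng(0)

state_dim = 4                        # state is a vector of size 4
action_high = np.array([1.0, 2.0])   # bounds for the space [-1,1] x [-2,2]

W = rng.standard_normal((2, state_dim))
b = rng.standard_normal(2)

def actor(state):
    # tanh squashes each output into (-1, 1); multiplying by the
    # per-dimension bound maps it into (-high_i, high_i).
    return action_high * np.tanh(W @ state + b)

state = rng.standard_normal(state_dim)
action = actor(state)
# By construction, |action[i]| <= action_high[i] for every dimension.
```

Because `tanh` never leaves (-1, 1), the scaled output is guaranteed to lie inside the box regardless of the network's weights; the guarantee comes from the architecture, not the training.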

I have been reading some DDPG source code on GitHub, but I am missing something here and cannot figure out the answer.

https://stackoverflow.com/questions/65556692/how-to-guarantee-that-the-actor-would-select-a-correct-action January 04, 2021 at 09:34AM
