This repository has been archived by the owner on Nov 10, 2023. It is now read-only.
Both VAC and VPG get NaNs in their weights during training with Gaussian policies. After some digging, it looks like the cause is the standard deviation collapsing toward 0 or blowing up toward infinity, so the standard deviation should be clamped to a sensible range.

Of course, there may be other issues besides this causing NaNs during training. Notably, this only happens on some runs: many runs (especially with VPG) learn quite well, which suggests the problem is a numerical instability rather than a systematic bug.
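A common way to implement the clamping described above is to bound the *log* of the standard deviation rather than the standard deviation itself, which keeps sigma strictly positive and bounded away from both 0 and infinity. A minimal PyTorch sketch, assuming a hypothetical `GaussianPolicyHead` module (the class name, layer layout, and the bound values `-20`/`2`, a convention borrowed from common SAC implementations, are illustrative and not taken from this repository):

```python
import torch
import torch.nn as nn

# Illustrative bounds on log(sigma); exp(-20) ~ 2e-9, exp(2) ~ 7.4,
# so sigma stays finite and strictly positive.
LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0


class GaussianPolicyHead(nn.Module):
    """Policy head that clamps log-std before exponentiating.

    Clamping in log space avoids sigma -> 0 (log-prob -> -inf) and
    sigma -> inf, the two failure modes that produce NaN gradients.
    """

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.mean = nn.Linear(obs_dim, act_dim)
        self.log_std = nn.Linear(obs_dim, act_dim)

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        mu = self.mean(obs)
        log_std = torch.clamp(self.log_std(obs), LOG_STD_MIN, LOG_STD_MAX)
        return torch.distributions.Normal(mu, log_std.exp())


head = GaussianPolicyHead(obs_dim=4, act_dim=2)
dist = head(torch.randn(8, 4))
log_prob = dist.log_prob(dist.sample())
assert torch.isfinite(log_prob).all()
```

An alternative sometimes used instead of a hard clamp is a `tanh` squashing of the log-std into the same range, which keeps the gradient nonzero at the bounds; a hard clamp is simpler but zeroes the gradient once the output saturates.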