This repository has been archived by the owner on Nov 10, 2023. It is now read-only.
Both VAC and VPG get NaNs in their weights during training with Gaussian policies. After some digging, it looks like the cause is the standard deviation collapsing toward 0 or blowing up toward infinity, so the standard deviation should be clamped to a sensible range.

Of course, there may be other issues besides this causing NaNs during training. Notably, this only happens on some runs: many runs (especially with VPG) learn quite well, which suggests the problem is a numerical instability rather than a systematic bug.
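A common way to implement the clamping described above is to bound the *log* of the standard deviation rather than the standard deviation itself, which keeps sigma strictly positive and bounded away from both 0 and infinity. A minimal PyTorch sketch, assuming a hypothetical `GaussianPolicyHead` module (the class name, layer layout, and the bound values `-20`/`2`, a convention borrowed from common SAC implementations, are illustrative and not taken from this repository):

```python
import torch
import torch.nn as nn

# Illustrative bounds on log(sigma); exp(-20) ~ 2e-9, exp(2) ~ 7.4,
# so sigma stays finite and strictly positive.
LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0


class GaussianPolicyHead(nn.Module):
    """Policy head that clamps log-std before exponentiating.

    Clamping in log space avoids sigma -> 0 (log-prob -> -inf) and
    sigma -> inf, the two failure modes that produce NaN gradients.
    """

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.mean = nn.Linear(obs_dim, act_dim)
        self.log_std = nn.Linear(obs_dim, act_dim)

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        mu = self.mean(obs)
        log_std = torch.clamp(self.log_std(obs), LOG_STD_MIN, LOG_STD_MAX)
        return torch.distributions.Normal(mu, log_std.exp())


head = GaussianPolicyHead(obs_dim=4, act_dim=2)
dist = head(torch.randn(8, 4))
log_prob = dist.log_prob(dist.sample())
assert torch.isfinite(log_prob).all()
```

An alternative sometimes used instead of a hard clamp is a `tanh` squashing of the log-std into the same range, which keeps the gradient nonzero at the bounds; a hard clamp is simpler but zeroes the gradient once the output saturates.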