
Cherry-picking the best result #8

Open
yyrkoon27 opened this issue Dec 22, 2017 · 1 comment

Comments

@yyrkoon27

Dear TA,

Consider the following two cases:

  1. Stopping training early (e.g. at episode 600), at the point where the average reward over the past 100 episodes has already reached 200.0
  2. Training through to the final episode (1000), by which point the average reward over the past 100 episodes may have dropped to 197.0

In general, which model should we use for testing?

Thank you very much!
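For reference, case 1 amounts to checkpointing whichever model had the best trailing 100-episode average. A minimal sketch of that selection logic (the function name and the reward list are hypothetical, not from this repo):

```python
from collections import deque

def select_best_checkpoint(episode_rewards, window=100):
    """Return (episode_index, avg) for the episode whose trailing
    `window`-episode average reward is highest.

    episode_rewards: list of per-episode total rewards, e.g. logged
    during training (illustrative data, not this repo's API).
    """
    recent = deque(maxlen=window)          # rolling window of rewards
    best_avg, best_episode = float("-inf"), -1
    for i, reward in enumerate(episode_rewards):
        recent.append(reward)
        avg = sum(recent) / len(recent)
        # only compare once the window is full, so early noisy
        # averages over few episodes don't win
        if len(recent) == window and avg > best_avg:
            best_avg, best_episode = avg, i
    return best_episode, best_avg
```

In practice you would save the network weights whenever a new best trailing average is hit, and load that checkpoint for testing.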

@BIGBALLON
Member

Hello @yyrkoon27,

In practice, around 300 episodes is enough for the agent to converge.
So you can reduce the maximum number of training episodes, or adjust the epsilon schedule.
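As a concrete illustration of adjusting the epsilon, a common choice is exponential decay clipped at a floor; the constants below are illustrative assumptions, not values from this repo:

```python
# Assumed hyperparameters for an epsilon-greedy schedule (illustrative only).
EPS_START = 1.0    # fully random exploration at the start
EPS_END = 0.01     # floor: never stop exploring entirely
EPS_DECAY = 0.995  # per-episode multiplicative decay

def epsilon_at(episode):
    """Exploration rate for a given episode under exponential decay."""
    return max(EPS_END, EPS_START * (EPS_DECAY ** episode))
```

Decaying epsilon faster makes the agent exploit its learned policy sooner, which is what lets training converge in fewer episodes.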
