
Cherry-picking the best result #8

Open
yyrkoon27 opened this issue Dec 22, 2017 · 1 comment

Comments

@yyrkoon27

Dear TA,

Consider the following two cases:

  1. Stopping training early (e.g. at episode 600), at the point where the average reward over the past 100 episodes has already reached 200.0
  2. Training through to the final episode (1000), by which point the average reward over the past 100 episodes may have dropped to 197.0

In general, which model should we use for testing?

Thank you very much!
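For reference, case 1 amounts to checkpointing whichever model had the best trailing 100-episode average. A minimal sketch of that selection logic (the function name and the reward list are hypothetical, not from this repo):

```python
from collections import deque

def select_best_checkpoint(episode_rewards, window=100):
    """Return (episode_index, avg) for the episode whose trailing
    `window`-episode average reward is highest.

    episode_rewards: list of per-episode total rewards, e.g. logged
    during training (illustrative data, not this repo's API).
    """
    recent = deque(maxlen=window)          # rolling window of rewards
    best_avg, best_episode = float("-inf"), -1
    for i, reward in enumerate(episode_rewards):
        recent.append(reward)
        avg = sum(recent) / len(recent)
        # only compare once the window is full, so early noisy
        # averages over few episodes don't win
        if len(recent) == window and avg > best_avg:
            best_avg, best_episode = avg, i
    return best_episode, best_avg
```

In practice you would save the network weights whenever a new best trailing average is hit, and load that checkpoint for testing.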

@BIGBALLON
Member

Hello @yyrkoon27,

In practice, around 300 episodes is enough for the agent to converge.
So you can reduce the maximum number of training episodes, or adjust the epsilon schedule.
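As a concrete illustration of adjusting the epsilon, a common choice is exponential decay clipped at a floor; the constants below are illustrative assumptions, not values from this repo:

```python
# Assumed hyperparameters for an epsilon-greedy schedule (illustrative only).
EPS_START = 1.0    # fully random exploration at the start
EPS_END = 0.01     # floor: never stop exploring entirely
EPS_DECAY = 0.995  # per-episode multiplicative decay

def epsilon_at(episode):
    """Exploration rate for a given episode under exponential decay."""
    return max(EPS_END, EPS_START * (EPS_DECAY ** episode))
```

Decaying epsilon faster makes the agent exploit its learned policy sooner, which is what lets training converge in fewer episodes.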
