RL Training pipeline on 5-min data #1415
Conversation
…provider should be reverted before merging.
qlib/rl/trainer/trainer.py
Outdated
@@ -218,6 +223,7 @@ def fit(self, vessel: TrainingVesselBase, ckpt_path: Path | None = None) -> None
         with _wrap_context(vessel.train_seed_iterator()) as iterator:
             vector_env = self.venv_from_iterator(iterator)
             self.vessel.train(vector_env)
+            del vector_env
Why
According to my experiments, memory is not properly released after each training round without this explicit `del`, and that eventually causes an OOM. I am not 100% sure about the mechanism here, but the `del` does work. Any better ideas?
CC @you-n-g
@lihuoran
Then please add some comments here.
(It is a little counterintuitive based on my understanding of Python's memory management.)
I did some testing on this. The memory and subprocess leak is only reproducible when using a GPU.
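For illustration, here is a minimal sketch of the workaround under discussion. The PR itself only adds the `del`; the `gc.collect()` and `torch.cuda.empty_cache()` calls below are additional assumptions about where GPU memory might linger, not part of the actual change:

```python
import gc

import torch


def train_one_round(trainer, vessel, iterator):
    # `venv_from_iterator` spawns the subprocess-backed vector env (per the diff above).
    vector_env = trainer.venv_from_iterator(iterator)
    try:
        vessel.train(vector_env)
    finally:
        # Drop the last strong reference so the worker subprocesses (and any
        # CUDA tensors they hold) become collectable before the next round.
        del vector_env
        gc.collect()  # assumption: break lingering reference cycles promptly
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # assumption: release cached GPU memory
```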
qlib/rl/order_execution/reward.py
Outdated
@@ -21,10 +21,13 @@ class PAPenaltyReward(Reward[SAOEState]):
     ----------
     penalty
         The penalty for large volume in a short time.
+    zoom
Suggest "scale"
@lihuoran Why is `zoom` / `scale` necessary if we have hyperparameters like learning rate?
Will be renamed to "scale".
@you-n-g In the open-source project for the AAAI 2021 paper, they use a scaled reward. I personally think adding this parameter makes things more flexible.
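For context, a minimal sketch of the idea being discussed: a price-advantage reward with a quadratic penalty on bursty execution volume, multiplied by a global scale factor. This is an illustration under assumed names and formula details, not qlib's actual `PAPenaltyReward` implementation:

```python
from typing import Sequence


class ScaledPAPenaltyReward:
    """Illustrative only: price advantage minus a quadratic penalty on
    bursty execution volume, multiplied by a global `scale` factor."""

    def __init__(self, penalty: float = 100.0, scale: float = 1.0) -> None:
        self.penalty = penalty
        self.scale = scale

    def reward(self, pa: float, step_volumes: Sequence[float], total_volume: float) -> float:
        # Quadratic penalty: executing a large fraction of the order in a
        # single step is penalized more than spreading it out evenly.
        penalty_term = self.penalty * sum(v * v for v in step_volumes) / (total_volume ** 2)
        return self.scale * (pa - penalty_term)
```

One plausible argument for a separate scale, rather than tuning the learning rate alone: the reward scale directly changes the magnitude of returns and value targets, which interacts with mechanisms such as gradient clipping and value-loss weighting.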
@lihuoran Could you please check the CI errors?
* Workflow runnable
* CI
* Slight changes to make the workflow runnable. The changes of handler/provider should be reverted before merging.
* Train experiment successful
* Refine handler & provider
* CI issues
* Resolve PR comments
* Resolve PR comments
* CI issues
* Fix test issue
* Black
Description
Motivation and Context
How Has This Been Tested?
pytest qlib/tests/test_all_pipeline.py under the upper directory of qlib.
Screenshots of Test Results (if appropriate):
Types of changes