RL Training pipeline on 5-min data #1415
Conversation
…provider should be reverted before merging.
qlib/rl/trainer/trainer.py
Outdated
@@ -218,6 +223,7 @@ def fit(self, vessel: TrainingVesselBase, ckpt_path: Path | None = None) -> None
         with _wrap_context(vessel.train_seed_iterator()) as iterator:
             vector_env = self.venv_from_iterator(iterator)
             self.vessel.train(vector_env)
+            del vector_env
Why
According to my experiments, memory is not properly released after each training round without this explicit `del`, and that eventually causes an OOM. I am not 100% sure about the mechanism here, but the `del` does work. Any better ideas?
CC @you-n-g
@lihuoran
Then please add some comments here.
(It is a little counterintuitive based on my understanding of Python's memory management.)
I did some testing on this. The memory and subprocess leak is only reproducible when using a GPU.
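For illustration, here is a minimal sketch of the workaround under discussion. The PR itself only adds the `del`; the `gc.collect()` and `torch.cuda.empty_cache()` calls below are additional assumptions about where GPU memory might linger, not part of the actual change:

```python
import gc

import torch


def train_one_round(trainer, vessel, iterator):
    # `venv_from_iterator` spawns the subprocess-backed vector env (per the diff above).
    vector_env = trainer.venv_from_iterator(iterator)
    try:
        vessel.train(vector_env)
    finally:
        # Drop the last strong reference so the worker subprocesses (and any
        # CUDA tensors they hold) become collectable before the next round.
        del vector_env
        gc.collect()  # assumption: break lingering reference cycles promptly
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # assumption: release cached GPU memory
```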
qlib/rl/order_execution/reward.py
Outdated
@@ -21,10 +21,13 @@ class PAPenaltyReward(Reward[SAOEState]):
     ----------
     penalty
         The penalty for large volume in a short time.
+    zoom
Suggest "scale"
@lihuoran Why is `zoom` / `scale` necessary if we have hyperparameters like learning rate?
Will be renamed to "scale".
@you-n-g In the open-source project for the AAAI 2021 paper, they use a scaled reward. I personally think adding this parameter makes things more flexible.
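For context, a minimal sketch of the idea being discussed: a price-advantage reward with a quadratic penalty on bursty execution volume, multiplied by a global scale factor. This is an illustration under assumed names and formula details, not qlib's actual `PAPenaltyReward` implementation:

```python
from typing import Sequence


class ScaledPAPenaltyReward:
    """Illustrative only: price advantage minus a quadratic penalty on
    bursty execution volume, multiplied by a global `scale` factor."""

    def __init__(self, penalty: float = 100.0, scale: float = 1.0) -> None:
        self.penalty = penalty
        self.scale = scale

    def reward(self, pa: float, step_volumes: Sequence[float], total_volume: float) -> float:
        # Quadratic penalty: executing a large fraction of the order in a
        # single step is penalized more than spreading it out evenly.
        penalty_term = self.penalty * sum(v * v for v in step_volumes) / (total_volume ** 2)
        return self.scale * (pa - penalty_term)
```

One plausible argument for a separate scale, rather than tuning the learning rate alone: the reward scale directly changes the magnitude of returns and value targets, which interacts with mechanisms such as gradient clipping and value-loss weighting.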
@lihuoran Could you please check the CI errors?
* Workflow runnable
* CI
* Slight changes to make the workflow runnable. The changes of handler/provider should be reverted before merging.
* Train experiment successful
* Refine handler & provider
* CI issues
* Resolve PR comments
* Resolve PR comments
* CI issues
* Fix test issue
* Black
Description
Motivation and Context
How Has This Been Tested?
pytest qlib/tests/test_all_pipeline.py under the upper directory of qlib.
Screenshots of Test Results (if appropriate):
Types of changes