
[RLlib] Increase backward compatibility of checkpoints. #47708

Open · wants to merge 2 commits into master
Conversation

simonsays1980 (Collaborator) commented Sep 17, 2024

Why are these changes needed?

The AlgorithmConfig is still a pain point when trying to load older checkpoints (of the new stack specifically). The reason is usually attributes that were added between storing the checkpoint and loading it again (recently, e.g., the _torch_grad_scaler_class attribute). This PR proposes a logic that enables loading older checkpoints with a newer version (of the new stack):

  • The AlgorithmConfig class gets set_state and from_state methods that receive a state dictionary and initialize a config from it.
  • The Algorithm now always creates its config by calling the from_state method, thereby adding all attributes of the currently present AlgorithmConfig class version (see the sketch below).
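A minimal sketch of this idea, using a simplified stand-in class rather than RLlib's actual implementation: set_state only overrides attributes that the current config version defines, so attributes added after a checkpoint was written keep their new defaults when that checkpoint is restored.

from typing import Any, Dict


class AlgorithmConfig:
    def __init__(self):
        # Attribute that already existed when the checkpoint was written.
        self.lr = 0.001
        # Attribute added in a newer version; old checkpoints don't contain it,
        # so it keeps this default when such a checkpoint is restored.
        self._torch_grad_scaler_class = None

    def set_state(self, state: Dict[str, Any]) -> "AlgorithmConfig":
        # Only copy over keys the current config version actually defines;
        # unknown (removed or renamed) keys in old checkpoints are skipped.
        for key, value in state.items():
            if hasattr(self, key):
                setattr(self, key, value)
        return self

    @classmethod
    def from_state(cls, state: Dict[str, Any]) -> "AlgorithmConfig":
        # Build a fresh config with all current defaults, then overwrite
        # whatever the checkpointed state provides.
        return cls().set_state(state)


# Restoring a state that predates `_torch_grad_scaler_class`:
old_state = {"lr": 0.0005}
config = AlgorithmConfig.from_state(old_state)
assert config.lr == 0.0005
assert config._torch_grad_scaler_class is None  # new attribute keeps its default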

Related issue number

Closes #47426

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

…nitialization of 'Algorithm' to always initialize the config from state. Furthermore, added getter and setter to 'PolicySpec'.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
@simonsays1980 added labels on Sep 17, 2024: rllib (RLlib related issues), rllib-checkpointing-or-recovery (An issue related to checkpointing/recovering RLlib Trainers).
@@ -455,7 +455,7 @@ def __init__(
object. If unspecified, a default logger is created.
**kwargs: Arguments passed to the Trainable base class.
"""
-        config = config or self.get_default_config()
+        config = config  # or self.get_default_config()
Contributor:
nit: why remove the default config path?

Collaborator (author):
I see. Yes, this is important when a user does not pass in any config. I will change this. Good catch!

        config = AlgorithmConfig.from_dict(
            config_dict=self.merge_algorithm_configs(
                default_config, config, True
            )
        )
        if "class" in config:
Contributor:
remove this entire if-block: There are no more algos left that return a dict from their get_default_config() method.

            # Default config is an AlgorithmConfig -> update its properties
            # from the given config dict.
            else:
                config = default_config.update_from_dict(config)
if isinstance(config, dict) and "class" in config:
Contributor:
Actually, let's do the following with this PR:

  • Make AlgorithmConfig a Checkpointable and override the get/set_state and get_ctor_args_and_kwargs methods first:
def get_state(self, *, components=...):
    return [basically the AlgoConfig as a dict]

def set_state(self, state):
    # Here, we can conveniently check whether keys in `state` exist as properties
    # and override the correct properties as wanted.

def get_ctor_args_and_kwargs(self):
    return ()  # <- empty tuple; AlgorithmConfigs are always constructed w/o any args/kwargs
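Read literally, that proposal could look roughly like the following. The Checkpointable base class here is a reduced stand-in for RLlib's interface, and the dict-building details are assumptions rather than the final implementation.

from typing import Any, Dict, Optional, Tuple


class Checkpointable:
    # Reduced stand-in for RLlib's Checkpointable interface, limited to the
    # three hooks discussed in this thread.
    def get_state(self, *, components: Optional[Any] = None) -> Dict[str, Any]:
        raise NotImplementedError

    def set_state(self, state: Dict[str, Any]) -> None:
        raise NotImplementedError

    def get_ctor_args_and_kwargs(self) -> Tuple[Tuple, Dict[str, Any]]:
        raise NotImplementedError


class AlgorithmConfig(Checkpointable):
    def __init__(self):
        self.lr = 0.001
        self.gamma = 0.99

    def get_state(self, *, components: Optional[Any] = None) -> Dict[str, Any]:
        # "Basically the AlgoConfig as a dict".
        return dict(vars(self))

    def set_state(self, state: Dict[str, Any]) -> None:
        # Check whether keys in `state` exist as properties and only then
        # override them; unknown keys from other versions are skipped.
        for key, value in state.items():
            if hasattr(self, key):
                setattr(self, key, value)

    def get_ctor_args_and_kwargs(self) -> Tuple[Tuple, Dict[str, Any]]:
        # AlgorithmConfigs are always constructed without any args/kwargs.
        return (), {}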

Contributor:
Then we can do this here:

if isinstance(config, dict):
    config_dict = config
    config = default_config
    config.set_state(config_dict)
# else:
#     keep config as is (it's already a proper AlgorithmConfig object)

So, basically, from here on we treat user-provided config dicts as "AlgorithmConfig state dicts".

@@ -781,6 +776,53 @@ def update_from_dict(

return self

def get_state(self) -> Dict[str, Any]:
Contributor:
Add @override(Checkpointable) and subclass AlgorithmConfig from Checkpointable.

Collaborator (author):
Yeah, this looks sound now. I will be so glad when the old stack is gone. Everything will become so clean then.


return state

@classmethod
Contributor:
Add @override(Checkpointable) and subclass AlgorithmConfig from Checkpointable.

Rename this method to set_state() and make it not a class method.

@@ -124,6 +124,23 @@ def __eq__(self, other: "PolicySpec"):
and self.config == other.config
)

def get_state(self) -> Dict[str, Any]:
Contributor:
this is ok. Leave as-is :)

Collaborator (author):
Phew! :D
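Since the PR also adds a getter and setter to PolicySpec, a rough illustration of what such a pair could look like is shown below. The field names follow PolicySpec's usual attributes, but this is a simplified stand-in, not the PR's exact code.

from typing import Any, Dict, Optional


class PolicySpec:
    # Simplified stand-in for RLlib's PolicySpec, for illustration only.
    def __init__(
        self,
        policy_class=None,
        observation_space=None,
        action_space=None,
        config: Optional[Dict[str, Any]] = None,
    ):
        self.policy_class = policy_class
        self.observation_space = observation_space
        self.action_space = action_space
        self.config = config

    def get_state(self) -> Dict[str, Any]:
        # Return the spec's fields as a plain dict.
        return {
            "policy_class": self.policy_class,
            "observation_space": self.observation_space,
            "action_space": self.action_space,
            "config": self.config,
        }

    def set_state(self, state: Dict[str, Any]) -> "PolicySpec":
        # Only set fields this PolicySpec version knows about.
        for key, value in state.items():
            if hasattr(self, key):
                setattr(self, key, value)
        return self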

            if isinstance(config, dict) and "class" in config:
                config = default_config.from_state(config)
            else:
                config = default_config.update_from_dict(config)
        else:
            default_config = self.get_default_config()
Contributor:
Ah, I see. This answers my question above: We already do this get_default_config() call here. Ok.

Collaborator (author):
Haha, I hadn't kept the whole logic in view myself. So, yes, here we set the default if nothing else is provided.

Comment on lines 486 to +493

        default_config = self.get_default_config()
        # Given AlgorithmConfig is not of the same type as the default config:
        # This could be the case e.g. if the user is building an algo from a
        # generic AlgorithmConfig() object.
        if not isinstance(config, type(default_config)):
            config = default_config.update_from_dict(config.to_dict())
        else:
            config = default_config.from_state(config.get_state())
Contributor:
Suggested change
-        default_config = self.get_default_config()
-        # Given AlgorithmConfig is not of the same type as the default config:
-        # This could be the case e.g. if the user is building an algo from a
-        # generic AlgorithmConfig() object.
-        if not isinstance(config, type(default_config)):
-            config = default_config.update_from_dict(config.to_dict())
-        else:
-            config = default_config.from_state(config.get_state())
+        default_config = self.get_default_config()
+        config_state = config.get_state()
+        config = default_config
+        config.set_state(config_state)

Contributor:
I think we can simplify here to the above suggestion ^

@@ -2899,7 +2908,7 @@ def get_checkpointable_components(self) -> List[Tuple[str, "Checkpointable"]]:
    @override(Checkpointable)
    def get_ctor_args_and_kwargs(self) -> Tuple[Tuple, Dict[str, Any]]:
        return (
-            (self.config,),  # *args,
+            (self.config.get_state(),),  # *args,
Contributor:
Perfect!

Contributor:
Btw, we should probably do the same everywhere else config is part of the c'tor args/kwargs: Learner, LearnerGroup, and EnvRunner.

Contributor:
Then, we also have to make sure their c'tors also accept config states(!), not just AlgorithmConfig objects (the same as how Algorithm does it now).
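A sketch of what that could mean for such a constructor is below. EnvRunner here is only a stand-in, and the dual handling of config objects and state dicts is an assumption about how this could be wired up, mirroring the Algorithm change in this PR.

from typing import Any, Dict, Union


class AlgorithmConfig:
    # Minimal stand-in for the real AlgorithmConfig.
    def __init__(self):
        self.lr = 0.001

    @classmethod
    def from_state(cls, state: Dict[str, Any]) -> "AlgorithmConfig":
        config = cls()
        for key, value in state.items():
            if hasattr(config, key):
                setattr(config, key, value)
        return config


class EnvRunner:
    # Stand-in constructor that accepts a config object *or* a config state dict.
    def __init__(self, *, config: Union[AlgorithmConfig, Dict[str, Any]]):
        if isinstance(config, dict):
            config = AlgorithmConfig.from_state(config)
        self.config = config


# Both of these would then work when restoring from ctor args/kwargs:
runner_from_object = EnvRunner(config=AlgorithmConfig())
runner_from_state = EnvRunner(config={"lr": 0.0005})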

@sven1977 (Contributor) left a comment:
I really like this PR. It solves so many problems at once!

A few design change requests and nits, but overall already in very good shape! Thanks @simonsays1980.

@sven1977 (Contributor):
Oh, sorry, another thing. Can we also get rid of (rename):

  • AlgorithmConfig.to_dict() -> get_state()
  • AlgorithmConfig.update_from_dict() -> set_state()

Does this make sense? ^
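One way the rename could be done without immediately breaking existing callers is via thin deprecation aliases. This is a sketch under that assumption, not something the comment above prescribes.

import warnings
from typing import Any, Dict


class AlgorithmConfig:
    def __init__(self):
        self.lr = 0.001

    # New names, as proposed above.
    def get_state(self) -> Dict[str, Any]:
        return dict(vars(self))

    def set_state(self, state: Dict[str, Any]) -> "AlgorithmConfig":
        for key, value in state.items():
            if hasattr(self, key):
                setattr(self, key, value)
        return self

    # Old names kept as thin, deprecated aliases for a transition period.
    def to_dict(self) -> Dict[str, Any]:
        warnings.warn("to_dict() is deprecated; use get_state().", DeprecationWarning)
        return self.get_state()

    def update_from_dict(self, config_dict: Dict[str, Any]) -> "AlgorithmConfig":
        warnings.warn(
            "update_from_dict() is deprecated; use set_state().", DeprecationWarning
        )
        return self.set_state(config_dict)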

Labels: rllib (RLlib related issues), rllib-checkpointing-or-recovery (An issue related to checkpointing/recovering RLlib Trainers)
Development

Successfully merging this pull request may close these issues.

[RLlib] Rename of SingleAgentRLModuleSpec to RLModuleSpec breaks restoring old checkpoints