
[RLlib] Rename of SingleAgentRLModuleSpec to RLModuleSpec breaks restoring old checkpoints #47426

Open
Kakadus opened this issue Aug 30, 2024 · 4 comments · May be fixed by #47560 or #47708
Labels
bug Something that is supposed to be working; but isn't P1 Issue that should be fixed within a few weeks rllib RLlib related issues


Kakadus commented Aug 30, 2024

What happened + What you expected to happen

I wanted to restore checkpoints created with ray v2.34.0 using ray v2.35.0, which fails with:

>>> from ray.rllib.algorithms import Algorithm
>>> Algorithm.from_checkpoint(path=".../ray_results/pbt_humanoid_test/PPO_Humanoid-v4_3338d_00003_3_2024-09-09_00-46-34/checkpoint_000014")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../.cache/pypoetry/virtualenvs/ray-f4XCQ9mO-py3.12/lib/python3.12/site-packages/ray/rllib/algorithms/algorithm.py", line 399, in from_checkpoint
    state = Algorithm._checkpoint_info_to_algorithm_state(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.cache/pypoetry/virtualenvs/ray-f4XCQ9mO-py3.12/lib/python3.12/site-packages/ray/rllib/algorithms/algorithm.py", line 3442, in _checkpoint_info_to_algorithm_state
    state = pickle.load(f)
            ^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'SingleAgentRLModuleSpec' on <module 'ray.rllib.core.rl_module.rl_module' from '.../.cache/pypoetry/virtualenvs/ray-f4XCQ9mO-py3.12/lib/python3.12/site-packages/ray/rllib/core/rl_module/rl_module.py'>

I expected to be able to restore checkpoints from older ray versions after the upgrade.

Adding a line like

SingleAgentRLModuleSpec = RLModuleSpec

to ray/rllib/core/rl_module/rl_module.py allows me to continue from the old checkpoint.

This seems to be caused by #46840

Versions / Dependencies

python 3.10.12 / 3.12.5
ray v2.35.0

Reproduction script

Create a checkpoint with ray 2.34.0, upgrade to ray 2.35.0, and try to restore the checkpoint.

Issue Severity

Medium: It is a significant difficulty but I can work around it.

edit: replaced the traceback with a more minimal one.

@Kakadus Kakadus added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Aug 30, 2024
@anyscalesam anyscalesam added the rllib RLlib related issues label Sep 3, 2024
@Kakadus Kakadus linked a pull request Sep 8, 2024 that will close this issue
@simonsays1980 simonsays1980 self-assigned this Sep 11, 2024
@simonsays1980 simonsays1980 added P1 Issue that should be fixed within a few weeks and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Sep 11, 2024
simonsays1980 (Collaborator) commented Sep 11, 2024

@Kakadus Thanks for raising this issue. With which ray version has the checkpoint been trained? And did you use api_stack(enable_env_runner_and_connector_v2=True, enable_rl_module_and_learner=True)?

Kakadus (Author) commented Sep 13, 2024

> @Kakadus Thanks for raising this issue. With which ray version has the checkpoint been trained?

The checkpoint was created with v2.34.0. The error happens when restoring with v2.35.0.

> And did you use api_stack(enable_env_runner_and_connector_v2=True, enable_rl_module_and_learner=True)?

No, at least not intentionally. I reproduced the error with the first example I found:

Run this with ray v2.34.0

#!/usr/bin/env python
"""Example of using PBT with RLlib.

Note that this requires a cluster with at least 8 GPUs in order for all trials
to run concurrently, otherwise PBT will round-robin train the trials which
is less efficient (or you can set {"gpu": 0} to use CPUs for SGD instead).

Note that Tune in general does not need 8 GPUs, and this is just a more
computationally demanding example.
"""

import random

from ray import train, tune
from ray.rllib.algorithms.ppo import PPO
from ray.tune import TuneConfig
from ray.tune.schedulers import PopulationBasedTraining

if __name__ == "__main__":
    # Postprocess the perturbed config to ensure it's still valid
    def explore(config):
        # ensure we collect enough timesteps to do sgd
        if config["train_batch_size"] < config["sgd_minibatch_size"] * 2:
            config["train_batch_size"] = config["sgd_minibatch_size"] * 2
        # ensure we run at least one sgd iter
        if config["num_sgd_iter"] < 1:
            config["num_sgd_iter"] = 1
        return config


    pbt = PopulationBasedTraining(
        time_attr="time_total_s",
        perturbation_interval=120,
        resample_probability=0.25,
        # Specifies the mutations of these hyperparams
        hyperparam_mutations={
            "lambda": lambda: random.uniform(0.9, 1.0),
            "clip_param": lambda: random.uniform(0.01, 0.5),
            "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
            "num_sgd_iter": lambda: random.randint(1, 30),
            "sgd_minibatch_size": lambda: random.randint(128, 16384),
            "train_batch_size": lambda: random.randint(2000, 160000),
        },
        custom_explore_fn=explore,
    )

    tuner = tune.Tuner(
        PPO,
        run_config=train.RunConfig(
            name="pbt_humanoid_test",
            checkpoint_config=train.CheckpointConfig(checkpoint_frequency=1),
        ),
        tune_config=TuneConfig(
            scheduler=pbt,
            num_samples=8,
            metric="env_runners/episode_reward_mean",
            mode="max",
            reuse_actors=True,
        ),
        param_space={
            "env": "Humanoid-v4",
            "kl_coeff": 1.0,
            "num_workers": 1,
            "num_gpus": 0,
            "model": {"free_log_std": True},
            # These params are tuned from a fixed starting value.
            "lambda": 0.95,
            "clip_param": 0.2,
            "lr": 1e-4,
            # These params start off randomly drawn from a set.
            "num_sgd_iter": 10,
            "sgd_minibatch_size": 128,
            "train_batch_size": 256,
        },
    )
    results = tuner.fit()

    print("best hyperparameters: ", results.get_best_result().config)

And restore the checkpoint with ray v2.35.0:

#!/usr/bin/env python
from ray.rllib.algorithms import Algorithm


Algorithm.from_checkpoint(path=".../ray_results/pbt_humanoid_test/PPO_Humanoid-v4_3338d_00003_3_2024-09-09_00-46-34/checkpoint_000014")

simonsays1980 (Collaborator) commented Sep 16, 2024

@Kakadus thanks for raising this issue. We overhauled checkpointing in newer versions to make it more flexible.

You could use a dynamic binding to influence how pickle resolves the old SingleAgentRLModuleSpec name when loading the checkpoint. (Note that copyreg only customizes pickling, not unpickling, so a module-level alias is what actually makes pickle.load succeed here.)

import ray.rllib.core.rl_module.rl_module as rl_module
from ray.rllib.algorithms import Algorithm
from ray.rllib.core.rl_module.rl_module import RLModuleSpec

# Dynamically alias the old class name to the new one so that
# `pickle.load` can resolve `SingleAgentRLModuleSpec`.
rl_module.SingleAgentRLModuleSpec = RLModuleSpec

# Try loading the checkpoint.
Algorithm.from_checkpoint(...)

@simonsays1980 simonsays1980 linked a pull request Sep 17, 2024 that will close this issue
Kakadus (Author) commented Sep 17, 2024

Thanks @simonsays1980

If I understand correctly, #47708 will prevent this type of error in the future by making newly created checkpoints more backward compatible, while the current error still has to be worked around. Would it make sense to merge #47560 then, so that at least one release is able to restore older checkpoints?
