[RLlib] Rename of SingleAgentRLModuleSpec to RLModuleSpec breaks restoring old checkpoints #47426
Comments
@Kakadus Thanks for raising this issue. With which ray version was the checkpoint trained? And did you use …
The checkpoint was created with v2.34.0. The error happens when restoring with v2.35.0.
No, not intentionally at least. I reproduced the error and tested with the first suitable example I found. Run this with ray v2.34.0:

```python
#!/usr/bin/env python
"""Example of using PBT with RLlib.

Note that this requires a cluster with at least 8 GPUs in order for all trials
to run concurrently, otherwise PBT will round-robin train the trials, which
is less efficient (or you can set {"gpu": 0} to use CPUs for SGD instead).

Note that Tune in general does not need 8 GPUs, and this is just a more
computationally demanding example.
"""
import random

from ray import train, tune
from ray.rllib.algorithms.ppo import PPO
from ray.tune import TuneConfig
from ray.tune.schedulers import PopulationBasedTraining

if __name__ == "__main__":

    # Postprocess the perturbed config to ensure it's still valid.
    def explore(config):
        # Ensure we collect enough timesteps to do SGD.
        if config["train_batch_size"] < config["sgd_minibatch_size"] * 2:
            config["train_batch_size"] = config["sgd_minibatch_size"] * 2
        # Ensure we run at least one SGD iter.
        if config["num_sgd_iter"] < 1:
            config["num_sgd_iter"] = 1
        return config

    pbt = PopulationBasedTraining(
        time_attr="time_total_s",
        perturbation_interval=120,
        resample_probability=0.25,
        # Specifies the mutations of these hyperparams.
        hyperparam_mutations={
            "lambda": lambda: random.uniform(0.9, 1.0),
            "clip_param": lambda: random.uniform(0.01, 0.5),
            "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
            "num_sgd_iter": lambda: random.randint(1, 30),
            "sgd_minibatch_size": lambda: random.randint(128, 16384),
            "train_batch_size": lambda: random.randint(2000, 160000),
        },
        custom_explore_fn=explore,
    )

    tuner = tune.Tuner(
        PPO,
        run_config=train.RunConfig(
            name="pbt_humanoid_test",
            checkpoint_config=train.CheckpointConfig(checkpoint_frequency=1),
        ),
        tune_config=TuneConfig(
            scheduler=pbt,
            num_samples=8,
            metric="env_runners/episode_reward_mean",
            mode="max",
            reuse_actors=True,
        ),
        param_space={
            "env": "Humanoid-v4",
            "kl_coeff": 1.0,
            "num_workers": 1,
            "num_gpus": 0,
            "model": {"free_log_std": True},
            # These params are tuned from a fixed starting value.
            "lambda": 0.95,
            "clip_param": 0.2,
            "lr": 1e-4,
            # These params start off randomly drawn from a set.
            "num_sgd_iter": 10,
            "sgd_minibatch_size": 128,
            "train_batch_size": 256,
        },
    )
    results = tuner.fit()
    print("best hyperparameters: ", results.get_best_result().config)
```

And restore the checkpoint with ray v2.35.0:

```python
#!/usr/bin/env python
from ray.rllib.algorithms import Algorithm

Algorithm.from_checkpoint(
    path=".../ray_results/pbt_humanoid_test/"
    "PPO_Humanoid-v4_3338d_00003_3_2024-09-09_00-46-34/checkpoint_000014"
)
```
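
A quick way to confirm that the checkpoint really references the old class name is to scan the pickled algorithm state for the identifier. A minimal sketch, assuming the checkpoint directory contains an `algorithm_state.pkl` file (the exact file name and layout vary between ray versions):

```python
import pickletools  # only needed for the optional disassembly below

# Hypothetical path; point this at the checkpoint directory from the repro above.
STATE_FILE = "checkpoint_000014/algorithm_state.pkl"

with open(STATE_FILE, "rb") as f:
    data = f.read()

# Pickle streams reference classes by module path and attribute name, so the
# old identifier appears verbatim in the bytes if the checkpoint uses it.
print(b"SingleAgentRLModuleSpec" in data)

# Optionally disassemble the stream to inspect the exact class references.
# pickletools.dis(data)
```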
@Kakadus thanks for raising this issue. We overhauled the checkpointing in newer versions to give it greater flexibility. You could alias the old class name to the new one before restoring:

```python
import ray.rllib.core.rl_module.rl_module as rl_module
from ray.rllib.core.rl_module.rl_module import RLModuleSpec
from ray.rllib.algorithms import Algorithm

# Dynamically alias the old class name to the new one. `pickle` resolves
# classes by module path and attribute name, so this lets old checkpoints
# that reference `SingleAgentRLModuleSpec` unpickle as `RLModuleSpec`.
# (Note: `copyreg` only registers reducers for *pickling* new objects; it
# does not affect unpickling, so the alias is what makes restoring work.)
rl_module.SingleAgentRLModuleSpec = RLModuleSpec

# Try loading the checkpoint.
Algorithm.from_checkpoint(...)
```
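
If you need to load such pickled objects directly (outside of `Algorithm.from_checkpoint`), another common pattern for renamed classes is a custom `Unpickler` that remaps the old name in `find_class`. A minimal sketch; the helper name and file handling are illustrative, not RLlib API:

```python
import pickle

from ray.rllib.core.rl_module.rl_module import RLModuleSpec


class RenamedClassUnpickler(pickle.Unpickler):
    """Unpickler that resolves the removed `SingleAgentRLModuleSpec`."""

    def find_class(self, module, name):
        # Redirect the old class name to its replacement.
        if name == "SingleAgentRLModuleSpec":
            return RLModuleSpec
        return super().find_class(module, name)


def load_state(path):
    # Hypothetical helper: load a pickle file that may reference the old name.
    with open(path, "rb") as f:
        return RenamedClassUnpickler(f).load()
```

Unlike the module-level alias, this only affects loads that go through the custom unpickler, so it does not mutate RLlib's module namespace.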
Thanks @simonsays1980! If I understand correctly, #47708 will prevent this type of error from happening in the future by making newly created checkpoints more backward compatible, while this particular error still has to be worked around. Would it make sense to also merge #47560, so that there is at least one release that is able to restore older checkpoints?
What happened + What you expected to happen
I wanted to restore checkpoints created with ray v2.34.0 using ray v2.35.0, which fails because the checkpoint references `SingleAgentRLModuleSpec`, which was renamed to `RLModuleSpec`.
I expected to be able to restore checkpoints from older ray versions after the upgrade.
Adding a line like `SingleAgentRLModuleSpec = RLModuleSpec` to `ray/rllib/core/rl_module/rl_module.py` allows me to continue from the old checkpoint. This seems to be caused by #46840.
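
A version-agnostic variant of that patch can also be applied from user code instead of editing the installed package, guarded so it is a no-op on ray versions that still ship the old class (a sketch, not part of RLlib):

```python
import ray.rllib.core.rl_module.rl_module as rl_module

# Compatibility shim: only alias when running a ray version that has already
# renamed the class (i.e., the old attribute is gone).
if not hasattr(rl_module, "SingleAgentRLModuleSpec"):
    rl_module.SingleAgentRLModuleSpec = rl_module.RLModuleSpec
```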
Versions / Dependencies
python 3.10.12 / 3.12.5
ray v2.35.0
Reproduction script
Create a checkpoint with ray 2.34.0, upgrade to ray 2.35.0, and try to restore the checkpoint.
Issue Severity
Medium: It is a significant difficulty but I can work around it.
edit: replaced the traceback with a more minimal one.