Vihang P. Patil¹, Markus Hofmarcher¹, Markus-Constantin Dinu¹, Matthias Dorfer³, Patrick M. Blies³, Johannes Brandstetter¹, Jose A. Arjona-Medina¹, Sepp Hochreiter¹,²
¹ ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
² Institute of Advanced Research in Artificial Intelligence (IARAI)
³ enliteAI, Vienna, Austria
A detailed blog post on this paper and a video showcasing the Minecraft agent are also available.
The full paper is available at https://arxiv.org/abs/2009.14108
This package contains an implementation of Align-RUDDER, together with code to reproduce the results on artificial tasks I & II as reported in the paper. To save time, the default settings run only 10 seeds per experiment instead of the 100 used for the results in the paper.
To reproduce all results, we provide an environment.yml file that sets up a conda environment with the required packages. Run the following commands to create and activate the environment and install the package:
conda env create --file environment.yml
conda activate align-rudder
pip install -e .
To recreate the results from the paper, run the included script for each combination of environment (FourRooms or EightRooms) and method:
Align-RUDDER
python align_rudder/run_four_alignrudder.py
python align_rudder/run_eight_alignrudder.py
Behavioral Cloning + Q-Learning
python align_rudder/run_four_bc.py
python align_rudder/run_eight_bc.py
DQfD (Deep Q-Learning from Demonstrations)
python align_rudder/run_four_dqfd.py
python align_rudder/run_eight_dqfd.py
RUDDER (LSTM)
python align_rudder/run_four_rudder_lstm.py
python align_rudder/run_eight_rudder_lstm.py
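The eight invocations above can also be driven from a single Python script. The following is a minimal sketch, not part of the package: it assumes it is run from the repository root with the align-rudder conda environment active, and `build_commands` and `run_all` are hypothetical helper names. The script names are taken directly from the list above.

```python
import subprocess
import sys

# Hypothetical helpers, not part of the align_rudder package.
# Script name prefixes and method suffixes as listed in this README.
ENVIRONMENTS = {"FourRooms": "four", "EightRooms": "eight"}
METHODS = ["alignrudder", "bc", "dqfd", "rudder_lstm"]


def build_commands(envs=None, methods=None, python=sys.executable):
    """Return one command (argument list) per (environment, method) pair."""
    envs = list(ENVIRONMENTS) if envs is None else envs
    methods = METHODS if methods is None else methods
    commands = []
    for env in envs:
        prefix = ENVIRONMENTS[env]  # raises KeyError for unknown environments
        for method in methods:
            script = f"align_rudder/run_{prefix}_{method}.py"
            commands.append([python, script])
    return commands


def run_all(commands):
    """Run each experiment sequentially; stop at the first failing script."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

For example, `run_all(build_commands(envs=["FourRooms"], methods=["alignrudder", "bc"]))` would run only the FourRooms experiments for Align-RUDDER and Behavioral Cloning + Q-Learning. Running the scripts sequentially keeps the sketch simple; since each experiment is a separate process, they could also be launched in parallel if resources allow.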
Once you have run the experiments you are interested in, run the following script to summarize the results. By default, plots are generated for all available environments.
python align_rudder/plot_results.py [--env "FourRooms"|"EightRooms"|"all"]
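When scripting the whole pipeline, a small wrapper can check the `--env` value against the three choices documented above before calling the plotting script. This is a minimal sketch under that assumption; `plot_command` and `plot` are hypothetical helpers, and it assumes the script is invoked from the repository root.

```python
import subprocess
import sys

# Hypothetical helpers, not part of the align_rudder package.
# The --env values documented in this README.
PLOT_ENVS = ("FourRooms", "EightRooms", "all")


def plot_command(env="all", python=sys.executable):
    """Build the plot_results.py invocation for one documented --env value."""
    if env not in PLOT_ENVS:
        raise ValueError(f"env must be one of {PLOT_ENVS}, got {env!r}")
    return [python, "align_rudder/plot_results.py", "--env", env]


def plot(env="all"):
    """Run the plotting script; raises CalledProcessError if it fails."""
    subprocess.run(plot_command(env), check=True)
```

Validating the value up front gives a clear error for a typo such as `"FourRoom"` instead of relying on the plotting script to reject it.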
MIT LICENSE