MPOPIS Details

Dylan Asmar edited this page Sep 21, 2022 · 6 revisions

Welcome to the MPOPIS wiki!

MPOPIS Simulation Details and Algorithm Parameter Settings

This wiki page contains simulation details and parameters used for the algorithms.

Simulation Details

MountainCar Environment

The goal of the MountainCar problem is to drive an under-powered car up a hill to a goal location. The action space consists of a single continuous action at each time step. The simulations used the MountainCar environment from ReinforcementLearning.jl, which is based on OpenAI Gym's MountainCar scenario.

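The reward function for this problem was modified in two ways. It adds an incentive to go faster (the absolute value of the velocity), and it adds a bonus of 100000 through an indicator on reaching the goal position with a velocity at or above the goal velocity. A penalty of -1 still applies at every step until the episode is done. The environment terminated when the car reached the goal location or at 200 steps.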

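# Modified MountainCar reward function
function RLBase.reward(env::MountainCarEnv{A,T}) where {A,T}
    rew = 0.0
    # Large bonus for reaching the goal position at or above the goal velocity
    if env.state[1] >= env.params.goal_pos &&
       env.state[2] >= env.params.goal_velocity
        rew += 100000
    end
    # Incentive to go faster: reward the magnitude of the velocity
    rew += abs(env.state[2])
    # Step penalty of -1 until the episode is done
    rew += env.done ? 0.0 : -1.0
    return rew
end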
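The reward terms can be checked in isolation with a standalone restatement of the same logic. This is only a sketch: `modified_reward` is a hypothetical helper that is not part of the repository, and the default `goal_pos` and `goal_velocity` values are assumptions rather than values read from ReinforcementLearning.jl.

```julia
# Standalone restatement of the modified reward for quick experimentation.
# The goal_pos and goal_velocity defaults are assumed, not taken from the package.
function modified_reward(pos, vel, done; goal_pos=0.5, goal_velocity=0.0)
    rew = 0.0
    if pos >= goal_pos && vel >= goal_velocity
        rew += 100000            # goal bonus
    end
    rew += abs(vel)              # speed incentive
    rew += done ? 0.0 : -1.0     # step penalty while the episode is running
    return rew
end

r_mid  = modified_reward(-0.5, 0.03, false)  # ordinary step away from the goal
r_goal = modified_reward(0.5, 0.02, true)    # terminal step at the goal
```

Away from the goal, the reward is the speed term minus the step penalty. At the goal, the bonus dominates every other term.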

Car Racing Environment

The car racing environment can be run on multiple tracks. The default track, which was used in the simulations, is a 1.18 km track with a lane width of 15 m. The track is shown below. All scenarios started with car 1 at the origin and the other cars offset left and right by 5 m. Each car was oriented toward the positive y-axis and had a longitudinal velocity of 10 m/s at t=0. The parameters used for the car model and the dynamics can be seen in the code here. The model parameters and dynamics were implemented from Brown and Gerdes (2020) and Subosits and Gerdes (2021).

Algorithm Details

MountainCar Environment

| Parameters | :mppi | :PMCMPPI | :μaismppi | :μΣaismppi | :cemppi | :cmamppi |
| --- | --- | --- | --- | --- | --- | --- |
| Samples | 20-180 (multiples of 20) | 20 | 20 | 20 | 20 | 20 |
| Horizon | 15 | 15 | 15 | 15 | 15 | 15 |
| Inverse Temp (λ) | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| Control Cost Param (α) | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| Init Control Sequence | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 |
| Control Covariance | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 |
| AIS Iterations | --- | 1-8 | 1-8 | 1-8 | 1-8 | 1-8 |
| AIS Inv Temp (λ_ais) | --- | 0.1 | 0.1 | 0.1 | --- | --- |
| CE Elite Threshold | --- | --- | --- | --- | 0.8 | --- |
| CE Σ Estimation Method | --- | --- | --- | --- | :mle | --- |
| CMA Step Factor (σ) | --- | --- | --- | --- | --- | 0.5 |
| CMA Elite Threshold | --- | --- | --- | --- | --- | 0.8 |
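To give a feel for what the λ setting does, the sketch below computes trajectory weights for two values of λ. It assumes the common MPPI weighting w_i ∝ exp(-(S_i - min S) / λ). The helper name `mppi_weights` and the cost values are hypothetical, and the MPOPIS implementation may differ in detail.

```julia
# Normalized MPPI-style weights for a vector of trajectory costs.
# Assumes w_i ∝ exp(-(S_i - min(S)) / λ); this is a sketch, not the MPOPIS code.
function mppi_weights(costs::AbstractVector{<:Real}, λ::Real)
    β = minimum(costs)                 # subtract the minimum for numerical stability
    w = exp.(-(costs .- β) ./ λ)
    return w ./ sum(w)
end

costs = [10.0, 10.5, 12.0, 15.0]       # hypothetical trajectory costs
w_sharp = mppi_weights(costs, 0.1)     # λ = 0.1, as in the MountainCar settings
w_soft  = mppi_weights(costs, 10.0)    # λ = 10.0, as in the car racing settings
```

Under this assumed form, a small λ concentrates nearly all weight on the lowest-cost trajectory. A large λ spreads the weight more evenly across samples.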

Car Racing Environment (1 Car)

| Parameters | :mppi | :PMCMPPI | :μaismppi | :μΣaismppi | :cemppi | :cmamppi |
| --- | --- | --- | --- | --- | --- | --- |
| Samples | 375-2250 (multiples of 375) | 375 | 375 | 375 | 375 | 375 |
| Horizon | 50 | 50 | 50 | 50 | 50 | 50 |
| Inverse Temp (λ) | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 |
| Control Cost Param (α) | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| Init Control Sequence | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 |
| Control Covariance | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] |
| AIS Iterations | --- | 1-6 | 1-6 | 1-6 | 1-6 | 1-6 |
| AIS Inv Temp (λ_ais) | --- | 20.0 | 20.0 | 20.0 | --- | --- |
| CE Elite Threshold | --- | --- | --- | --- | 0.8 | --- |
| CE Σ Estimation Method | --- | --- | --- | --- | :mle | --- |
| CMA Step Factor (σ) | --- | --- | --- | --- | --- | 0.5 |
| CMA Elite Threshold | --- | --- | --- | --- | --- | 0.8 |
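As a rough illustration of how the control covariance enters sampling, the sketch below draws zero-mean Gaussian control perturbations with the covariance listed for the car racing runs. It assumes perturbations are independent across time steps, which is the usual MPPI setup. `sample_perturbations` is a hypothetical helper, not a function from the repository.

```julia
using LinearAlgebra, Random, Statistics

# Draw K perturbation sequences of length T, each step distributed as N(0, Σ).
# Returns an array of size (m, T, K), where m is the control dimension.
function sample_perturbations(rng::AbstractRNG, Σ::AbstractMatrix, T::Int, K::Int)
    L = cholesky(Symmetric(Σ)).L       # lower factor, so that Σ = L * L'
    m = size(Σ, 1)
    Z = randn(rng, m, T * K)           # standard normal draws
    return reshape(L * Z, m, T, K)
end

Σ = [0.0625 0.0; 0.0 0.1]              # control covariance from the table
E = sample_perturbations(MersenneTwister(1), Σ, 50, 375)  # horizon 50, 375 samples
Σ_hat = cov(reshape(E, 2, :); dims=2)  # empirical covariance of all draws
```

With 50 × 375 draws, the empirical covariance `Σ_hat` should land close to `Σ`, which is a quick sanity check on the sampler.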

Car Racing Environment (2 Cars)

| Parameters | :mppi | :PMCMPPI | :μaismppi | :μΣaismppi | :cemppi | :cmamppi |
| --- | --- | --- | --- | --- | --- | --- |
| Samples | 375-2250 (multiples of 375) | 375 | 375 | 375 | 375 | 375 |
| Horizon | 50 | 50 | 50 | 50 | 50 | 50 |
| Inverse Temp (λ) | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 |
| Control Cost Param (α) | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| Init Control Sequence | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 |
| Control Covariance | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] |
| AIS Iterations | --- | 1-6 | 1-6 | 1-6 | 1-6 | 1-6 |
| AIS Inv Temp (λ_ais) | --- | 70.0 | 70.0 | 70.0 | --- | --- |
| CE Elite Threshold | --- | --- | --- | --- | 0.8 | --- |
| CE Σ Estimation Method | --- | --- | --- | --- | :mle | --- |
| CMA Step Factor (σ) | --- | --- | --- | --- | --- | 0.5 |
| CMA Elite Threshold | --- | --- | --- | --- | --- | 0.8 |

Car Racing Environment (3+ Cars)

| Parameters | :mppi | :PMCMPPI | :μaismppi | :μΣaismppi | :cemppi | :cmamppi |
| --- | --- | --- | --- | --- | --- | --- |
| Samples | 375-2250 (multiples of 375) | 375 | 375 | 375 | 150 | 150 |
| Horizon | 50 | 50 | 50 | 50 | 50 | 50 |
| Inverse Temp (λ) | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 |
| Control Cost Param (α) | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| Init Control Sequence | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 |
| Control Covariance | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] |
| AIS Iterations | --- | 1-6 | 1-6 | 1-6 | 1, 5, 7, ..., 15 | 1, 5, 7, ..., 15 |
| AIS Inv Temp (λ_ais) | --- | 70.0 | 70.0 | 70.0 | --- | --- |
| CE Elite Threshold | --- | --- | --- | --- | 0.8 | --- |
| CE Σ Estimation Method | --- | --- | --- | --- | :ss | --- |
| CMA Step Factor (σ) | --- | --- | --- | --- | --- | 0.5 |
| CMA Elite Threshold | --- | --- | --- | --- | --- | 0.8 |
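The elite threshold can be read as a quantile cut on the sampled costs. The sketch below assumes that a threshold of 0.8 means the worst 80% of samples are discarded, so the best 20% form the elite set. This interpretation is an assumption and should be checked against the source code. `elite_indices` and the cost values are hypothetical.

```julia
# Indices of the elite (lowest-cost) samples, ordered from best to worst.
# Assumes threshold = 0.8 keeps the best 20% of samples; this reading is a guess.
function elite_indices(costs::AbstractVector{<:Real}, threshold::Real)
    n_elite = max(1, round(Int, length(costs) * (1 - threshold)))
    return partialsortperm(costs, 1:n_elite)
end

costs = collect(150.0:-1.0:1.0)    # 150 hypothetical costs; the last one is the best
elite = elite_indices(costs, 0.8)  # 30 elite samples under the assumed reading
```

Under this reading, 150 samples per iteration leave only 30 elite samples from which to refit the sampling distribution, which is why the covariance estimator matters at small sample sizes.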

HalfCheetah-v4 and Ant-v4

The numbers presented in the paper used terminate_when_unhealthy=False for Ant-v4.

| Parameters | :mppi | :cemppi |
| --- | --- | --- |
| Samples | 250, 500, 1000, 1500, 3000 | 50, 100, 125, 150, 200 |
| Horizon | 50 | 50 |
| Inverse Temp (λ) | 1.0 | 1.0 |
| Control Cost Param (α) | 1.0 | 1.0 |
| Init Control Sequence | 0.0, ..., 0.0 | 0.0, ..., 0.0 |
| Control Covariance | I(6) * 0.25 | I(6) * 0.25 |
| AIS Iterations | --- | 5, 5, 8, 10, 15 |
| CE Elite Threshold | --- | 0.8 |
| CE Σ Estimation Method | --- | :ss |
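The :mppi and :cemppi columns line up if the total number of trajectories per control step is taken to be samples per iteration times AIS iterations. That definition is an assumption here, and the short check below only multiplies the numbers from the table above.

```julia
# Samples per iteration and AIS iterations for :cemppi, copied from the table.
ce_samples    = [50, 100, 125, 150, 200]
ce_iterations = [5, 5, 8, 10, 15]

# Sample counts for :mppi, copied from the table.
mppi_samples  = [250, 500, 1000, 1500, 3000]

# Total trajectories evaluated per control step under the assumed definition.
ce_total = ce_samples .* ce_iterations
```

Each entry of `ce_total` equals the corresponding :mppi sample count, so the two methods are compared at equal total sample budgets.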

Covariance Estimation

For :PMCMPPI, :μaismppi, and :μΣaismppi, the covariance was estimated with the maximum likelihood estimator (:mle). The CE method used the :mle and :ss methods, where :ss is the Schäfer and Strimmer shrinkage estimator. Different covariance estimation techniques were implemented through integration with CovarianceEstimation.jl. The methods tested were

Sample Size and Covariance Estimation Comparison

For a given effective sample size, the CE and CMA methods benefited from using fewer samples with more iterations as the number of cars increased. However, as the sample size decreased, the method used to approximate the covariance matrix in the CE version of MPOPI became more important. Below are two gifs of the CMA version of MPOPI. One uses 4 iterations of 375 samples with :mle, and the other uses 10 iterations of 150 samples with :ss to estimate the covariance matrix.

MPOPI CMA - 375 Samples, 4 Iterations

MPOPI CMA - 150 Samples, 10 Iterations
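The sketch below illustrates why the covariance estimator matters more at small sample sizes. It uses only the standard library: the maximum likelihood estimate is the uncorrected sample covariance, and the shrinkage step pulls that estimate toward its own diagonal with a fixed intensity. The fixed intensity `γ`, the helper names, and the problem sizes are hypothetical. The :ss estimator chooses the intensity from the data, which is not reproduced here.

```julia
using LinearAlgebra, Random, Statistics

# Maximum likelihood covariance (divides by n); rows of X are observations.
mle_cov(X) = cov(X; dims=1, corrected=false)

# Linear shrinkage toward the diagonal of S with a fixed intensity γ in [0, 1].
# γ is set by hand here; a data-driven estimator would compute it instead.
shrink_cov(S, γ) = (1 - γ) * S + γ * Diagonal(diag(S))

rng = MersenneTwister(7)
X = randn(rng, 30, 100)        # 30 samples in a hypothetical 100-dimensional space
S = mle_cov(X)
S_shrunk = shrink_cov(S, 0.5)

rank_mle = rank(S)             # fewer samples than dimensions, so S is rank deficient
is_pd = isposdef(Symmetric(S_shrunk))  # shrinkage restores positive definiteness
```

With fewer samples than dimensions, the maximum likelihood estimate is singular and cannot serve directly as a sampling covariance. The shrunk estimate remains positive definite.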