Independent PA-DDPG: there is no centralized critic and no communication between agents; each agent acts asynchronously. Multi-agent scenarios can be implemented using Redis.
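As a sketch of how independent agents could exchange transitions through Redis, the snippet below uses an in-memory `deque` as a stand-in for a Redis list; in a real setup `push_transition` would call `LPUSH` via redis-py and the learner would `BRPOP`. All names here are illustrative, not this repo's API.

```python
import pickle
from collections import deque

# Stand-in for a Redis list; with redis-py this would be
# r = redis.Redis(); r.lpush(key, blob)  /  r.brpop(key).
queue = deque()

def push_transition(state, action, reward, next_state, done):
    """Serialize one transition as a Redis-ready byte blob and enqueue it."""
    blob = pickle.dumps((state, action, reward, next_state, done))
    queue.appendleft(blob)  # LPUSH equivalent

def pop_transition():
    """Dequeue and deserialize the oldest transition (BRPOP equivalent)."""
    return pickle.loads(queue.pop())

# One environment step produced by an actor, consumed by a learner.
push_transition([0.1, 0.2], 3, 0.0, [0.3, 0.4], False)
state, action, reward, next_state, done = pop_transition()
```

Serializing transitions to bytes keeps the format identical whether the queue is local or backed by a Redis server shared across processes.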
+1 if goal, else 0 (sparse reward)
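The sparse reward above can be written as a one-line function (a sketch; the actual reward signal comes from the HFO environment):

```python
def sparse_reward(goal_scored: bool) -> float:
    """+1 only on the step where a goal is scored; every other step yields 0."""
    return 1.0 if goal_scored else 0.0
```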
Similar to MAPQN, offense players choose among 3 mid-level parameterized actions (kick to, move to, dribble to) and a discrete high-level action (shoot).
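One way to encode this hybrid action space, sketched below with illustrative names (not necessarily this repo's encoding): each discrete action owns a fixed slice of a continuous parameter vector, a 2-D target for the mid-level actions and nothing for shoot.

```python
# Discrete action ids paired with their continuous-parameter dimensionality.
# KICK_TO / MOVE_TO / DRIBBLE_TO each take a 2-D target (x, y); SHOOT takes none.
PARAM_DIMS = {
    "KICK_TO": 2,
    "MOVE_TO": 2,
    "DRIBBLE_TO": 2,
    "SHOOT": 0,
}

def split_action(flat):
    """Split a flat network output [logits..., params...] into the chosen
    discrete action name and its continuous-parameter slice."""
    names = list(PARAM_DIMS)
    logits, params = flat[:len(names)], flat[len(names):]
    idx = max(range(len(logits)), key=logits.__getitem__)  # argmax over logits
    name = names[idx]
    # Each action's parameters occupy a fixed slice of the parameter vector.
    offset = sum(PARAM_DIMS[n] for n in names[:idx])
    return name, params[offset:offset + PARAM_DIMS[name]]
```

For example, an output whose second logit is largest selects MOVE_TO and reads its (x, y) target from the second 2-D slice of the parameter vector.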
Low-level features in HFO.
- Evaluation frequency: every 500 episodes
- Evaluation length: 1k episodes
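The evaluation schedule above can be sketched as a simple trigger check (an assumed helper, not this repo's code): every 500 training episodes, run a 1k-episode evaluation.

```python
EVAL_EVERY = 500      # evaluate every 500 training episodes
EVAL_EPISODES = 1000  # each evaluation runs 1k episodes

def should_evaluate(episode: int) -> bool:
    """True on episodes that are exact multiples of the evaluation period."""
    return episode > 0 and episode % EVAL_EVERY == 0
```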
1v1 against the world champion HELIOS
Use the --help command for more parameter settings.
You can adjust the number of players for training and evaluation accordingly (e.g., for self-play).
python connect.py --offense-agents 2 --defense-agents 0 --defense-npcs 1 --server-port 6000
redis-server
Start a learner for an agent:
python learner.py --tensorboard-dir agent1 --save-dir agent1
Start an evaluator for an agent:
python evaluator.py --tensorboard-dir agent1 --save-dir agent1 --episodes 20000
PDDPG 2v2 training:
python connect.py --offense-agents 2 --defense-agents 0 --defense-npcs 2 --server-port 6000
python learner.py --tensorboard-dir agent1 --save-dir agent1
python learner.py --tensorboard-dir agent2 --save-dir agent2
Evaluate PDDPG 2v2 model:
python connect.py --offense-agents 2 --defense-agents 0 --defense-npcs 2 --server-port 6000
python evaluator.py --tensorboard-dir agent1 --save-dir agent1
python evaluator.py --tensorboard-dir agent2 --save-dir agent2
If this repo helps you, please give it a star.
The code in this repo has referred to HFO, MP-DQN, PA-DDPG, and gym-soccer.