| ✅ Proximal Policy Optimization (PPO) | ppo.py, docs |
| | ppo_atari.py, docs |
| | ppo_continuous_action.py, docs |
| | ppo_atari_lstm.py |
| | ppo_procgen.py |
| ✅ Deep Q-Learning (DQN) | dqn.py |
| | dqn_atari.py |
| ✅ Categorical DQN (C51) | c51.py |
| | c51_atari.py |
| ✅ Apex Deep Q-Learning (Apex-DQN) | apex_dqn_atari.py |
| ✅ Soft Actor-Critic (SAC) | sac_continuous_action.py |
| ✅ Deep Deterministic Policy Gradient (DDPG) | ddpg_continuous_action.py |
| ✅ Twin Delayed Deep Deterministic Policy Gradient (TD3) | td3_continuous_action.py |