Reinforcement learning framework
Tabular methods
- Dynamic programming
- Monte Carlo methods
- Temporal-difference methods
- N-step bootstrapping
Function approximation
Deep Q-networks
Policy gradient methods
- REINFORCE
- REINFORCE with baseline
- Actor-critic
- Trust Region Policy Optimization
- Proximal Policy Optimization
Continuous action domain
- Deep Deterministic Policy Gradient
- Twin Delayed Deep Deterministic Policy Gradient
Monte Carlo tree search
- AlphaZero architecture
Model-based algorithms
- MCTS with a learned model
Partially observable environments
Discrete variable optimization
In recent years, reinforcement learning has been combined with deep neural networks, giving rise to agents with superhuman performance (for example in chess, Go, Dota 2, and StarCraft II, capable of being trained solely by self-play), datacenter cooling algorithms that are 50% more efficient than trained human operators, and improved machine translation. The goal of the course is to introduce reinforcement learning with deep neural networks, focusing on both the theory and practical implementations.