Reinforcement learning framework

Tabular methods

- Dynamic programming

- Monte Carlo methods

- Temporal-difference methods

- N-step bootstrapping

Function approximation

Deep Q-networks

Policy gradient methods

- REINFORCE

- REINFORCE with baseline

- Actor-critic

- Trust Region Policy Optimization

- Proximal Policy Optimization

Continuous action domain

- Deep Deterministic Policy Gradient

- Twin Delayed Deep Deterministic Policy Gradient

Monte Carlo tree search

- AlphaZero architecture

Model-based algorithms

- MCTS with a learned model

Partially observable environments

Discrete variable optimization

In recent years, reinforcement learning has been combined with deep neural networks, giving rise to agents with superhuman performance (for example in chess, Go, Dota 2, or StarCraft II, which can be trained solely by self-play), data center cooling algorithms that are 50% more efficient than trained human operators, and improved machine translation. The goal of the course is to introduce reinforcement learning with deep neural networks, focusing on both the theory and practical implementations.