Reinforcement learning

Course outline
Nathanaël Fijalkow
We present the outline of a course on reinforcement learning.The reference book and key inspiration for this course and for reinforcement learning in general is the book by Sutton and Barto (it can be downloaded for free at that address).The reaso...
The upper confidence bound (UCB) analysis for multi-armed bandits
Nathanaël Fijalkow
We present a proof that the upper confidence bound yields an (asymptotically) optimal algorithm for regret minimisation of multi-armed bandits.There is an accompanying github repository for experimenting with different algorithms for the multi-arm...
Monte Carlo Tree Search
Nathanaël Fijalkow
We present the Monte Carlo tree search algorithm for finding an optimal strategy in a two-player game.There is an accompanying github repository for implementations of some algorithms solving Tic-Tac-Toe.The setting we consider is two-player (call...
Part 2, Markov decision processes and dynamic algorithms
Nathanaël Fijalkow
We present the model of MDPS, some fundamental properties and the two dynamic algorithms, value iteration and policy iteration.A Markov decision process is given by a set of states $S$, a set of actions $A$, and a transition function\[\Delta : S \...
Part 3, Monte Carlo approaches, temporal differences, and off-policy learning
Nathanaël Fijalkow
We consider the setting where the MDP is only known through simulation and show how to adapt the previous algorithms using statistics instead of exact computations.The basic notations are given in the course outline.The first part about dynamic al...

Games automata play

Reinforcement learning

Course outline

The upper confidence bound (UCB) analysis for multi-armed bandits

Monte Carlo Tree Search

Part 2, Markov decision processes and dynamic algorithms

Part 3, Monte Carlo approaches, temporal differences, and off-policy learning