# Reinforcement learning

• ### Course outline

Nathanaël Fijalkow

We present the outline of a course on reinforcement learning.The reference book and key inspiration for this course and for reinforcement learning in general is the book by Sutton and Barto (it can be downloaded for free at that address).The reaso...

• ### The upper confidence bound (UCB) analysis for multi-armed bandits

Nathanaël Fijalkow

We present a proof that the upper confidence bound yields an (asymptotically) optimal algorithm for regret minimisation of multi-armed bandits.There is an accompanying github repository for experimenting with different algorithms for the multi-arm...

• ### Monte Carlo Tree Search

Nathanaël Fijalkow

We present the Monte Carlo tree search algorithm for finding an optimal strategy in a two-player game.There is an accompanying github repository for implementations of some algorithms solving Tic-Tac-Toe.The setting we consider is two-player (call...

• ### Part 2, Markov decision processes and dynamic algorithms

Nathanaël Fijalkow

We present the model of MDPS, some fundamental properties and the two dynamic algorithms, value iteration and policy iteration.A Markov decision process is given by a set of states $S$, a set of actions $A$, and a transition function\[\Delta : S \...

• ### Part 3, Monte Carlo approaches, temporal differences, and off-policy learning

Nathanaël Fijalkow

We consider the setting where the MDP is only known through simulation and show how to adapt the previous algorithms using statistics instead of exact computations.The basic notations are given in the course outline.The first part about dynamic al...