Reinforcement learning
-
Course outline
Nathanaël FijalkowWe present the outline of a course on reinforcement learning.The reference book and key inspiration for this course and for reinforcement learning in general is the book by Sutton and Barto (it can be downloaded for free at that address).The reaso...
-
The upper confidence bound (UCB) analysis for multi-armed bandits
Nathanaël FijalkowWe present a proof that the upper confidence bound yields an (asymptotically) optimal algorithm for regret minimisation of multi-armed bandits.There is an accompanying github repository for experimenting with different algorithms for the multi-arm...
-
Monte Carlo Tree Search
Nathanaël FijalkowWe present the Monte Carlo tree search algorithm for finding an optimal strategy in a two-player game.There is an accompanying github repository for implementations of some algorithms solving Tic-Tac-Toe.The setting we consider is two-player (call...
-
Part 2, Markov decision processes and dynamic algorithms
Nathanaël FijalkowWe present the model of MDPS, some fundamental properties and the two dynamic algorithms, value iteration and policy iteration.A Markov decision process is given by a set of states $S$, a set of actions $A$, and a transition function\[\Delta : S \...
-
Part 3, Monte Carlo approaches, temporal differences, and off-policy learning
Nathanaël FijalkowWe consider the setting where the MDP is only known through simulation and show how to adapt the previous algorithms using statistics instead of exact computations.The basic notations are given in the course outline.The first part about dynamic al...