# Séminaire de Mécanique d'Orsay

## Thursday, January 21, 2021 at 2:00 pm - Webinar - the link will be communicated later

### Deep Reinforcement Learning for Dynamical Systems

## Mr. Alessandro Bucci

Laboratoire de Recherche en Informatique
A&O Team

In the past few years, machine learning techniques have been successfully applied to decision-making, becoming a keystone for tackling problems where an analytical model does not exist or is computationally costly to solve. Self-driving cars and human-level Go players are just two striking examples. In these cases, unlike in classification or regression tasks, machine learning models are trained by reinforcement learning (RL) procedures, which are designed to optimize a desired objective function (maximizing a reward or, equivalently, minimizing a cost) subject to the dynamics of a nonlinear system. But how is reinforcement learning related to linear optimal control? And why is reinforcement learning suitable in nonlinear frameworks?
In this seminar, we will address these questions using examples of interest in fluid mechanics, where linear optimal controllers have been successfully implemented in the last decade, but where RL can be of great interest for tackling non-convex control problems associated with nonlinear dynamics. First, we describe the RL strategy starting from the optimal control framework, showing how it can be understood as a fully data-driven counterpart of nonlinear optimal control, whose solution is often impractical to compute. Second, using an algorithm referred to as Deep Deterministic Policy Gradient, we show how neural networks can be combined with the RL framework, giving rise to so-called Deep RL (DRL), and how DRL can be applied to control the chaotic behaviour of the Kuramoto-Sivashinsky system by stabilizing it in the vicinity of its unstable solutions.
Finally, an analysis will be provided to demonstrate the robustness of the resulting control law with respect to i) the initial conditions and ii) the learning procedure used to optimize the parameters of the models. Special emphasis will be given to the hypotheses required by the different RL algorithms (e.g. Markovianity for off-policy iteration, convex cost function for on-policy iteration, etc.), shedding light on their strengths and weaknesses. Guidelines to improve the exploration stage based on network theory, and open issues (time delays, slow exploration, partial observability, etc.) in the application of RL to fluid dynamics, will be discussed at the end of the presentation.
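
For readers unfamiliar with the setting, the control problem outlined in the abstract can be sketched in standard textbook form. The notation below is an assumption made for illustration and is not taken from the talk: $f(x,t)$ is a hypothetical forcing term set by the controller, $u^\ast$ is the unstable target solution, $s_k$ and $a_k$ are the state and action at discrete step $k$, $\mu_\theta$ is the deterministic policy (actor), $Q_\phi$ is the learned action-value function (critic), and the quadratic reward is just one plausible choice.

```latex
% Kuramoto-Sivashinsky equation on a periodic domain of length L,
% with an assumed distributed forcing f(x,t) acting as the control input.
\begin{equation}
  \partial_t u + u\,\partial_x u + \partial_x^2 u + \partial_x^4 u = f(x,t),
  \qquad u(x+L,t) = u(x,t).
\end{equation}

% RL objective: expected discounted return under the deterministic policy
% a_k = \mu_\theta(s_k). The reward below (an assumption) penalizes the
% distance to the target unstable solution u^*.
\begin{equation}
  J(\theta) = \mathbb{E}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r(s_k,a_k)\right],
  \qquad r(s_k,a_k) = -\,\lVert u(\cdot,t_k) - u^\ast \rVert^2,
  \qquad 0 \le \gamma < 1.
\end{equation}

% Bellman optimality equation: the data-driven counterpart of the
% Hamilton-Jacobi-Bellman equation of nonlinear optimal control.
\begin{equation}
  Q^\ast(s,a) = \mathbb{E}\!\left[\, r(s,a) + \gamma \max_{a'} Q^\ast(s',a') \,\right].
\end{equation}

% Deterministic policy gradient used by DDPG to update the actor
% from the critic Q_\phi.
\begin{equation}
  \nabla_\theta J \approx \mathbb{E}\!\left[
    \nabla_a Q_\phi(s,a)\big|_{a=\mu_\theta(s)}\; \nabla_\theta \mu_\theta(s)
  \right].
\end{equation}
```

When the dynamics are linear and the reward is quadratic, the Bellman equation reduces to the Riccati equation of linear-quadratic optimal control, which is one way to read the link between RL and linear optimal control raised in the abstract.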

Webinar access - the link will be communicated later