Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent

Bianca Marin Moreno; Margaux Brégère; Pierre Gaillard; Nadia Oudjane

Preprint, Working Paper. Year: 2023

Bianca Marin Moreno

Role: Author

Apprentissage de modèles à partir de données massives

Margaux Brégère

Role: Author

EDF R&D

Laboratoire de Probabilités, Statistique et Modélisation

Sorbonne Université

Pierre Gaillard

Role: Author
PersonId: 13025
IdHAL: pierre-gaillard
ORCID: 0000-0002-5665-7904
IdRef: 19041992X

Apprentissage de modèles à partir de données massives

Nadia Oudjane

Role: Author

EDF R&D

Abstract

Many machine learning tasks can be solved by minimizing a convex function of an occupancy measure over the policies that generate it. These tasks include reinforcement learning and imitation learning, among others. This more general paradigm is called the Concave Utility Reinforcement Learning (CURL) problem. Since CURL invalidates the classical Bellman equations, it requires new algorithms. We introduce MD-CURL, a new algorithm for CURL in a finite-horizon Markov decision process. MD-CURL is inspired by mirror descent and uses a non-standard regularization to achieve convergence guarantees and a simple closed-form solution, eliminating the computationally expensive projection steps typically found in mirror descent approaches. We then extend CURL to an online learning scenario and present Greedy MD-CURL, a new method that adapts MD-CURL to an online, episode-based setting with partially unknown dynamics. Like MD-CURL, Greedy MD-CURL has low computational complexity while guaranteeing sub-linear or even logarithmic regret, depending on how much information is available about the underlying dynamics.
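
To make the objects named in the abstract concrete, the following is a minimal sketch of how a policy induces an occupancy measure in a finite-horizon Markov decision process, and how a convex function of that measure can be evaluated. It is not the MD-CURL algorithm, which this record does not detail. The function names, the time-homogeneous transition kernel, the toy two-state MDP, and the two example objectives (a linear one, which corresponds to classical reinforcement learning, and a negative state entropy, which rewards exploration) are all assumptions chosen for illustration.

```python
import math

def occupancy_measure(mu0, P, policy):
    """Return mu[n][x][a], the state-action distribution at each step n.

    mu0[x]          : initial state distribution
    P[x][a][y]      : probability of moving from x to y under action a
                      (assumed time-homogeneous for simplicity)
    policy[n][x][a] : probability of playing a in state x at step n
    """
    n_states = len(mu0)
    n_actions = len(P[0])
    rho = list(mu0)  # state distribution at the current step
    mu = []
    for pi_n in policy:
        mu_n = [[rho[x] * pi_n[x][a] for a in range(n_actions)]
                for x in range(n_states)]
        mu.append(mu_n)
        # Push the state-action distribution through the dynamics.
        rho = [sum(mu_n[x][a] * P[x][a][y]
                   for x in range(n_states) for a in range(n_actions))
               for y in range(n_states)]
    return mu

def negative_reward(mu, reward):
    """Linear objective: minus the expected total reward (classical RL)."""
    return -sum(m * r
                for mu_n in mu
                for row_m, row_r in zip(mu_n, reward)
                for m, r in zip(row_m, row_r))

def negative_state_entropy(mu):
    """Convex, non-linear objective: minus the entropy of the state marginals."""
    total = 0.0
    for mu_n in mu:
        for row in mu_n:
            p = sum(row)  # marginal probability of the state at this step
            if p > 0:
                total += p * math.log(p)
    return total

# Hypothetical 2-state, 2-action MDP over 3 steps:
# action 0 stays in the current state, action 1 moves to the other state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
mu0 = [1.0, 0.0]
reward = [[1.0, 1.0], [0.0, 0.0]]  # reward 1 whenever the agent is in state 0

uniform = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(3)]
stay = [[[1.0, 0.0], [1.0, 0.0]] for _ in range(3)]

mu_uniform = occupancy_measure(mu0, P, uniform)
mu_stay = occupancy_measure(mu0, P, stay)

print(negative_reward(mu_stay, reward), negative_reward(mu_uniform, reward))
print(negative_state_entropy(mu_stay), negative_state_entropy(mu_uniform))
```

In this toy example the "stay" policy is better for the linear reward objective, while the uniform policy is better for the entropy objective, which illustrates why the choice of convex function changes which policy is optimal.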

Keywords

model-based reinforcement learning; mirror descent; online learning; convex optimisation; mean field control; concave utility reinforcement learning

Domains

Machine Learning [stat.ML]; Optimization and Control [math.OC]; Data Analysis, Statistics and Probability [physics.data-an]

Main file

rpmzfspfrtqryyrvjgxvzxbdhmhhndbb.pdf (1.12 MB)

my_paper.pdf (1.03 MB)

Origin: Files produced by the author(s)

Contributor: Bianca Marin Moreno

https://hal.science/hal-04302000

Submitted on: Thursday, November 23, 2023, 11:15:22

Last modified on: Saturday, April 27, 2024, 03:14:51

Dates and versions

hal-04302000, version 1 (23-11-2023)

License

Attribution

Identifiers

HAL Id: hal-04302000, version 1
arXiv: 2311.18346

Cite

Bianca Marin Moreno, Margaux Brégère, Pierre Gaillard, Nadia Oudjane. Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent. 2023. ⟨hal-04302000⟩

Collections

UGA CNRS INRIA INSMI LJK LJK_GI INRIA2 TDS-MACS LJK-GI-THOTH EDF LPSM SORBONNE-UNIVERSITE SU-SCIENCES UP-SCIENCES

69 views

50 downloads
