Imitation and Reinforcement Learning

Motivation  Many clinical treatment guidelines consist of rigid protocols that cannot account for the heterogeneity of conditions in hospital settings. In contrast, the medical community has shown growing research interest in personalised treatment strategies, whose development requires powerful modelling tools. Large observational clinical datasets provide insight into the treatment and diagnostic decisions taken by healthcare professionals. From the machine learning perspective, an equally exciting avenue of research consists of extracting this clinical expertise, evaluating the outcomes associated with particular decisions, and developing personalised treatment- or diagnostic-recommendation systems to support physicians’ decision-making.

Research Challenges  A particular challenge arises from adapting these frameworks to the offline setting, i.e., learning entirely from observational data. No interaction with the decision-making environment is allowed, for ethical and practical reasons, which limits the applicability of most methods developed in the literature.

  • Representation Learning: Our decision-making environment is intrinsically partially observable, both because patient evolution depends on patient history and because unobserved factors affect doctors’ choices as well as future patient outcomes. A first research direction, therefore, consists of learning informative representations of patient state and treatment history.
  • Reward Learning: We also develop methods to learn the reward signal optimised by doctors, a problem studied in the Inverse Reinforcement Learning (IRL) literature, and to learn reward signals derived from definitions of clinically healthy patient outcomes.
  • Model-Based, Causal Learning: Inspired by the success of model-based methods in offline RL, our group also investigates how to develop robust dynamics models for planning and policy learning.
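As a toy illustration of the offline setting described above, the sketch below fits a behavioural cloning policy to logged (state, action) pairs, with no environment interaction. The data-generating process and all variable names are illustrative assumptions, not part of the group's actual pipeline; a logistic policy trained by gradient ascent stands in for the richer representation- and reward-learning models discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for logged clinical trajectories: each row is a
# patient-state feature vector, and each action is the clinician's
# recorded (binary) treatment choice. Purely illustrative data.
n, d = 500, 4
states = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.5, 0.0])          # hypothetical clinician policy
actions = (states @ true_w + 0.1 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Behavioural cloning: maximise the log-likelihood of the observed
# actions under a logistic policy, using plain gradient ascent.
w = np.zeros(d)
lr = 0.5
for _ in range(300):
    p = sigmoid(states @ w)          # predicted action probabilities
    grad = states.T @ (actions - p) / n
    w += lr * grad

# Fraction of logged decisions the learned policy reproduces.
accuracy = np.mean((sigmoid(states @ w) > 0.5) == actions)
```

Note that such a policy only mimics observed behaviour; evaluating or improving on it offline requires the reward- and model-learning directions listed above.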

Involved group members: Alizée Pace, Gunnar Rätsch


[1] Pace, Alizée, Alex J. Chan, and Mihaela van der Schaar. "POETREE: Interpretable Policy Learning with Adaptive Decision Trees." International Conference on Learning Representations. 2022.