Alizée Pace, MSc MPhil

PhD Student

E-Mail
alizee.pace@get-your-addresses-elsewhere.ai.ethz.ch
Address
ETH AI Center
ETH Zürich
Department of Computer Science
Universitätsstrasse 6
8092 Zürich
Room
CAB E 77.1
Twitter
@AlizeePace

My research interests are centred on applications of machine learning and causal inference in medicine. I develop imitation and reinforcement learning methods for treatment prediction and clinical decision support.

I am a Doctoral Fellow at the ETH AI Center, jointly supervised by Prof. Gunnar Rätsch of the BMI group and Prof. Bernhard Schölkopf, leader of the Empirical Inference group at the Max Planck Institute for Intelligent Systems (Tübingen, Germany). Prior to my PhD, I led a project on imitation learning for clinical decision-making with Prof. Mihaela van der Schaar at the University of Cambridge. My professional experience also includes medical device development for stroke treatment, sensor-assisted surgery and 3D-printed heart stents, as well as software engineering at CERN. In parallel, I studied Physics, Materials Science and Machine Learning at Cambridge.

Please see my personal website for further information. 

Abstract Applying reinforcement learning (RL) to real-world problems is often made challenging by the inability to interact with the environment and the difficulty of designing reward functions. Offline RL addresses the first challenge by assuming access to an offline dataset of environment interactions labeled by the reward function. In contrast, preference-based RL does not assume access to the reward function and learns it from preferences, but typically requires online interaction with the environment. We bridge the gap between these frameworks by exploring efficient methods for acquiring preference feedback in a fully offline setup. We propose Sim-OPRL, an offline preference-based reinforcement learning algorithm, which leverages a learned environment model to elicit preference feedback on simulated rollouts. Drawing on insights from both the offline RL and the preference-based RL literature, our algorithm employs a pessimistic approach for out-of-distribution data and an optimistic approach for acquiring informative preferences about the optimal policy. We provide theoretical guarantees on the sample complexity of our approach, which depend on how well the offline data covers the optimal policy. Finally, we demonstrate the empirical performance of Sim-OPRL in different environments.

Authors Alizée Pace, Bernhard Schölkopf, Gunnar Rätsch, Giorgia Ramponi

Submitted ICML 2024 MFHAIA

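The preference-elicitation loop described in the abstract above can be sketched in a few lines: two candidate policies are rolled out in a learned environment model, and a (simulated) annotator compares the resulting trajectories. This is a minimal illustration only; the toy model, policies and higher-return annotator below are hypothetical stand-ins, not the components of Sim-OPRL.

```python
def rollout(model, policy, state, steps):
    """Roll a policy forward in a learned environment model."""
    traj = []
    for _ in range(steps):
        action = policy(state)
        state, reward = model(state, action)
        traj.append((state, action, reward))
    return traj

def preferred(traj_a, traj_b):
    """Simulated annotator: prefer the rollout with the higher return."""
    ret = lambda traj: sum(r for _, _, r in traj)
    return traj_a if ret(traj_a) >= ret(traj_b) else traj_b

# Toy usage: a linear dynamics model and two constant policies.
toy_model = lambda s, a: (s + a, float(a))
good = rollout(toy_model, lambda s: 1.0, state=0.0, steps=3)
bad = rollout(toy_model, lambda s: 0.0, state=0.0, steps=3)
winner = preferred(good, bad)
```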

Abstract The success of reinforcement learning from human feedback (RLHF) in language model alignment is strongly dependent on the quality of the underlying reward model. In this paper, we present a novel approach to improve reward model quality by generating synthetic preference data, thereby augmenting the training dataset with on-policy, high-quality preference pairs. Motivated by the promising results of Best-of-N sampling strategies in language model training, we extend their application to reward model training. This results in a self-training strategy to generate preference pairs by selecting the best and worst candidates in a pool of responses to a given query. Empirically, we find that this approach improves the performance of any reward model, with an effect comparable to the addition of a similar quantity of human preference data. This work opens up new avenues of research for improving RLHF for language model alignment, by offering synthetic preference generation as a solution to reward modeling challenges.

Authors Alizée Pace, Jonathan Mallinson, Eric Malmi, Sebastian Krause, Aliaksei Severyn

Submitted ICLR 2024 DPFM

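The Best-of-N self-training strategy described above reduces to a simple selection rule: score a pool of sampled responses with the current reward model and keep the best and worst as a synthetic (chosen, rejected) preference pair. A minimal sketch, using a hypothetical stand-in scoring function rather than a real reward model:

```python
def make_preference_pair(query, responses, score):
    """Select the best and worst responses in a pool as a synthetic
    (chosen, rejected) preference pair for the given query."""
    ranked = sorted(responses, key=lambda r: score(query, r))
    return ranked[-1], ranked[0]  # (chosen, rejected)

# Toy usage: a stand-in "reward model" that just prefers longer answers.
toy_score = lambda q, r: len(r)
pool = ["ok", "a detailed answer", "meh"]
chosen, rejected = make_preference_pair("some query", pool, toy_score)
```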

Abstract A prominent challenge of offline reinforcement learning (RL) is the issue of hidden confounding: unobserved variables may influence both the actions taken by the agent and the observed outcomes. Hidden confounding can compromise the validity of any causal conclusion drawn from data and presents a major obstacle to effective offline RL. In the present paper, we tackle the problem of hidden confounding in the nonidentifiable setting. We propose a definition of uncertainty due to hidden confounding bias, termed delphic uncertainty, which uses variation over world models compatible with the observations, and differentiate it from the well-known epistemic and aleatoric uncertainties. We derive a practical method for estimating the three types of uncertainties, and construct a pessimistic offline RL algorithm to account for them. Our method does not assume identifiability of the unobserved confounders, and attempts to reduce the amount of confounding bias. We demonstrate through extensive experiments and ablations the efficacy of our approach on a sepsis management benchmark, as well as on electronic health records. Our results suggest that nonidentifiable hidden confounding bias can be mitigated to improve offline RL solutions in practice.

Authors Alizée Pace, Hugo Yèche, Bernhard Schölkopf, Gunnar Rätsch, Guy Tennenholtz

Submitted ICLR 2024

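One way to make "variation over world models compatible with the observations" concrete is disagreement across an ensemble of candidate models: where compatible models agree, the uncertainty is low; where they diverge, it is high. The following is a toy sketch of that intuition, not the paper's estimator:

```python
import statistics

def delphic_spread(models, state, action):
    """Spread of outcome predictions across compatible world models."""
    preds = [m(state, action) for m in models]
    return statistics.pstdev(preds)

# Three toy models that agree when the action has no effect
# but diverge for larger actions.
models = [lambda s, a, k=k: s + a * k for k in (0.9, 1.0, 1.1)]
low = delphic_spread(models, state=1.0, action=0.0)   # models agree
high = delphic_spread(models, state=1.0, action=5.0)  # models diverge
```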

Abstract Recent advances in deep learning architectures for sequence modeling have not fully transferred to tasks handling time-series from electronic health records. In particular, in problems related to the Intensive Care Unit (ICU), the state-of-the-art remains to tackle sequence classification in a tabular manner with tree-based methods. Recent findings in deep learning for tabular data now surpass these classical methods by better handling the severe heterogeneity of data input features. Given the similar level of feature heterogeneity exhibited by ICU time-series and motivated by these findings, we explore the impact of these novel methods on clinical sequence modeling tasks. Jointly leveraging such advances in deep learning for tabular data, our primary objective is to underscore the importance of step-wise embeddings in time-series modeling, which remain unexplored in machine learning methods for clinical data. On a variety of clinically relevant tasks from two large-scale ICU datasets, MIMIC-III and HiRID, our work provides an exhaustive analysis of state-of-the-art methods for tabular time-series as time-step embedding models, showing overall performance improvement. In particular, we evidence the importance of feature grouping in clinical time-series, with significant performance gains when considering features within predefined semantic groups in the step-wise embedding module.

Authors Rita Kuznetsova, Alizée Pace, Manuel Burger, Hugo Yèche, Gunnar Rätsch

Submitted ML4H 2023 (PMLR)


Abstract Models that can predict the occurrence of events ahead of time with low false-alarm rates are critical to the acceptance of decision support systems in the medical community. This challenging task is typically treated as a simple binary classification, ignoring temporal dependencies between samples, whereas we propose to exploit this structure. We first introduce a common theoretical framework unifying dynamic survival analysis and early event prediction. Following an analysis of objectives from both fields, we propose Temporal Label Smoothing (TLS), a simpler, yet best-performing method that preserves prediction monotonicity over time. By focusing the objective on areas with a stronger predictive signal, TLS improves performance over all baselines on two large-scale benchmark tasks. Gains are particularly notable along clinically relevant measures, such as event recall at low false-alarm rates. TLS reduces the number of missed events by up to a factor of two over previously used approaches in early event prediction.

Authors Hugo Yèche, Alizée Pace, Gunnar Rätsch, Rita Kuznetsova

Submitted ICML 2023

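The core idea above, replacing the hard 0/1 label within a prediction horizon with a target that rises smoothly and monotonically as the event approaches, can be sketched as follows. The linear ramp and 12-hour horizon are illustrative choices, not the paper's exact smoothing schedule:

```python
def smoothed_label(time_to_event, horizon=12.0, floor=0.0):
    """Monotone target in [floor, 1]: 1 at the event, decaying
    linearly with time-to-event, `floor` beyond the horizon."""
    if time_to_event is None or time_to_event > horizon:
        return floor
    return floor + (1.0 - floor) * (1.0 - time_to_event / horizon)

# Labels at the event, mid-horizon, and beyond the horizon.
labels = [smoothed_label(t, horizon=12.0) for t in (0.0, 6.0, 24.0)]
```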

Abstract Balancing forces within weight-bearing joints such as the hip during joint replacement is essential for implant longevity. Minimising implant failure and the corresponding need for expensive and difficult revision surgery is vital to both improve the quality of life of the patient and lighten the burden on overstretched healthcare systems. However, balancing forces during total hip replacements is currently subjective and entirely dependent on surgical skill, as there are no sensors currently on the market that are capable of providing quantitative force feedback within the small and complex geometry of the hip joint. Here, we solve this unmet clinical need by presenting a thin and conformable microfluidic force sensor, which is compatible with the standard surgical procedure. The sensors are fabricated via additive manufacturing, using a combination of 3D and aerosol-jet printing. We optimised the design using finite element modelling, then incorporated and calibrated our sensors in a 3D printed model hip implant. Using a bespoke testing rig, we demonstrated high sensitivity at typical forces experienced following implantation of hip replacements. We anticipate that these sensors will aid soft tissue balancing and implant positioning, thereby increasing the longevity of hip replacements. These sensors thus represent a powerful new surgical tool for a range of orthopaedic procedures where balancing forces is crucial.

Authors Liam Ives, Alizée Pace, Fabian Bor, Qingshen Jing, Tom Wade, Jehangir Cama, Vikas Khanduja, Sohini Kar-Narayan

Submitted Materials & Design


Abstract Building models of human decision-making from observed behaviour is critical to better understand, diagnose and support real-world policies such as clinical care. As established policy learning approaches remain focused on imitation performance, they fall short of explaining the demonstrated decision-making process. Policy Extraction through decision Trees (POETREE) is a novel framework for interpretable policy learning, compatible with fully-offline and partially-observable clinical decision environments, which builds probabilistic tree policies determining physician actions based on patients' observations and medical history. Fully-differentiable tree architectures are grown incrementally during optimization to adapt their complexity to the modelling task, and learn a representation of patient history through recurrence, resulting in decision tree policies that adapt over time with patient information. This policy learning method outperforms the state-of-the-art on real and synthetic medical datasets, both in understanding, quantifying and evaluating observed behaviour and in accurately replicating it, with potential to improve future decision support systems.

Authors Alizée Pace, Alex Chan, Mihaela van der Schaar

Submitted ICLR 2022 (Spotlight)
