Comprehensive Patient Representations

Patients are at the centre of medicine. Understanding and managing the health of the patient is the primary task of healthcare providers, calling for the creation of computational tools to aid in these processes.

We are interested in methods to learn representations of patients from observational medical data. This data typically takes the form of electronic health records (EHRs), which consist of time-stamped instances of patient care (treatments, observations, procedures), with clinical data in a variety of formats such as text notes, pathology images, physiological measurements, genomic data and more. This information is routinely collected during care, and thus can feature errors, irregular time sampling, missingness, and other properties of data collected under non-laboratory conditions.

We aim to study and develop approaches capable of efficiently integrating all this data into a single representation, and subsequently using these representations to reach a better understanding of each patient, make predictions about their future state, provide personalised therapies, etc.

Empirical analysis of unsupervised learning of patient embeddings

We are currently working on methods to learn patient representations from Intensive Care Unit (ICU) data in an unsupervised manner. For this purpose we are benchmarking several state-of-the-art approaches, such as feedforward and recurrent neural network auto-encoders. We are analysing both how well these encodings reconstruct the original patient information and how informative they are for predicting future clinical events.
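To make the auto-encoding idea concrete, the following is a minimal, dependency-light sketch of a feedforward auto-encoder, assuming (hypothetically) that each patient's ICU data has already been aggregated into a fixed-size feature vector. The data, dimensions, and training loop are illustrative stand-ins, not the benchmarked models themselves.

```python
import numpy as np

# Toy feedforward auto-encoder: compress patient feature vectors into a
# low-dimensional embedding, then reconstruct them. Trained by plain
# gradient descent on squared reconstruction error.

rng = np.random.default_rng(0)
n_patients, n_features, n_hidden = 200, 20, 5

# Stand-in for per-patient aggregated ICU features.
X = rng.normal(size=(n_patients, n_features))

W_enc = rng.normal(scale=0.1, size=(n_features, n_hidden))
W_dec = rng.normal(scale=0.1, size=(n_hidden, n_features))
lr = 0.01

def forward(X):
    H = np.tanh(X @ W_enc)   # patient embedding (bottleneck)
    X_hat = H @ W_dec        # reconstruction
    return H, X_hat

_, X_hat = forward(X)
loss_before = np.mean((X - X_hat) ** 2)

for _ in range(500):
    H, X_hat = forward(X)
    err = X_hat - X                          # reconstruction error
    grad_dec = H.T @ err / n_patients
    grad_h = err @ W_dec.T * (1 - H ** 2)    # back-propagate through tanh
    grad_enc = X.T @ grad_h / n_patients
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

_, X_hat = forward(X)
loss_after = np.mean((X - X_hat) ** 2)
print(loss_before, loss_after)
```

A recurrent auto-encoder would replace the encoder with an RNN consuming the time-stamped sequence directly; the embedding `H` is what would then be evaluated for reconstruction quality and downstream predictiveness.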

Medical language representation

Much important information is contained in clinical text notes written by doctors, nurses and other clinicians. These notes can contain recommended treatments, assessments of patient wellbeing, prognoses and notable developments - information which may not appear elsewhere in the patient's EHR. However, integrating this textual information requires a representation of medical language. We have worked on this problem at two levels. Firstly, we have developed a method to learn word representations specifically for medical use, integrating prior knowledge. This addresses the fact that medical English differs from generic English in assumed word meaning (e.g., 'patient'), while medical text corpora are limited in size relative to generic corpora. Secondly, we have worked with others to represent entire clinical text notes (text summarisation) in order to perform mortality prediction in the ICU [3].
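One common way to inject prior knowledge into word vectors is retrofitting: pulling each vector towards its neighbours in a curated lexicon while keeping it close to its original embedding. The sketch below uses toy random vectors and a tiny hypothetical synonym lexicon; it illustrates the general technique, not the group's specific method.

```python
import numpy as np

# Retrofitting sketch: adjust pre-trained word vectors using a
# (hypothetical) medical synonym lexicon as prior knowledge.

rng = np.random.default_rng(1)
vocab = ["patient", "subject", "case", "doctor", "physician"]
vectors = {w: rng.normal(size=8) for w in vocab}  # stand-in embeddings

# Prior knowledge: pairs treated as near-synonyms in clinical usage.
lexicon = {
    "patient": ["subject", "case"],
    "subject": ["patient"],
    "case": ["patient"],
    "doctor": ["physician"],
    "physician": ["doctor"],
}

def retrofit(vectors, lexicon, n_iters=10, alpha=1.0, beta=1.0):
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(n_iters):
        for w, neighbours in lexicon.items():
            if not neighbours:
                continue
            # Weighted average of the original vector and neighbours.
            total = alpha * vectors[w] + beta * sum(new[n] for n in neighbours)
            new[w] = total / (alpha + beta * len(neighbours))
    return new

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

fitted = retrofit(vectors, lexicon)
print(cosine(vectors["doctor"], vectors["physician"]),
      cosine(fitted["doctor"], fitted["physician"]))
```

After retrofitting, lexicon neighbours such as 'doctor' and 'physician' end up closer in the embedding space, which is the effect one would want when generic corpora are too small to capture medical word usage.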

Topic Modeling for Mining Clinical Notes

Relevant features are often embedded in free-text clinical notes written for human consumption by domain experts. This text data includes details such as patient history, symptoms and care plans that cannot be found elsewhere in a patient's EHR. By employing generative topic models, we can create a digitized representation of text notes for use in further data analysis. As a proof of concept we applied this strategy to a set of 5000 patients' clinical notes [4]. By analyzing correlations between patients' clinical text topics and their genetic testing results, we independently re-identified several known correlations between patient symptoms and their cancer mutations.
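The core idea can be sketched with a document-term factorisation. A real pipeline would use a generative topic model such as LDA; the dependency-free stand-in below uses non-negative matrix factorisation, which shares the same structure: a notes-by-terms count matrix is factored into per-note topic weights and per-topic word distributions. The notes and vocabulary are invented examples.

```python
import numpy as np

# Topic-model sketch via NMF: factorise the note-term count matrix
# X ~ W @ H, where rows of H are "topics" (word loadings) and rows of
# W are per-note topic weights usable as features for later analysis.

rng = np.random.default_rng(2)

notes = [
    "chest pain shortness of breath",
    "cough fever chest infection",
    "fracture left leg pain",
    "leg swelling fracture follow up",
]
vocab = sorted({w for n in notes for w in n.split()})
X = np.array([[n.split().count(w) for w in vocab] for n in notes], float)

n_topics = 2
W = rng.random((len(notes), n_topics))
H = rng.random((n_topics, len(vocab)))

eps = 1e-9
for _ in range(200):  # multiplicative updates (Lee & Seung style)
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

# Top words per topic, as one would inspect when interpreting topics.
top_words = [[vocab[i] for i in np.argsort(h)[::-1][:3]] for h in H]
print(top_words)
```

The rows of `W` play the role of the digitized note representation; in the study above, such per-patient topic weights were the quantities correlated with genetic testing results.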

Learning The Dynamics Of Topic Evolution In Clinical Text Time Series

We are interested in detecting dynamic structure in time series of clinical text in order to infer variables describing patients' health trajectories. In a first study, we learned a Markov model over the topic representations of sequences of patient reports, yielding transition probabilities and typical health trajectories. We then combined this temporal model with patient survival information to discover correlations between clinical note topics and mortality [5].
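The Markov-model step can be sketched as follows, assuming each note has already been reduced to a dominant topic label (the trajectories below are illustrative, not real patient data): transition probabilities are estimated by counting consecutive topic pairs and row-normalising.

```python
import numpy as np

# Sketch of the temporal model: treat each patient's sequence of
# dominant note topics as a Markov chain and estimate transition
# probabilities by counting transitions (labels are illustrative).

n_topics = 3
# One topic label per clinical note, in time order, per patient.
trajectories = [
    [0, 0, 1, 2],
    [0, 1, 1, 2],
    [1, 2, 2, 2],
]

counts = np.zeros((n_topics, n_topics))
for traj in trajectories:
    for a, b in zip(traj, traj[1:]):
        counts[a, b] += 1

# Row-normalise with add-one smoothing so no row is all zeros.
P = (counts + 1) / (counts + 1).sum(axis=1, keepdims=True)

print(P)
```

Rows of `P` give, for each topic, the probability distribution over the next note's topic; typical health trajectories correspond to high-probability paths through this matrix, which can then be related to survival outcomes.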