Tumor Profiler Multi-Omics Data Integration

Integration of Multi-Modal Single-Cell Data

Single-cell technologies profile biological samples at the resolution of individual cells. Over the last decade, these technologies have developed explosively [1]. We can now profile cells across several biological modalities, measuring properties such as gene or protein expression, spatial information via imaging technologies, and DNA mutations. Each modality contributes a unique perspective, and combining them would give a more holistic view of the cellular state. Unfortunately, cells are consumed as they are measured; thus, each individual cell can only be measured in a single modality. The lab applies existing analytical techniques and develops new methods to recover this holistic view of the cell.

The lab is a member of the Tumor Profiler consortium [2], a multi-lab initiative that applies single-cell technologies to support clinical decision-making in cancer treatment. Here, a tumour sample from a cancer patient is extracted and divided among several labs, each of which profiles its portion with a different single-cell profiling technology. The lab has developed a method, Single cell Integration via Matching (SCIM) [3], to integrate multi-modal single-cell data by identifying pairs of corresponding cells profiled by different technologies. SCIM constructs a latent space that is invariant to the profiling technology using an adversarial training procedure. Integration is then achieved by pairing cells across technologies in this latent space using a bipartite matching scheme.
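The matching step can be illustrated with a small sketch. It is not the SCIM implementation: it assumes that cells from two technologies have already been embedded into a shared latent space (replaced here by hand-made toy coordinates), and it uses SciPy's `linear_sum_assignment` as one simple form of bipartite matching. The function name `match_cells` and all data are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def match_cells(latent_a, latent_b):
    """Pair each cell in latent_a with a distinct cell in latent_b so that
    the total Euclidean distance between paired cells is minimal."""
    cost = cdist(latent_a, latent_b, metric="euclidean")
    rows, cols = linear_sum_assignment(cost)  # min-cost bipartite matching
    return rows, cols, cost[rows, cols]


# Toy latent codes standing in for the output of a trained encoder (assumption).
latent_a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]])

# "Technology B" sees the same cells in a shuffled order, slightly shifted.
perm = np.array([2, 0, 4, 1, 3])
latent_b = latent_a[perm] + 0.05

rows, cols, dists = match_cells(latent_a, latent_b)
for i, j, d in zip(rows, cols, dists):
    print(f"cell {i} (technology A) <-> cell {j} (technology B), distance {d:.3f}")
```

On real data the two technologies rarely measure the same number of cells, and a dense cost matrix quickly becomes expensive, so a practical matching scheme would need to handle unbalanced and sparse problems.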

Predicting Perturbation Responses of Single Cells using Optimal Transport

Predicting molecular responses to perturbations is a core concern of the biological sciences, with applications in understanding cells' developmental trajectories and in modelling their responses to treatments. A critical complication of this task is that cells are destroyed when measured, so the same cell cannot be observed both before and after a perturbation. Thus, we only have access to pairs of distributions of cells before and after an applied perturbation. Recovering the responses of individual cells amounts to learning a coupling between these distributions.

The lab co-develops CellOT [4], a method that uses the theory of optimal transport (OT) to learn such a coupling. OT is a mathematical framework for manipulating probability distributions according to principles of minimum action. CellOT uses neural networks to parameterise a pair of convex potentials that arise from the dual formulation of the OT problem [makkuva]. We demonstrate that couplings learned by CellOT capture even fine details of their target distributions and significantly improve upon current state-of-the-art methods. Furthermore, since the transport plan learned by CellOT is fully parameterised, we demonstrate that it can be used to predict the responses of cells from held-out samples.
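To make the notion of a coupling concrete, the sketch below computes one between two small synthetic cell populations. It deliberately swaps in a different technique from CellOT: entropy-regularised discrete OT solved with Sinkhorn iterations in NumPy, rather than neural convex potentials. The function name `sinkhorn_coupling`, the regularisation strength `epsilon`, and the simulated "control" and "treated" populations are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn_coupling(x, y, epsilon=0.1, n_iters=500):
    """Entropy-regularised OT coupling between two point clouds that are
    each given uniform weights, computed with Sinkhorn iterations."""
    n, m = len(x), len(y)
    a = np.full(n, 1.0 / n)  # weights of the source cells
    b = np.full(m, 1.0 / m)  # weights of the target cells
    # Squared Euclidean cost, rescaled to [0, 1] for numerical stability.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()
    kernel = np.exp(-cost / epsilon)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    # Entry (i, j) is the mass transported from source cell i to target cell j.
    return u[:, None] * kernel * v[None, :]


# Synthetic stand-ins for unpaired control and perturbed populations (assumption).
rng = np.random.default_rng(0)
control = rng.normal(loc=0.0, size=(40, 3))
treated = rng.normal(loc=2.0, size=(50, 3))

coupling = sinkhorn_coupling(control, treated)

# Barycentric projection: a predicted perturbed state for every control cell,
# formed as a coupling-weighted average of the observed treated cells.
predicted = (coupling @ treated) / coupling.sum(axis=1, keepdims=True)
print(coupling.shape, predicted.shape)
```

A discrete coupling of this kind is only defined for the cells it was computed on. A parameterised transport map, as described above for CellOT, can instead be applied to cells from held-out samples.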

Digital Pathology Image Analysis & Spatial Transcriptomics

Digital pathology is a well-established technology for cancer diagnosis and disease progression monitoring. Thin tissue sections are stained with hematoxylin and eosin (H&E), highlighting cells and tissue structures, and are then digitised to allow further analysis by pathologists. Deep learning models are now employed to fully exploit the available data and to assist medical practitioners. Such models are used for cell typing, disease staging, and even treatment recommendations. As digitised H&E images are rich in information beyond simple morphological features, molecular classification using deep learning and the identification of morpho-molecular correlates become possible, as exemplified by [5]. Furthermore, leveraging the information encoded in pathology images allows for virtual protein staining of the slides using Generative Adversarial Networks [6]. This information can be used directly for immuno-phenotyping and in-depth characterisation of the tumour microenvironment, circumventing the need to stain multiple consecutive slides using immunohistochemistry.

Spatial transcriptomics, a newly emerged technology that enables RNA-seq profiling while preserving spatial context, provides information complementary to H&E images. New methods for hierarchical representation learning are being developed to integrate the resulting information across different data views, including morphological, molecular, and spatial layers. These methods allow for learning from both local and global patterns and for using information from multiple resolutions. Ultimately, the learned representations can be used for refined disease staging and characterisation of immune phenotypes.

Involved group members: Joanna Ficek, Stefan Stark, Kjong Lehmann, Sonali Andani, Gunnar Rätsch

[1] Lähnemann, David, Johannes Köster, Ewa Szczurek, Davis J. McCarthy, Stephanie C. Hicks, Mark D. Robinson, Catalina A. Vallejos et al. "Eleven grand challenges in single-cell data science." Genome biology 21, no. 1 (2020): 1-35.
[2] Irmisch, Anja, Ximena Bonilla, Stéphane Chevrier, Kjong-Van Lehmann, Franziska Singer, Nora C. Toussaint, Cinzia Esposito et al. "The Tumor Profiler Study: integrated, multi-omic, functional tumour profiling for clinical decision support." Cancer Cell 39, no. 3 (2021): 288-293.

[3] Stark, Stefan G., Joanna Ficek, Francesco Locatello, Ximena Bonilla, Stéphane Chevrier, Franziska Singer, Gunnar Rätsch, and Kjong-Van Lehmann. "SCIM: universal single-cell matching with unpaired feature sets." Bioinformatics 36, no. Supplement_2 (2020): i919-i927.
[4] Charlotte Bunne, Stefan G. Stark, Gabriele Gut, Jacobo Sarabia del Castillo, Mitchell Levesque, Kjong-Van Lehmann, Lucas Pelkmans, Andreas Krause, and Gunnar Rätsch. "Learning Single-Cell Perturbation Responses using Neural Optimal Transport." (2022).
[5] Sarah Fremond, Sonali Andani, Juriaan Barkey Wolf et al., Interpretable Deep Learning Predicts the Molecular Endometrial Cancer Classification from H&E Images: A Combined Analysis of the PORTEC Randomized Clinical Trials, manuscript under review in The Lancet Digital Health, 2022.
[6] Joanna Ficek, Sonali Andani, Simon Heinke et al., Multi-V-Stain: prediction of multiplexed protein abundance for virtual staining of H&E images, manuscript in preparation.