Multi-Omics Data Integration (TuPro)

Integration of multi-modal single cell data

Single cell profiling technologies measure biological samples at the resolution of individual cells. Over the last decade these technologies have developed at an explosive rate [1], and we can now profile cells across several biological modalities, measuring properties such as gene or protein expression, spatial information via imaging technologies, and DNA mutations. Each modality contributes a unique perspective, and combining them would give a more holistic view of cellular state. Unfortunately, cells are consumed as they are measured, so an individual cell can only be measured within a single modality. The lab applies existing analytical techniques and develops new methods to recover this holistic view of the cell.

The lab is a member of the Tumor Profiler consortium [2], a multi-lab initiative that applies single cell technologies to support clinical decision making in cancer treatment. Here, a tumor from a cancer patient is extracted and divided among several labs, each of which profiles its portion with a different single cell profiling technology. The lab has developed a method, Single cell Integration via Matching (SCIM) [3], to integrate multi-modal single cell data by identifying pairs of corresponding cells profiled by different technologies. SCIM first constructs a latent space that is invariant to the profiling technology using an adversarial training procedure. Integration is then achieved by pairing cells across technologies in this latent space with a bipartite matching scheme.
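The matching step can be illustrated with a small sketch. It assumes that cells from two technologies have already been embedded in a shared latent space (here replaced by synthetic vectors) and pairs them with a standard minimum-cost assignment from SciPy. The function name `match_cells`, the Euclidean cost and the one-to-one assignment are assumptions made for illustration and are not taken from the SCIM implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def match_cells(latent_a, latent_b):
    """Pair each cell in A with a distinct cell in B, minimising total distance."""
    # Pairwise Euclidean distances between latent codes (illustrative cost choice).
    cost = cdist(latent_a, latent_b, metric="euclidean")
    # Minimum-cost bipartite assignment.
    rows, cols = linear_sum_assignment(cost)
    return rows, cols, cost[rows, cols].sum()


# Synthetic stand-in for technology-invariant latent codes (hypothetical data).
rng = np.random.default_rng(0)
latent_a = rng.normal(size=(50, 8))
perm = rng.permutation(50)
# "Technology B" sees the same cells in shuffled order, with a little noise.
latent_b = latent_a[perm] + 0.01 * rng.normal(size=(50, 8))

rows, cols, total_cost = match_cells(latent_a, latent_b)
# Cell j of B was generated from cell perm[j] of A, so a correct pair satisfies this.
recovered = np.mean(perm[cols] == rows)
```

In real data no ground-truth pairing exists, so the quality of such a matching has to be judged indirectly, for example by checking that matched cells share a cell type.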

Predicting perturbation responses of single cells using optimal transport

Understanding how individual cells respond to perturbations is an important problem that could improve, for example, targeted therapies for cancer patients. A major challenge here is that cells are typically destroyed when measured, so we never observe an individual cell both before and after a perturbation. We do, however, have access to unpaired sets of cells, drawn from the same distribution and measured either before or after the perturbation. Responses of individual cells can be recovered by learning a coupling across these two sets of cells.
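What a coupling between two unpaired sets looks like can be shown with classical entropy-regularised optimal transport (the Sinkhorn algorithm). This is a minimal sketch on synthetic two-dimensional points, not the lab's method: the function name `sinkhorn_coupling`, the squared Euclidean cost, the uniform cell weights and the values of `epsilon` and `n_iters` are all assumptions chosen for illustration.

```python
import numpy as np


def sinkhorn_coupling(x, y, epsilon=0.25, n_iters=1000):
    """Entropy-regularised OT coupling between two unpaired point sets."""
    n, m = len(x), len(y)
    a = np.full(n, 1.0 / n)  # uniform mass on untreated cells
    b = np.full(m, 1.0 / m)  # uniform mass on treated cells
    # Squared Euclidean cost between every untreated/treated pair.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()  # rescale so epsilon is relative to the largest cost
    kernel = np.exp(-cost / epsilon)
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)  # enforce the column marginals
        u = a / (kernel @ v)    # enforce the row marginals
    return u[:, None] * kernel * v[None, :]


# Synthetic populations: the "treated" cells are shifted along the first axis.
rng = np.random.default_rng(0)
untreated = rng.normal(size=(40, 2))
treated = rng.normal(size=(60, 2)) + np.array([2.0, 0.0])

coupling = sinkhorn_coupling(untreated, treated)
# Barycentric projection: a predicted treated state for each untreated cell.
predicted = (coupling @ treated) / coupling.sum(axis=1, keepdims=True)
```

Each row of `coupling` spreads the mass of one untreated cell over the treated cells, and the barycentric projection turns that row into a single predicted response.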

The lab develops CellOT [1], a method to model individual cellular responses to perturbations by learning a coupling between the treated and untreated distributions using the theory of optimal transport. Optimal transport theory provides a powerful mathematical framework that describes transformations between two distributions according to a minimal cost of mass transport. Utilizing recent advancements in neural optimal transport theory [2], CellOT uses feed-forward neural networks [3] to parameterize the pair of convex potentials that appear in the dual formulation of the transport problem.
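The architectural idea behind input convex neural networks [3] can be sketched in a few lines. The network below is convex in its input because the weights acting on hidden activations are non-negative and the activation is convex and non-decreasing. For a quadratic transport cost, the optimal map is the gradient of a convex potential, so the gradient of such a network is a candidate transport map. This is an untrained toy with random weights, written in NumPy; the class name `TinyICNN`, the layer sizes and the softplus activation are assumptions for illustration, and the min-max training of the two potentials described in [2] is not shown.

```python
import numpy as np


def softplus(t):
    # Smooth, convex and non-decreasing activation.
    return np.logaddexp(0.0, t)


def sigmoid(t):
    # Derivative of softplus, written in a numerically stable form.
    return 0.5 * (1.0 + np.tanh(0.5 * t))


class TinyICNN:
    """Two-hidden-layer input convex network with random (untrained) weights."""

    def __init__(self, dim, hidden, rng):
        # Weights acting directly on the input may have any sign.
        self.A0 = rng.normal(size=(hidden, dim))
        self.b0 = rng.normal(size=hidden)
        self.A1 = rng.normal(size=(hidden, dim))
        self.b1 = rng.normal(size=hidden)
        self.a2 = rng.normal(size=dim)
        # Weights acting on hidden activations are kept non-negative.
        self.W1 = np.abs(rng.normal(size=(hidden, hidden)))
        self.w2 = np.abs(rng.normal(size=hidden))

    def _pre_activations(self, x):
        p1 = self.A0 @ x + self.b0
        p2 = self.W1 @ softplus(p1) + self.A1 @ x + self.b1
        return p1, p2

    def potential(self, x):
        _, p2 = self._pre_activations(x)
        return self.w2 @ softplus(p2) + self.a2 @ x

    def gradient(self, x):
        # Analytic gradient of the potential with respect to the input (chain rule).
        p1, p2 = self._pre_activations(x)
        g2 = self.w2 * sigmoid(p2)
        g1 = (g2 @ self.W1) * sigmoid(p1)
        return self.a2 + g2 @ self.A1 + g1 @ self.A0


rng = np.random.default_rng(0)
net = TinyICNN(dim=4, hidden=16, rng=rng)
x = rng.normal(size=4)
y = rng.normal(size=4)
# Convexity implies the potential at the midpoint is at most the average value.
midpoint_gap = 0.5 * (net.potential(x) + net.potential(y)) - net.potential(0.5 * (x + y))
# The gradient maps an input point to its image under the candidate transport map.
pushed = net.gradient(x)
```

In a trained model, `x` would be the feature vector of an untreated cell and `pushed` its predicted state after treatment.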

We demonstrate that CellOT makes significant improvements over the current state-of-the-art approaches. Furthermore, the model can make predictions for incoming, untreated cells, which can, for instance, be used to predict how cancer patients would respond to different therapies and thereby help optimize their treatment.

Digital Pathology Image analysis & Spatial Transcriptomics

Digital Pathology is a well-established technology used for cancer diagnosis and for monitoring disease progression. Thin tissue sections are stained with hematoxylin and eosin (H&E), which highlights cells and tissue structures, and are then digitised to allow further analysis by pathologists. Because digitised H&E images are rich in information beyond simple morphological features, they enable molecular classification using deep learning as well as the identification of morpho-molecular correlates, as exemplified by [4]. Furthermore, the information encoded in pathology images allows for virtual protein staining of the slides using conditional Generative Adversarial Networks [5]. The model learns the abundance of multiple proteins simultaneously, providing a multiplex of images that can be used directly for immuno-phenotyping and in-depth characterization of the tumor microenvironment, circumventing the need to stain multiple consecutive slides using immunohistochemistry.

Spatial Transcriptomics, a newly emerged technology that enables RNA-seq profiling while preserving spatial context, provides information complementary to H&E images. The different data views, including morphological, molecular and spatial layers, can be integrated into a joint representation of the patient. To this end, we utilise a set of regularised autoencoders and learn representations in a hierarchical manner, from spot through tissue region to the patient level. Ultimately, the learned representations are clustered to define groups of patients with potentially different clinical characteristics.
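The spot, region and patient hierarchy can be pictured with a simplified sketch in which spot-level codes are averaged within each tissue region and region-level codes are averaged within each patient. Mean pooling is only a stand-in chosen for illustration; it is not the regularised autoencoders described above, and the helper `pool_by_group`, the array names and the synthetic data are all hypothetical.

```python
import numpy as np


def pool_by_group(embeddings, group_ids):
    """Average the embeddings sharing a group id; returns (sorted ids, group means)."""
    ids, inverse = np.unique(group_ids, return_inverse=True)
    sums = np.zeros((len(ids), embeddings.shape[1]))
    np.add.at(sums, inverse, embeddings)  # accumulate rows per group
    counts = np.bincount(inverse, minlength=len(ids))
    return ids, sums / counts[:, None]


# Synthetic spot-level codes: 12 spots, 3 latent dimensions (hypothetical data).
rng = np.random.default_rng(0)
spot_codes = rng.normal(size=(12, 3))
spot_region = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])  # region of each spot
region_patient = np.array([0, 0, 1, 1])  # patient of each region

# Spot level -> region level.
regions, region_codes = pool_by_group(spot_codes, spot_region)
# Region level -> patient level.
patients, patient_codes = pool_by_group(region_codes, region_patient[regions])
```

The resulting `patient_codes` array has one row per patient and could be passed to any clustering routine to form patient groups.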

Involved group members: Joanna Ficek, Stefan Stark, Kjong Lehmann (alumnus), Sonali Andani, Gunnar Rätsch

[1] Bunne, Charlotte, Stefan G. Stark, Gabriele Gut, Jacobo Sarabia del Castillo, Mitchell Levesque, Kjong Van Lehmann, Lucas Pelkmans, Andreas Krause, and Gunnar Rätsch. 2022. “Learning Single-Cell Perturbation Responses using Neural Optimal Transport.”
[2] Makkuva, Ashok, Amirhossein Taghvaei, Sewoong Oh, and Jason Lee. 2020. “Optimal Transport Mapping via Input Convex Neural Networks.” In Proceedings of the 37th International Conference on Machine Learning, edited by Hal Daumé III and Aarti Singh, 119:6672–81. Proceedings of Machine Learning Research. PMLR.
[3] Amos, Brandon, Lei Xu, and J. Zico Kolter. 2017. “Input Convex Neural Networks.” In Proceedings of the 34th International Conference on Machine Learning, edited by Doina Precup and Yee Whye Teh, 70:146–55. Proceedings of Machine Learning Research. PMLR.
[4] Sarah Fremond, Sonali Andani, Juriaan Barkey Wolf et al., Interpretable Deep Learning Predicts the Molecular Endometrial Cancer Classification from H&E Images: A Combined Analysis of the Portec Randomized Clinical Trials. Under revision.
[5] Joanna Ficek, Sonali Andani, Simon Heinke et al., Multi-V-Stain: prediction of multiplexed protein abundance for virtual staining of H&E images. Manuscript in preparation.