"Let the dataset change your mindset" Hans Rosling
- omineeva@student.ethz.ch
Department of Computer Science
Biomedical Informatics Group
CAB F 53.1
I am interested in developing machine learning methods for real-world problems, in particular those that arise in healthcare and genomics.
Before joining the Biomedical Informatics Group, I studied Plasma Physics at the National Research Nuclear University “MEPhI” and Data Science at the Skolkovo Institute of Science and Technology in Moscow. My Master’s thesis project was devoted to deep learning for anomaly detection at the CMS detector at the LHC at CERN.
In November 2018 I started my PhD at the Max Planck ETH Center for Learning Systems, supervised by Gunnar Rätsch and Isabel Valera.
Abstract
Motivation: Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies.
Results: We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state of the art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications.
Conclusions: DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is re-training the model with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects.
Authors Olga Mineeva, Mateo Rojas-Carulla, Ruth E Ley, Bernhard Schölkopf, Nicholas D Youngblut
Submitted Bioinformatics (Oxford, England)