Amir Joudaki

“The mind is its own place, and in itself can make a heaven of hell, a hell of heaven.” ― John Milton, Paradise Lost

PhD Student

E-Mail
amir.joudaki@get-your-addresses-elsewhere.inf.ethz.ch
Phone
+41 44 632 65 24
Address
ETH Zürich
Department of Computer Science
Biomedical Informatics Group
Universitätsstrasse 6
CAB F52.1
8092 Zürich
Room
CAB F39

I am currently a direct PhD student in the Biomedical Informatics (BMI) group led by Prof. Gunnar Rätsch, working on theoretical machine learning and algorithm design for biomedicine.

My lifelong passion is to work on the biggest challenges facing us by leveraging recent scientific and technical advances in computer science. I am lucky to be a member of the BMI group, where I can work on a wide range of problems that arise in machine learning for biomedical applications. In particular, I am highly interested in the theoretical understanding of deep neural networks, which could lead to more robust models for the pharmaceutical and medical industries. I also work on designing more scalable methods for genomics and medicine, using techniques from high-dimensional statistics and randomized algorithms; such scalability will be essential for their adoption beyond academia.

Before joining BMI, I completed a BSc in computer engineering at Sharif University, Iran, an MPhil in cognitive neuroscience at SISSA, Italy, and an MSc in computer science at ETH Zurich as part of the direct PhD program. During these studies I took theoretical and hands-on courses that are invaluable to my research.

Abstract High-throughput DNA sequencing data are accumulating in public repositories, and efficient approaches for storing and indexing such data are in high demand. In recent research, several graph data structures have been proposed to represent large sets of sequencing data and to allow for efficient querying of sequences. In particular, the concept of labeled de Bruijn graphs has been explored by several groups. Although there has been good progress toward representing the sequence graph in small space, methods for storing a set of labels on top of such graphs are still not sufficiently explored. It is also currently not clear how characteristics of the input data, such as the sparsity and correlations of labels, can help to inform the choice of method to compress the graph labeling. In this study, we present a new compression approach, Multi-binary relation wavelet tree (BRWT), which is adaptive to different kinds of input data. We show an up to 29% improvement in compression performance over the basic BRWT method, and up to a 68% improvement over the current state-of-the-art for de Bruijn graph label compression. To put our results into perspective, we present a systematic analysis of five different state-of-the-art annotation compression schemes, evaluate key metrics on both artificial and real-world data, and discuss how different data characteristics influence the compression performance. We show that the improvements of our new method can be robustly reproduced for different representative real-world data sets.

Authors Mikhail Karasikov, Harun Mustafa, Amir Joudaki, Sara Javadzadeh-No, Gunnar Rätsch, and André Kahles

Submitted Journal of Computational Biology

Link DOI

Abstract High-throughput DNA sequencing data is accumulating in public repositories, and efficient approaches for storing and indexing such data are in high demand. In recent research, several graph data structures have been proposed to represent large sets of sequencing data and to allow for efficient querying of sequences. In particular, the concept of labeled de Bruijn graphs has been explored by several groups. While there has been good progress towards representing the sequence graph in small space, methods for storing a set of labels on top of such graphs are still not sufficiently explored. It is also currently not clear how characteristics of the input data, such as the sparsity and correlations of labels, can help to inform the choice of method to compress the graph labeling. In this work, we present a new compression approach, Multi-BRWT, which is adaptive to different kinds of input data. We show an up to 29% improvement in compression performance over the basic BRWT method, and up to a 68% improvement over the current state-of-the-art for de Bruijn graph label compression. To put our results into perspective, we present a systematic analysis of five different state-of-the-art annotation compression schemes, evaluate key metrics on both artificial and real-world data and discuss how different data characteristics influence the compression performance. We show that the improvements of our new method can be robustly reproduced for different representative real-world datasets.

Authors Mikhail Karasikov, Harun Mustafa, Amir Joudaki, Sara Javadzadeh-No, Gunnar Rätsch, and André Kahles

Submitted RECOMB 2019

Link DOI