261-5112-00L Advanced Approaches for Population Scale Compressive Genomics (Autumn 2018)
Semester | Autumn Semester 2018 |
Lecturers | A. Kahles |
Periodicity | yearly course |
Language of instruction | English |
Abstract
Research in Biology and Medicine have been transformed into disciplines of applied data science over the past years. Not only size and inherent complexity of the data but also requirements on data privacy and complexity of search and access pose a wealth of new research questions.
Objective
This interactive course will explore the latest research on algorithms and data structures for population scale genomics applications and give insights into both the technical basis as well as the domain questions motivating it.
Content
Over the duration of the semester, the course will cover three main topics. Each of the topics will consist of 70-80% lecture content and 20-30% seminar content.
1) Algorithms and data structures for text and graph compression. Motivated through applications in compressive genomics, the course will cover succinct indexing schemes for strings, trees and general graphs, compression schemes for binary matrices as well as the efficient representation of haplotypes and genomic variants.
2) Stochastic data structures and algorithms for approximate representation of strings and graphs as well as sets in general. This includes winnowing schemes and minimizers, sketching techniques, (minimal perfect) hashing and approximate membership query data structures.
3) Data structures supporting encryption and data privacy. As an extension to data structures discussed in the earlier topics, this will include secure indexing using homomorphic encryption as well as design for secure storage and distribution of data.
Prerequisites / Notice
Data Structures & Algorithms
Student seminar
For the student seminar on December 12th, the list of suggested papers can be found here.
Location
The lecture will be held at ETH in CHN D 48 (link to location). Wednesdays 13-15.
Course Overview
Date | Topic | Course Material |
---|---|---|
26.09.2018 | Lecture: Introduction to the topic and Formalities | Lecture Slides 01 |
03.10.2018 | Lecture: String indexing (suffix arrays, BWT, FM-index) | Lecture Slides 02 |
10.10.2018 | Lecture: Space-efficient index construction | Lecture Slides 03 |
17.10.2018 | Lecture: Rank/select structures, BWT construction, Bi-directional BWT | Lecture Slides 04 |
24.10.2018 | Lecture: Variations of BWT (Indexing of trees and graphs, PBWT, gPBWT, generalized compressed suffix arrays) | Lecture Slides 05 |
31.10.2018 | Lecture: Genome compression | Lecture Slides 06 |
07.11.2018 | Lecture: Alignment-based genome comparison (MUMs, MEMs) | Lecture Slides 07 |
14.11.2018 | Lecture: Alignment-free genome comparison (kernels, winnowing, hashing) | Lecture Slides 08 |
21.11.2018 | Lecture: Approximate genome representation using filters | Lecture Slides 09 |
28.11.2018 | Lecture: Sketching techniques, Cardinality estimation | Lecture Slides 10 |
05.12.2018 | Lecture: Homomorphic encryption for bioinformatics | Lecture Slides 11 |
12.12.2018 | Lecture: Student seminars; Secure communication for personalized genomics | No slides |
19.12.2018 | Lecture: Recap / Exam prep | Lecture Slides 13 |