261-5112-00L Algorithms and Data Structures for Population Scale Genomics (Autumn 2020)

Semester Autumn Semester 2020
Lecturers A. Kahles
Periodicity yearly course
Language of instruction English

Abstract
Research in Biology and Medicine have been transformed into disciplines of applied data science over the past years. Not only size and inherent complexity of the data but also requirements on data privacy and complexity of search and access pose a wealth of new research questions.

Objective
This interactive course will explore the latest research on algorithms and data structures for population scale genomics applications and give insights into both the technical basis as well as the domain questions motivating it.

Content
Over the duration of the semester, the course will cover three main topics. Each of the topics will consist of 70-80% lecture content and 20-30% seminar content.

1) Algorithms and data structures for text and graph compression. Motivated through applications in compressive genomics, the course will cover succinct indexing schemes for strings, trees and general graphs, compression schemes for binary matrices as well as the efficient representation of haplotypes and genomic variants.

2) Stochastic data structures and algorithms for approximate representation of strings and graphs as well as sets in general. This includes winnowing schemes and minimizers, sketching techniques, (minimal perfect) hashing and approximate membership query data structures.

3) Data structures supporting encryption and data privacy. As an extension to data structures discussed in the earlier topics, this will include secure indexing using homomorphic encryption as well as design for secure storage and distribution of data.

Prerequisites / Notice
Data Structures & Algorithms

Location

The lecture is scheduled to be held at ETH in HG D 3.2 (link to location). Wednesdays 14-16. This room will allow for the appropriate distancing measures in compliance with the Corona pandemic measures. The lecture will be recorded and made available online in the ETH videoportal.

Course Overview

Date Topic Course Material
16.09.2020 Lecture: Introduction to the topic and Formalities Lecture Slides 01
23.09.2020 Lecture: String indexing (SA, BWT, FM-index) and sorting Lecture Slides 02
Examples
30.09.2020 Lecture: Space-efficient index construction Lecture Slides 03
Examples
07.10.2020 Lecture: Rank/select structures, BWT construction, Bi-directional BWT Lecture Slides 04
Examples
14.10.2020 Lecture: Variations of BWT (Indexing of trees and graphs) Lecture Slides 05
Examples
21.10.2020 Lecture: Variations of BWT (PBWT, gPBWT, generalized compressed suffix arrays) prior week continued
28.10.2020 Lecture: Alignment-based genome comparison Lecture Slides 06
Examples
04.11.2020 Lecture: Alignment-free genome comparison (kernels, winnowing, hashing) Lecture Slides 07
Examples
Video 1
Video 2
11.11.2020 Lecture: Genome/Text compression Lecture Slides 08
Video
18.11.2020 Lecture: Approximate genome representation using filters Lecture Slides 09
Examples
Video
25.11.2020 Lecture: Sketching techniques, Cardinality estimation Lecture Slides 10
Examples
Video
02.12.2020 Lecture: Encryption and privacy preserving computation Lecture Slides 11
Examples
Video
09.12.2020 Lecture: Recap / Exam prep Lecture Slides 12
Video
16.12.2020 Lecture: Research talk Video