Ragnar Groot Koerkamp, MSc

Benchmarking is harder than you think, even taking into account this rule.

PhD Student

E-Mail
ragnar.grootkoerkamp@get-your-addresses-elsewhere.inf.ethz.ch
Address
ETH Zürich
Department of Computer Science
Biomedical Informatics Group
Universitätsstrasse 6
8092 Zürich
Room
CAB F39

I work on exact pairwise alignment algorithms. In particular I am working on an algorithm using A*. See my blog research.curiouscoding.nl for thoughts and ideas regarding this and other algorithms.

After completing my masters in mathematics and computer science in Oxford I worked at Google for 2.5 years. I went back to academia to get back to research and have the freedom to solve problems I think are interesting. My focus is on mathematics and algorithms.

Abstract MOTIVATION Sequence alignment has been a core problem in computational biology for the last half-century. It is an open problem whether exact pairwise alignment is possible in linear time for related sequences (Medvedev, 2022b). METHODS We solve exact global pairwise alignment with respect to edit distance by using the A* shortest path algorithm on the edit graph. In order to efficiently align long sequences with high error rate, we extend the seed heuristic for A* (Ivanov et al., 2022) with match chaining, inexact matches, and the novel match pruning optimization. We prove the correctness of our algorithm and provide an efficient implementation in A*PA. RESULTS We evaluate A*PA on synthetic data (random sequences of length n with uniform mutations with error rate e) and on real long ONT reads of human data. On the synthetic data with e=5% and n≤10^7 bp, A*PA exhibits a near-linear empirical runtime scaling of n1.08 and achieves >250× speedup compared to the leading exact aligners EDLIB and BIWFA. Even for a high error rate of e=15%, the empirical scaling is n1.28 for n≤10^7 bp. On two real datasets, A*PA is the fastest aligner for 58% of the alignments when the reads contain only sequencing errors, and for 17% of the alignments when the reads also include biological variation. Availability github.com/RagnarGrootKoerkamp/astar-pairwise-aligner

Authors Ragnar Groot Koerkamp, Pesho Ivanov

Submitted bioRxiv

Link DOI