Sequence Comparison

Assessing the exact or approximate distance between two biological sequences (DNA, RNA, amino acid, etc.), or parts thereof, is a ubiquitous task in sequence bioinformatics. Depending on the application domain, exact local or global alignments are required, or an approximate estimate of the distance suffices. We are interested in both ends of this spectrum. On one end, we perform research on exact sequence-to-sequence and sequence-to-graph alignment [1-3]; on the other, we are interested in developing efficient alignment-free methods [4-6] for estimating the edit distance between two sequences. These methods are fundamental to many other bioinformatics workflows, such as homology finding, sequence clustering, phylogenetic reconstruction, taxonomic classification, and functional assessment. Performing sequence comparisons with minimal resources will therefore benefit a wide range of downstream applications.
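To make the two ends of this spectrum concrete, here is a minimal Python sketch. It contrasts an exact edit distance, computed with textbook dynamic programming, with a simple alignment-free proxy based on shared k-mers. Both functions, their names, and the choice of k = 3 are illustrative assumptions; the k-mer Jaccard distance is a generic stand-in and is not the method of [4-6], nor is the dynamic program the A*-based approach of [1-3].

```python
def edit_distance(a: str, b: str) -> int:
    """Exact global edit (Levenshtein) distance; O(len(a) * len(b)) time."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def kmer_jaccard_distance(a: str, b: str, k: int = 3) -> float:
    """Alignment-free proxy: 1 - Jaccard similarity of the two k-mer sets."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = kmers_a | kmers_b
    if not union:  # both sequences are shorter than k
        return 0.0
    return 1.0 - len(kmers_a & kmers_b) / len(union)


# Hypothetical example sequences differing by a single substitution.
x = "ACGTACGTGA"
y = "ACGTTCGTGA"
print("exact edit distance:", edit_distance(x, y))
print("k-mer Jaccard distance:", kmer_jaccard_distance(x, y))
```

The exact computation takes time proportional to the product of the sequence lengths, whereas the k-mer comparison needs only one pass over each sequence. The price is that it returns a rough proxy rather than the true edit distance.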

Involved group members: Amir Joudaki, Harun Mustafa, Mikhail Karasikov, Ragnar Groot Koerkamp, André Kahles, Gunnar Rätsch

References

[1] Ivanov, Pesho, Benjamin Bichsel, Harun Mustafa, André Kahles, Gunnar Rätsch, and Martin Vechev. "AStarix: Fast and optimal sequence-to-graph alignment." In International Conference on Research in Computational Molecular Biology, pp. 104-119. Springer, Cham, 2020.
[2] https://research.curiouscoding.nl/posts/pairwise-alignment
[3] Groot Koerkamp, Ragnar, and Pesho Ivanov. "Exact global alignment using A* with seed heuristic and match pruning." bioRxiv (2022).
[4] Joudaki, Amir, Gunnar Rätsch, and André Kahles. "Fast alignment-free similarity estimation by tensor sketching." bioRxiv (2021): 2020-11.
[5] Karasikov, Mikhail, Harun Mustafa, Daniel Danciu, Marc Zimmermann, Christopher Barber, Gunnar Rätsch, and André Kahles. "MetaGraph: Indexing and analysing nucleotide archives at petabase-scale." bioRxiv (2020).
[6] Karasikov, Mikhail, Harun Mustafa, Gunnar Rätsch, and André Kahles. "Lossless indexing with counting de Bruijn graphs." In International Conference on Research in Computational Molecular Biology, pp. 374-376. Springer, Cham, 2022.