In this lecture, we survey indexing data structures for sequence data, and more specifically full-text indexes, from both theoretical and practical perspectives. We start with the classical suffix tree, which remains a very popular tool in theoretical studies as well as in some practical applications. We also present two related structures: the Directed Acyclic Word Graph (DAWG) and the position heap. We then present the suffix array, which is more space-efficient in practice, and elaborate on its relation to suffix trees. Finally, we present a yet more compact data structure, the so-called FM-index, based on the combinatorial Burrows-Wheeler transform. The FM-index is becoming a ubiquitous tool in bioinformatics applications, which we illustrate with several examples.
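To make the relation between the suffix array, the Burrows-Wheeler transform, and FM-index-style search concrete, here is a minimal Python sketch. It is an illustration only, not the construction used in the lecture: the function names are hypothetical, the suffix array is built by naive sorting, and the rank queries scan the string instead of using the compact rank structures a real FM-index relies on. We assume the text ends with a sentinel `$` that is smaller than every other character.

```python
def suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix itself.
    # Assumes text ends with a unique sentinel '$' smaller than all other characters.
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text, sa):
    # The i-th BWT character is the one preceding the i-th smallest suffix
    # (text[-1] wraps around to the sentinel for the suffix starting at 0).
    return "".join(text[i - 1] for i in sa)


def count_occurrences(bwt, pattern):
    # Backward search: maintain the half-open interval [lo, hi) of suffix array
    # rows whose suffixes start with the part of the pattern processed so far.
    first_column = sorted(bwt)
    first_row = {}  # first_row[c] = number of characters in the text smaller than c
    for row, ch in enumerate(first_column):
        first_row.setdefault(ch, row)

    def rank(ch, k):
        # Occurrences of ch in bwt[:k]; a linear scan here, for simplicity only.
        return bwt[:k].count(ch)

    lo, hi = 0, len(bwt)
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + rank(ch, lo)
        hi = first_row[ch] + rank(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


text = "banana$"
sa = suffix_array(text)
bwt = bwt_from_suffix_array(text, sa)
print(sa)                             # [6, 5, 3, 1, 0, 4, 2]
print(bwt)                            # annb$aa
print(count_occurrences(bwt, "ana"))  # 2
```

Note that the search touches only the transformed string and a small table of character counts, never the original text, which hints at why the FM-index can be so compact.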
CSEDays is one of the most promising IT events in the Urals, and this year its scale has already become nationwide.