New developments in alignment-free genome and metagenome comparison

Mathematical Biology

21 September 14:00 - 14:45

Fengzhu Sun - University of Southern California

Next-generation sequencing (NGS) technologies have generated an enormous amount of shotgun read data, and assembling the reads is challenging, especially for metagenomes and for organisms without reference sequences. We develop novel alignment-free and assembly-free statistics for genome and metagenome comparison. The key idea is to remove the background word counts from the observed counts when comparing genomes and metagenomes. Markov chains (MCs) are usually used to model background molecular sequences, and we develop a new statistical method to estimate the order of MCs based on short read data. The alignment-free sequence comparison statistics are used to study the relationships among species, to assign viruses to their hosts, to classify metagenomes and metatranscriptomes, and to find the source of white oak trees. In all applications, our novel methods yield results that are consistent with biological knowledge. Thus, our statistics provide powerful alternative approaches for genome and metagenome comparison based on NGS short reads.
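
As a rough illustration of the key idea in the abstract, the sketch below subtracts expected word (k-mer) counts from observed counts before comparing two sequences. It is not the speaker's statistic: it assumes the simplest background, an order-0 Markov chain (independent bases), and uses a plain cosine similarity of the centered counts. The function names `kmer_counts`, `centered_counts` and `centered_similarity` are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"  # assumption: sequences are uppercase and contain only A, C, G, T


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def centered_counts(seq, k):
    """Observed k-mer counts minus the counts expected under the background.

    Assumption: the background is an order-0 Markov chain, so the expected
    count of a word is (number of word positions) * product of base frequencies.
    """
    n_words = len(seq) - k + 1
    freq = {base: seq.count(base) / len(seq) for base in ALPHABET}
    observed = kmer_counts(seq, k)
    centered = {}
    for word in map("".join, product(ALPHABET, repeat=k)):
        prob = 1.0
        for base in word:
            prob *= freq[base]
        centered[word] = observed.get(word, 0) - n_words * prob
    return centered


def centered_similarity(seq_a, seq_b, k=2):
    """Cosine similarity between the background-centered count vectors."""
    a = centered_counts(seq_a, k)
    b = centered_counts(seq_b, k)
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    x = "ACGTACGTACGTTTGACA"
    y = "ACGTACGAACGTTAGACA"
    print(centered_similarity(x, y, k=2))
```

A higher-order Markov background would replace the product of base frequencies with transition probabilities estimated from the data; with short reads, the counts would be accumulated over all reads rather than over one long sequence.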
Mats Gyllenberg
University of Helsinki
Torbjörn Lundh
Chalmers/University of Gothenburg
Philip Maini
University of Oxford
Roeland Merks
Universiteit Leiden
Mathisca de Gunst
Vrije Universiteit Amsterdam

