Large multiple sequence alignments with a root-to-leaf regressive method
Garriga E, Di Tommaso P, Magis C, Erb I, Mansouri L, Baltzis A, Laayouni H, Kondrashov F, Floden E, Notredame C. 2019. Large multiple sequence alignments with a root-to-leaf regressive method. Nature Biotechnology. 37(12), 1466–1470.
Download (ext.)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6894943/
[Submitted Version]
Journal Article | Published | English
Scopus indexed
Author
Garriga, Edgar;
Di Tommaso, Paolo;
Magis, Cedrik;
Erb, Ionas;
Mansouri, Leila;
Baltzis, Athanasios;
Laayouni, Hafid;
Kondrashov, Fyodor (ISTA);
Floden, Evan;
Notredame, Cedric
Abstract
Multiple sequence alignments (MSAs) are used for structural and evolutionary predictions [1,2], but the complexity of aligning large datasets requires the use of approximate solutions [3], including the progressive algorithm [4]. Progressive MSA methods start by aligning the most similar sequences and subsequently incorporate the remaining sequences, from leaf to root, based on a guide tree. Their accuracy declines substantially as the number of sequences is scaled up [5]. We introduce a regressive algorithm that enables MSA of up to 1.4 million sequences on a standard workstation and substantially improves accuracy on datasets larger than 10,000 sequences. Our regressive algorithm works in the opposite direction to the progressive algorithm: it begins by aligning the most dissimilar sequences. It uses an efficient divide-and-conquer strategy to run third-party alignment methods in linear time, regardless of their original complexity. Our approach will enable analyses of extremely large genomic datasets such as the recently announced Earth BioGenome Project, which comprises 1.5 million eukaryotic genomes [6].
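The difference in direction described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy guide tree, the function names, and the choice of one representative per child (here simply the first leaf) are all assumptions made for illustration, and no actual alignment or merging is performed. It only contrasts the order in which a leaf-to-root and a root-to-leaf scheme would visit a guide tree.

```python
from collections import deque

# Guide tree as nested 2-tuples; leaves are sequence names (strings).
# This toy tree is hypothetical, not data from the paper.
TREE = ((("A", "B"), "C"), ("D", ("E", "F")))


def leaves(node):
    """Return the leaf names under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def progressive_order(node):
    """Leaf-to-root: children are merged before their parent (post-order)."""
    if isinstance(node, str):
        return []
    order = []
    for child in node:
        order.extend(progressive_order(child))
    order.append(leaves(node))
    return order


def regressive_tasks(root):
    """Root-to-leaf: visit internal nodes breadth-first from the root.

    Each task lists one representative per child (assumption: the first
    leaf of each child), so every task has a fixed, small size.
    """
    tasks = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if isinstance(node, str):
            continue  # leaves generate no task
        tasks.append([leaves(child)[0] for child in node])
        queue.extend(node)
    return tasks


if __name__ == "__main__":
    print(progressive_order(TREE))
    print(regressive_tasks(TREE))
```

In this sketch the leaf-to-root order starts with the closest pair and ends with one step covering all six sequences, whereas the root-to-leaf order starts with representatives of the two most distant clades and only ever handles two sequences at a time. The number of tasks equals the number of internal nodes, so it grows linearly with the number of sequences.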
Publishing Year
2019
Date Published
2019-12-01
Journal Title
Nature Biotechnology
Publisher
Springer Nature
Volume
37
Issue
12
Page
1466-1470
Cite this
Garriga E, Di Tommaso P, Magis C, et al. Large multiple sequence alignments with a root-to-leaf regressive method. Nature Biotechnology. 2019;37(12):1466-1470. doi:10.1038/s41587-019-0333-6
Garriga, E., Di Tommaso, P., Magis, C., Erb, I., Mansouri, L., Baltzis, A., … Notredame, C. (2019). Large multiple sequence alignments with a root-to-leaf regressive method. Nature Biotechnology. Springer Nature. https://doi.org/10.1038/s41587-019-0333-6
Garriga, Edgar, Paolo Di Tommaso, Cedrik Magis, Ionas Erb, Leila Mansouri, Athanasios Baltzis, Hafid Laayouni, Fyodor Kondrashov, Evan Floden, and Cedric Notredame. “Large Multiple Sequence Alignments with a Root-to-Leaf Regressive Method.” Nature Biotechnology. Springer Nature, 2019. https://doi.org/10.1038/s41587-019-0333-6.
E. Garriga et al., “Large multiple sequence alignments with a root-to-leaf regressive method,” Nature Biotechnology, vol. 37, no. 12. Springer Nature, pp. 1466–1470, 2019.
Garriga E, Di Tommaso P, Magis C, Erb I, Mansouri L, Baltzis A, Laayouni H, Kondrashov F, Floden E, Notredame C. 2019. Large multiple sequence alignments with a root-to-leaf regressive method. Nature Biotechnology. 37(12), 1466–1470.
Garriga, Edgar, et al. “Large Multiple Sequence Alignments with a Root-to-Leaf Regressive Method.” Nature Biotechnology, vol. 37, no. 12, Springer Nature, 2019, pp. 1466–70, doi:10.1038/s41587-019-0333-6.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Access Level
Open Access
PMID: 31792410