---
_id: '7181'
abstract:
- lang: eng
text: Multiple sequence alignments (MSAs) are used for structural and evolutionary
predictions, but the complexity of aligning large datasets requires the use
of approximate solutions, including the progressive algorithm. Progressive MSA
methods start by aligning the most similar sequences and subsequently incorporate
the remaining sequences, from leaf-to-root, based on a guide-tree. Their accuracy
declines substantially as the number of sequences is scaled up. We introduce
a regressive algorithm that enables MSA of up to 1.4 million sequences on a standard
workstation and substantially improves accuracy on datasets larger than 10,000
sequences. Our regressive algorithm works the other way around to the progressive
algorithm and begins by aligning the most dissimilar sequences. It uses an efficient
divide-and-conquer strategy to run third-party alignment methods in linear time,
regardless of their original complexity. Our approach will enable analyses of
extremely large genomic datasets such as the recently announced Earth BioGenome
Project, which comprises 1.5 million eukaryotic genomes.
article_processing_charge: No
article_type: original
author:
- first_name: Edgar
full_name: Garriga, Edgar
last_name: Garriga
- first_name: Paolo
full_name: Di Tommaso, Paolo
last_name: Di Tommaso
- first_name: Cedrik
full_name: Magis, Cedrik
last_name: Magis
- first_name: Ionas
full_name: Erb, Ionas
last_name: Erb
- first_name: Leila
full_name: Mansouri, Leila
last_name: Mansouri
- first_name: Athanasios
full_name: Baltzis, Athanasios
last_name: Baltzis
- first_name: Hafid
full_name: Laayouni, Hafid
last_name: Laayouni
- first_name: Fyodor
full_name: Kondrashov, Fyodor
id: 44FDEF62-F248-11E8-B48F-1D18A9856A87
last_name: Kondrashov
orcid: 0000-0001-8243-4694
- first_name: Evan
full_name: Floden, Evan
last_name: Floden
- first_name: Cedric
full_name: Notredame, Cedric
last_name: Notredame
citation:
ama: Garriga E, Di Tommaso P, Magis C, et al. Large multiple sequence alignments
with a root-to-leaf regressive method. Nature Biotechnology. 2019;37(12):1466-1470.
doi:10.1038/s41587-019-0333-6
apa: Garriga, E., Di Tommaso, P., Magis, C., Erb, I., Mansouri, L., Baltzis, A.,
… Notredame, C. (2019). Large multiple sequence alignments with a root-to-leaf
regressive method. Nature Biotechnology. Springer Nature. https://doi.org/10.1038/s41587-019-0333-6
chicago: Garriga, Edgar, Paolo Di Tommaso, Cedrik Magis, Ionas Erb, Leila Mansouri,
Athanasios Baltzis, Hafid Laayouni, Fyodor Kondrashov, Evan Floden, and Cedric
Notredame. “Large Multiple Sequence Alignments with a Root-to-Leaf Regressive
Method.” Nature Biotechnology. Springer Nature, 2019. https://doi.org/10.1038/s41587-019-0333-6.
ieee: E. Garriga et al., “Large multiple sequence alignments with a root-to-leaf
regressive method,” Nature Biotechnology, vol. 37, no. 12. Springer Nature,
pp. 1466–1470, 2019.
ista: Garriga E, Di Tommaso P, Magis C, Erb I, Mansouri L, Baltzis A, Laayouni H,
Kondrashov F, Floden E, Notredame C. 2019. Large multiple sequence alignments
with a root-to-leaf regressive method. Nature Biotechnology. 37(12), 1466–1470.
mla: Garriga, Edgar, et al. “Large Multiple Sequence Alignments with a Root-to-Leaf
Regressive Method.” Nature Biotechnology, vol. 37, no. 12, Springer Nature,
2019, pp. 1466–70, doi:10.1038/s41587-019-0333-6.
short: E. Garriga, P. Di Tommaso, C. Magis, I. Erb, L. Mansouri, A. Baltzis, H.
Laayouni, F. Kondrashov, E. Floden, C. Notredame, Nature Biotechnology 37 (2019)
1466–1470.
date_created: 2019-12-15T23:00:43Z
date_published: 2019-12-01T00:00:00Z
date_updated: 2023-09-06T14:32:52Z
day: '01'
department:
- _id: FyKo
doi: 10.1038/s41587-019-0333-6
ec_funded: 1
external_id:
isi:
- '000500748900021'
pmid:
- '31792410'
intvolume: ' 37'
isi: 1
issue: '12'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6894943/
month: '12'
oa: 1
oa_version: Submitted Version
page: 1466-1470
pmid: 1
project:
- _id: 26580278-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '771209'
name: Characterizing the fitness landscape on population and global scales
publication: Nature Biotechnology
publication_identifier:
eissn:
- '15461696'
issn:
- '10870156'
publication_status: published
publisher: Springer Nature
quality_controlled: '1'
related_material:
record:
- id: '13059'
relation: research_data
status: public
scopus_import: '1'
status: public
title: Large multiple sequence alignments with a root-to-leaf regressive method
type: journal_article
user_id: c635000d-4b10-11ee-a964-aac5a93f6ac1
volume: 37
year: '2019'
...