Using genealogies to study the genomic basis of species divergence

Pal A. 2025. Using genealogies to study the genomic basis of species divergence. Institute of Science and Technology Austria.

Download
2025_Pal_Arka_Thesis.pdf (42.72 MB, Open Access, Published Version)

Thesis | PhD | Published | English
Author
Pal, Arka
Corresponding author has ISTA affiliation

Series Title
ISTA Thesis
Abstract
Understanding the mechanisms underlying speciation is a central aim of evolutionary biology. A persistent challenge in the field is to identify loci that contribute to reproductive isolation while disentangling signals of selection from those of demography, linkage and intrinsic genomic features. Traditional population genomic approaches that rely on site-based statistics in arbitrary fixed windows face inherent limitations: they conflate historical and contemporary processes of divergence and overlook haplotype structure. Recent advances in whole-genome sequencing and in methods to infer ancestral recombination graphs (ARGs) now offer the opportunity to study genealogical relationships explicitly, revealing how lineages coalesce and recombine through time. By directly analysing how haplotypes cluster by species or phenotype, together with their patterns of coalescence, ARG-based methods show promise for diagnosing sweeps, identifying barrier loci maintained under divergent selection amid gene flow, and tracing their evolutionary history. In this thesis, I explore the utility of genealogical approaches for studying species divergence. In chapter 2, I propose a conceptual framework for defining haplotype blocks through the structure of the ARG, using simulations and empirical data to highlight how genealogical processes generate rich and often overlooked haplotypic patterns. In chapter 3, I examine the genomic basis of a key evolutionary innovation in marine snails of the genus Littorina. These snails offer a unique opportunity to study an innovation because they include a very recent transition from egg-laying to live-bearing, yet snails with the different reproductive modes are not reciprocally monophyletic. I exploited this by using topology clustering in ARG-derived local genealogical trees to pinpoint narrow genomic regions, or haplotype blocks, that carry swept alleles, revealing that the transition from egg-laying to live-bearing involves multiple live-bearer-specific sweeps.
Chapter 4 establishes a population-scale, phased genomic resource for Antirrhinum majus using cost-effective haplotagging, then optimizes imputation from low-coverage data against high-accuracy KASP genotypes to maximize sequence completeness, with modest accuracy trade-offs relative to a traditional short-read sequencing pipeline. A hybrid phasing strategy combines molecular phasing with statistical phasing to generate phased whole-genome sequences of 1084 Antirrhinum individuals at a fraction of the cost of long-read sequencing. In chapter 5, I analyse hybridising populations from two replicate hybrid zones to find a parallel genetic basis of flower colour amid noise in the genomic differentiation landscape driven by variation in demographic history. While outlier genome scans of FST failed to dissect the causes of differentiation, ARG-based topology clustering revealed reuse of colour-associated haplotypes across hybrid zones. In addition to the biological insight, this chapter presents a comparison of the latest ARG inference tools, showing that signals of topological clustering qualitatively agree between methods despite differences in the inferred tree sequences. Next, in chapter 6, leveraging ~1000 individuals from one of the hybrid zones, I integrate genome-wide association studies of floral pigmentation with genealogical inference to test for additional colour loci and confirm the effects of previously described loci. This work demonstrates that flower colour variation is driven by a small number of large-effect loci, while also hinting at the presence of a new candidate regulatory factor. Finally, in chapter 7, in a preliminary analysis, I begin to dissect the genomic island of speciation around Rosea/Eluta to understand its evolutionary origins. My results show that it consists of five highly divergent loci, each of which is associated with flower colour. Using patterns of coalescence in genealogical trees, I find evidence of staggered selective sweeps and a persistent, localized barrier to gene flow within an otherwise permeable genome.
Together, these chapters add to the growing body of studies that use genealogical approaches to complement and extend site-based statistics by exploiting haplotype structure in speciation research. By tracking haplotypes directly and connecting genealogical clustering to population processes, ARG-based inference promises new insights into how local selective pressures, demographic history and long-term barriers interact to shape the genomic architecture of divergence. By underscoring the value of ARGs in revealing the fine-scale origins and maintenance of biodiversity, this thesis expresses cautious optimism that genealogical inference can teach us more than site-based statistics alone.
Publishing Year
2025
Date Published
2025-11-25
Publisher
Institute of Science and Technology Austria
Page
268

Cite this

Pal A. Using genealogies to study the genomic basis of species divergence. 2025. doi:10.15479/AT-ISTA-20694
Pal, A. (2025). Using genealogies to study the genomic basis of species divergence. Institute of Science and Technology Austria. https://doi.org/10.15479/AT-ISTA-20694
Pal, Arka. “Using Genealogies to Study the Genomic Basis of Species Divergence.” Institute of Science and Technology Austria, 2025. https://doi.org/10.15479/AT-ISTA-20694.
A. Pal, “Using genealogies to study the genomic basis of species divergence,” Institute of Science and Technology Austria, 2025.
Pal, Arka. Using Genealogies to Study the Genomic Basis of Species Divergence. Institute of Science and Technology Austria, 2025, doi:10.15479/AT-ISTA-20694.
All files available under the following license(s):
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0):
Main File(s)
File Name
2025_Pal_Arka_Thesis.pdf
Access Level
Open Access
Date Uploaded
2025-12-01
Embargo End Date
2026-03-01
MD5 Checksum
7a10a738d58524aebb5dcbd9b34c21c5

Source File
File Name
Access Level
Restricted Closed Access
Date Uploaded
2025-12-01
MD5 Checksum
166d832b08d0434ce407f8f3cb930fe5
