Using genealogies to study the genomic basis of species divergence
Pal A. 2025. Using genealogies to study the genomic basis of species divergence. Institute of Science and Technology Austria.
Thesis | PhD | Published | English
Corresponding author has ISTA affiliation
Series Title
ISTA Thesis
Abstract
Understanding the mechanisms underlying speciation is a central aim of evolutionary biology.
A persistent challenge in the field is to identify loci that contribute to reproductive isolation,
while disentangling signals of selection from demography, linkage and intrinsic genomic
features. Traditional population genomic approaches that rely on site-based statistics in
arbitrary fixed windows face inherent limitations, as they conflate historical and
contemporary processes of divergence and overlook haplotype structure. Recent advances in
whole-genome sequencing and methods to infer ancestral recombination graphs (ARGs) now
offer the opportunity to study genealogical relationships explicitly, revealing how lineages
coalesce and recombine through time. By directly analysing haplotype clustering by species
or phenotype and their patterns of coalescence, ARG-based methods show promise for
diagnosing sweeps, identifying barrier loci maintained under divergent selection amid gene
flow, and tracing their evolutionary history.
In this thesis, I explore the utility of genealogical approaches for studying species
divergence. In chapter 2, I propose a conceptual framework for defining haplotype blocks
through the structure of the ARG, using simulations and empirical data to highlight how
genealogical processes generate rich and often overlooked haplotypic patterns.
In chapter 3, I examine the genomic basis of a key evolutionary innovation in marine
snails of the genus Littorina. These snails offer a unique opportunity to study an innovation because they
include a very recent transition from egg-laying to live-bearing, yet snails with the different
reproductive modes are not reciprocally monophyletic. I exploited this by using topology
clustering in ARG-derived local genealogical trees to pinpoint narrow genomic regions or
haplotype blocks that carry swept alleles, thus revealing that the transition from egg-laying
to live-bearing involves multiple, live-bearer-specific sweeps.
Chapter 4 establishes a population-scale, phased genomic resource for Antirrhinum
majus using cost-effective haplotagging, then optimizes imputation from low-coverage data
against high-accuracy KASP genotyping to maximize sequence completeness with modest
accuracy trade-offs relative to a traditional short-read sequencing pipeline. A hybrid phasing
strategy combines molecular phasing with statistical phasing to generate phased
whole-genome sequences of 1084 Antirrhinum individuals at a fraction of long-read sequencing
costs.
In chapter 5, I analyse hybridising populations from two replicate hybrid zones to find
a parallel genetic basis of flower colour amid the noise in the genomic differentiation landscape
driven by variation in demographic history. While outlier genome scans of FST failed to dissect
the causes of differentiation, ARG-based topology clustering revealed a reuse of
colour-associated haplotypes across hybrid zones. In addition to the biological insight, this chapter
also presents a comparison of the latest ARG inference tools, showing that signals of
topological clustering qualitatively agree between methods, despite differences in the tree
sequences.
Next, in chapter 6, by leveraging ~1000 individuals in one of the hybrid zones, I
integrated genome-wide association studies of floral pigmentation with genealogical
inference to test for additional colour loci and confirm the effect of previously described loci.
This work demonstrates that flower colour variation is driven by a small number of
large-effect loci, while also hinting at the presence of a new candidate regulatory factor.
Finally, in chapter 7, I present a preliminary analysis that begins to dissect the genomic island of
speciation around Rosea/Eluta to understand its evolutionary origins. My results show that it
consists of five highly divergent loci, each of which is associated with flower colour. Using
patterns of coalescence in genealogical trees, I find evidence of staggered selective sweeps
and a persistent localized barrier to gene flow within an otherwise permeable genome.
Together, these chapters add to the growing body of studies that use genealogical
approaches to complement and extend site-based statistics by exploiting haplotype structure in
speciation research. By tracking haplotypes directly and connecting genealogical clustering to
population processes, ARG-based inference promises to provide new insights into how local
selective pressures, demographic history, and long-term barriers interact to shape the
genomic architecture of divergence. By underscoring the value of ARGs in revealing the
fine-scale origins and maintenance of biodiversity, this thesis presents cautious optimism about
the benefits of using genealogical inference to learn more than site-based statistics
alone can tell us.
Date Published
2025-11-25
Publisher
Institute of Science and Technology Austria
Page
268
Cite this
Pal A. Using genealogies to study the genomic basis of species divergence. 2025. doi:10.15479/AT-ISTA-20694
Pal, A. (2025). Using genealogies to study the genomic basis of species divergence. Institute of Science and Technology Austria. https://doi.org/10.15479/AT-ISTA-20694
Pal, Arka. “Using Genealogies to Study the Genomic Basis of Species Divergence.” Institute of Science and Technology Austria, 2025. https://doi.org/10.15479/AT-ISTA-20694.
A. Pal, “Using genealogies to study the genomic basis of species divergence,” Institute of Science and Technology Austria, 2025.
Pal, Arka. Using Genealogies to Study the Genomic Basis of Species Divergence. Institute of Science and Technology Austria, 2025, doi:10.15479/AT-ISTA-20694.
All files available under the following license(s):
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Main File(s)
File Name
2025_Pal_Arka_Thesis.pdf
42.72 MB
Access Level
Open Access
Date Uploaded
2025-12-01
Embargo End Date
2026-03-01
MD5 Checksum
7a10a738d58524aebb5dcbd9b34c21c5
Source File
File Name
2025_Pal_Arka_Thesis.docx
60.63 MB
Access Level
Closed Access
Date Uploaded
2025-12-01
MD5 Checksum
166d832b08d0434ce407f8f3cb930fe5
