Bayesian linear regression for analyzing general omics data with time-to-event phenotypes
Villanueva Marijuan A. 2024. Bayesian linear regression for analyzing general omics data with time-to-event phenotypes. Institute of Science and Technology Austria.
Thesis | MS | Published | English
Corresponding author has ISTA affiliation
Series Title
ISTA Master's Thesis
Abstract
Recent advancements in molecular diagnostic techniques have enabled the collection of
multiple types of omics data from patients, including genomics, epigenomics, proteomics,
and transcriptomics. However, we lack effective methods for integrating all these different
data types and combining them with clinical outcomes to study the molecular mechanisms
that govern pathological phenotypes. We present multi-omics BayesW, a penalized Bayesian
regression method that can handle general omics data for survival analysis of time-to-event
phenotypes. Our method can (1) accommodate incomplete data by allowing censored
individuals, (2) use continuous time-to-event data to test associations of markers with a
phenotype, and (3) estimate effects jointly while allowing for independent groups of biological
markers. Extensive simulations using planted signals on real data demonstrate that our model
accurately retrieves the true parameters of the model while controlling for false discoveries
and maintaining the expected prediction accuracy. We address data correlations by estimating
the effects jointly, even between omic groups, while also estimating the individual variance
explained by each group. We apply our model to two datasets. Using 18,000 individuals from
the Generation Scotland study, we model the association of 831,724 CpG methylation probes
with the time from baseline study entry to onset of type 2 diabetes, stroke, ischemic disease,
and osteoarthritis. We find that large proportions of variation in disease onset times can
be attributed to methylation as measured in whole blood at baseline in individuals without
disease symptoms. We then apply our model to The Cancer Genome Atlas (TCGA) pan-cancer
dataset, in which we use 5 types of omics: copy number variation, epigenetics, somatic
mutations, miRNA, and gene expression. For cancer survival age-at-onset we find that, when
fitting the 5 groups together, almost all variation attributable to "omics" data is explained by
DNA methylation. When considering progression times, both methylation and gene expression
explain a large part of the variance. We find 2 genes that are significantly associated (95%
posterior inclusion probability) with cancer survival time, conditional on all other genome-wide
omics data variation. Owing to the vast variability of mechanisms characterizing different
cancers, there are likely few specific genes with a strong signal in a pan-cancer setting. Taken
together, these results show the applicability of our multi-omics BayesW model to a wide range of
biological questions in multi-omics data.
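The abstract reports associations at a 95% posterior inclusion probability. As a minimal sketch of how such a threshold is commonly applied, and not the thesis's own implementation, the Python below assumes a variable-selection sampler whose posterior draws store an excluded marker's effect as exactly zero. The function names and the toy draws are hypothetical and chosen only for illustration.

```python
def posterior_inclusion_probabilities(effect_draws):
    """Return, for each marker, the fraction of draws with a nonzero effect.

    effect_draws is a list of posterior draws; each draw is a list with one
    effect size per marker. Assumption: a marker that is excluded from the
    model in a given draw is stored as exactly 0.0.
    """
    n_draws = len(effect_draws)
    n_markers = len(effect_draws[0])
    return [
        sum(1 for draw in effect_draws if draw[j] != 0.0) / n_draws
        for j in range(n_markers)
    ]


def select_markers(effect_draws, threshold=0.95):
    """Return indices of markers whose inclusion probability meets the threshold."""
    pips = posterior_inclusion_probabilities(effect_draws)
    selected = [j for j, p in enumerate(pips) if p >= threshold]
    return selected, pips


# Hypothetical toy data: 20 draws for 3 markers.
# Marker 0 is included in every draw, marker 1 in 18 draws, marker 2 in 2 draws.
draws = [
    [0.5, 0.2 if i < 18 else 0.0, 0.1 if i < 2 else 0.0]
    for i in range(20)
]

selected, pips = select_markers(draws)
print("inclusion probabilities:", pips)
print("markers passing the 0.95 threshold:", selected)
```

In this toy example only the first marker passes the threshold, because it is the only one included in at least 95% of the draws.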
Date Published
2024-08-13
Publisher
Institute of Science and Technology Austria
Page
60
Cite this
Villanueva Marijuan A. Bayesian linear regression for analyzing general omics data with time-to-event phenotypes. 2024. doi:10.15479/at:ista:17368
Villanueva Marijuan, A. (2024). Bayesian linear regression for analyzing general omics data with time-to-event phenotypes. Institute of Science and Technology Austria. https://doi.org/10.15479/at:ista:17368
Villanueva Marijuan, Ariadna. “Bayesian Linear Regression for Analyzing General Omics Data with Time-to-Event Phenotypes.” Institute of Science and Technology Austria, 2024. https://doi.org/10.15479/at:ista:17368.
A. Villanueva Marijuan, “Bayesian linear regression for analyzing general omics data with time-to-event phenotypes,” Institute of Science and Technology Austria, 2024.
Villanueva Marijuan, Ariadna. Bayesian Linear Regression for Analyzing General Omics Data with Time-to-Event Phenotypes. Institute of Science and Technology Austria, 2024, doi:10.15479/at:ista:17368.
All files available under the following license(s):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Main File(s)
File Name
Masters_thesis_AriadnaVillanueva.pdf
13.05 MB
Date Uploaded
2024-08-14
Embargo End Date
2025-02-14
MD5 Checksum
0c2daa174609f0c00919dccc5701d375
Source File
File Name
Masters thesis-AriadnaVillanueva.zip
45.64 MB
Date Uploaded
2024-08-14
MD5 Checksum
e9ed4465dfa539ac4c3a8d4d0b6271a1