Bayesian linear regression for analyzing general omics data with time-to-event phenotypes

Villanueva Marijuan A. 2024. Bayesian linear regression for analyzing general omics data with time-to-event phenotypes. Institute of Science and Technology Austria.

Download
OA Masters_thesis_AriadnaVillanueva.pdf 13.05 MB [Published Version]

Thesis | MS | Published | English

Corresponding author has ISTA affiliation

Series Title
ISTA Master's Thesis
Abstract
Recent advancements in molecular diagnostic techniques have enabled the collection of multiple types of omics data from patients, including genomics, epigenomics, proteomics, and transcriptomics. However, we lack effective methods for integrating these different data types and combining them with clinical outcomes to study the molecular mechanisms that govern pathological phenotypes. We present multi-omics BayesW, a penalized Bayesian regression method that can handle general omics data for survival analysis of time-to-event phenotypes. Our method can: (1) accommodate incomplete data by allowing censored individuals, (2) use continuous time-to-event data to test associations of markers with a phenotype, and (3) estimate effects jointly while allowing for independent groups of biological markers. Extensive simulations using planted signals on real data demonstrate that our model accurately retrieves the true parameters while controlling for false discoveries and maintaining the expected prediction accuracy. We address data correlations by estimating the effects jointly, even between omic groups, while also estimating the individual variance explained by each group.
We apply our model to two datasets. Using 18,000 individuals from the Generation Scotland study, we model the association of 831,724 CpG methylation probes with the time at onset, measured from baseline study entry, of Type 2 Diabetes, Stroke, Ischemic Disease, and Osteoarthritis. We find that large proportions of variation in disease onset times can be attributed to methylation as measured in whole blood at baseline in individuals without disease symptoms. We then apply our model to The Cancer Genome Atlas (TCGA) pan-cancer dataset, in which we use 5 types of omics: copy number variation, epigenetics, somatic mutations, miRNA, and gene expression. For cancer survival age-at-onset, we find that, when fitting the 5 groups together, almost all variation attributable to "omics" data is explained by DNA methylation. When considering progression times, both methylation and gene expression explain a large part of the variance. We find 2 genes that are significantly associated (95% posterior inclusion probability) with cancer survival time, conditional on all other genome-wide omics data variation. Owing to the vast variability of mechanisms characterizing different cancers, there are likely few specific genes with a strong signal in a pan-cancer setting. Taken together, these results show the applicability of our multi-omics BayesW model to a wide range of biological questions in multi-omics data.
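The 95% posterior inclusion probability criterion mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that a sampler has already produced, for each posterior draw, a 0/1 flag per marker indicating whether that marker had a non-zero effect in that draw. The function names, the toy data, and the "at least 0.95" reading of the threshold are assumptions made for illustration; they are not taken from the thesis or from any BayesW software.

```python
# Hypothetical illustration only: names and data are not from the thesis.

def posterior_inclusion_probabilities(indicator_samples):
    """Return, per marker, the fraction of posterior draws that include it.

    indicator_samples: list of draws, each a list of 0/1 inclusion flags
    (one flag per marker, same length in every draw).
    """
    if not indicator_samples:
        raise ValueError("need at least one posterior draw")
    n_draws = len(indicator_samples)
    n_markers = len(indicator_samples[0])
    counts = [0] * n_markers
    for draw in indicator_samples:
        if len(draw) != n_markers:
            raise ValueError("all draws must cover the same markers")
        for j, flag in enumerate(draw):
            if flag:
                counts[j] += 1
    return [count / n_draws for count in counts]


def select_markers(pips, threshold=0.95):
    """Return indices of markers whose inclusion probability meets the threshold."""
    return [j for j, pip in enumerate(pips) if pip >= threshold]


# Toy posterior: 20 draws over 3 markers.
# Marker 0 is included in all 20 draws, marker 1 in 1 draw, marker 2 in 18 draws.
toy_samples = [[1, 0, 1] for _ in range(18)] + [[1, 1, 0], [1, 0, 0]]
pips = posterior_inclusion_probabilities(toy_samples)
selected = select_markers(pips)
print(pips)
print(selected)
```

In this toy example only marker 0 passes the threshold; marker 2, included in 18 of 20 draws, does not.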
Publishing Year
2024
Date Published
2024-08-13
Publisher
Institute of Science and Technology Austria
Page
60
ISSN
IST-REx-ID

Cite this

Villanueva Marijuan A. Bayesian linear regression for analyzing general omics data with time-to-event phenotypes. 2024. doi:10.15479/at:ista:17368
Villanueva Marijuan, A. (2024). Bayesian linear regression for analyzing general omics data with time-to-event phenotypes. Institute of Science and Technology Austria. https://doi.org/10.15479/at:ista:17368
Villanueva Marijuan, Ariadna. “Bayesian Linear Regression for Analyzing General Omics Data with Time-to-Event Phenotypes.” Institute of Science and Technology Austria, 2024. https://doi.org/10.15479/at:ista:17368.
A. Villanueva Marijuan, “Bayesian linear regression for analyzing general omics data with time-to-event phenotypes,” Institute of Science and Technology Austria, 2024.
Villanueva Marijuan, Ariadna. Bayesian Linear Regression for Analyzing General Omics Data with Time-to-Event Phenotypes. Institute of Science and Technology Austria, 2024, doi:10.15479/at:ista:17368.
All files available under the following license(s):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Main File(s)
Access Level
OA Open Access
Date Uploaded
2024-08-14
Embargo End Date
2025-02-14
MD5 Checksum
0c2daa174609f0c00919dccc5701d375

Source File
Access Level
Restricted Closed Access
Date Uploaded
2024-08-14
MD5 Checksum
e9ed4465dfa539ac4c3a8d4d0b6271a1
