{"day":"06","ec_funded":1,"file":[{"file_id":"18151","file_size":11219837,"creator":"lgonzale","date_updated":"2024-09-27T10:32:33Z","date_created":"2024-09-27T10:32:33Z","checksum":"d3303724e8d3c91321d71bbad4062048","file_name":"louisa_thesis_draft__240904b.pdf","content_type":"application/pdf","access_level":"open_access","relation":"main_file"},{"file_id":"18152","creator":"lgonzale","file_size":43338677,"date_updated":"2024-09-27T10:34:34Z","date_created":"2024-09-27T10:34:34Z","file_name":"louisa_thesis_draft__240904b.docx","content_type":"application/vnd.openxmlformats-officedocument.wordprocessingml.document","checksum":"22e63f7f9014dffde2af7a47e7d1d014","access_level":"closed","relation":"source_file"}],"alternative_title":["ISTA Thesis"],"abstract":[{"text":"Understanding the relationship between a given phenotype and its underlying genotype or genotypes is one of the most pressing challenges of biology, as it lies at the heart of not only basic understanding of evolutionary theory, but also of practical applications in medicine and bioengineering. Understanding this relationship is complicated by the ubiquitous phenomenon of epistasis, wherein mutation effects are dependent on their genetic context. Fitness landscapes — representations of phenotype as a function of genotype — are being increasingly used as a tool to study the effects and interactions of thousands of mutations, but are experimentally limited to exploring a small fraction of a protein’s theoretical sequence space. Furthermore, not all regions of said sequence space are necessarily equally informative. Thus, gene selection for landscape surveys should be carefully considered in order to maximize the usable output of necessarily limited data.\r\n\r\nIn this work, we analyzed the fitness landscapes of orthologous green fluorescent proteins from four different species, by systematically measuring the phenotype, fluorescence, of tens of thousands of mutant genotypes from each protein. 
These landscapes were highly heterogeneous, with some genes being mutationally robust and displaying epistasis only rarely, and others being highly epistatic and mutationally fragile. We used these data to train machine learning models to predict fluorescence from genotype. Although the training data contained almost exclusively genotypes with less than 3% sequence divergence from the original wild-type sequences, we were able to create novel, functional genotypes with up to 20% sequence divergence. Counterintuitively, however, it was more difficult, not less, to introduce large numbers of mutations into genes with high mutational robustness and rare epistasis. This work represents the first study of large-scale fitness landscapes of a protein family, and provides insights into how to approach future landscape surveys and their applications in novel protein design.","lang":"eng"}],"department":[{"_id":"GradSch"},{"_id":"FyKo"}],"article_processing_charge":"No","oa":1,"tmp":{"name":"Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)","short":"CC BY-NC-ND (4.0)","image":"/images/cc_by_nc_nd.png","legal_code_url":"https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode"},"_id":"17850","user_id":"8b945eb4-e2f2-11eb-945a-df72226e66a9","license":"https://creativecommons.org/licenses/by-nc-nd/4.0/","language":[{"iso":"eng"}],"publication_status":"published","degree_awarded":"PhD","page":"89","supervisor":[{"last_name":"Kondrashov","id":"44FDEF62-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0001-8243-4694","full_name":"Kondrashov, Fyodor","first_name":"Fyodor"}],"ddc":["570"],"publication_identifier":{"issn":["2663-337X"]},"date_created":"2024-09-06T12:57:44Z","date_published":"2024-09-06T00:00:00Z","project":[{"call_identifier":"H2020","_id":"2564DBCA-B435-11E9-9278-68D0E5697425","name":"International IST Doctoral 
Program","grant_number":"665385"},{"call_identifier":"H2020","_id":"26580278-B435-11E9-9278-68D0E5697425","name":"Characterizing the fitness landscape on population and global scales","grant_number":"771209"}],"related_material":{"record":[{"relation":"part_of_dissertation","status":"public","id":"11448"}],"link":[{"url":"https://github.com/aequorea238/Orthologous_GFP_Fitness_Peaks","relation":"software"}]},"doi":"10.15479/at:ista:17850","status":"public","year":"2024","author":[{"first_name":"Louisa","full_name":"Gonzalez Somermeyer, Louisa","orcid":"0000-0001-9139-5383","last_name":"Gonzalez Somermeyer","id":"4720D23C-F248-11E8-B48F-1D18A9856A87"}],"month":"09","has_accepted_license":"1","citation":{"short":"L. Gonzalez Somermeyer, Fitness Landscapes of Orthologous Green Fluorescent Proteins, Institute of Science and Technology Austria, 2024.","mla":"Gonzalez Somermeyer, Louisa. Fitness Landscapes of Orthologous Green Fluorescent Proteins. Institute of Science and Technology Austria, 2024, doi:10.15479/at:ista:17850.","ieee":"L. Gonzalez Somermeyer, “Fitness landscapes of orthologous green fluorescent proteins,” Institute of Science and Technology Austria, 2024.","ista":"Gonzalez Somermeyer L. 2024. Fitness landscapes of orthologous green fluorescent proteins. Institute of Science and Technology Austria.","chicago":"Gonzalez Somermeyer, Louisa. “Fitness Landscapes of Orthologous Green Fluorescent Proteins.” Institute of Science and Technology Austria, 2024. https://doi.org/10.15479/at:ista:17850.","apa":"Gonzalez Somermeyer, L. (2024). Fitness landscapes of orthologous green fluorescent proteins. Institute of Science and Technology Austria. https://doi.org/10.15479/at:ista:17850","ama":"Gonzalez Somermeyer L. Fitness landscapes of orthologous green fluorescent proteins. 2024. 
doi:10.15479/at:ista:17850"},"publisher":"Institute of Science and Technology Austria","date_updated":"2024-10-11T11:29:13Z","title":"Fitness landscapes of orthologous green fluorescent proteins","type":"dissertation","acknowledged_ssus":[{"_id":"Bio"},{"_id":"LifeSc"},{"_id":"ScienComp"}],"corr_author":"1","oa_version":"Published Version","file_date_updated":"2024-09-27T10:34:34Z"}