TY - JOUR
T1 - Deep integrative models for large-scale human genomics
AU - Sigurdsson, Arnór I
AU - Louloudis, Ioannis
AU - Banasik, Karina
AU - Westergaard, David
AU - Winther, Ole
AU - Lund, Ole
AU - Ostrowski, Sisse Rye
AU - Erikstrup, Christian
AU - Pedersen, Ole Birger Vesterager
AU - Nyegaard, Mette
AU - Brunak, Søren
AU - Vilhjálmsson, Bjarni J
AU - Rasmussen, Simon
AU - DBDS Genomic Consortium
A2 - Chalmer, Mona Ameri
A2 - Didriksen, Maria
A2 - Dowsett, Joseph
A2 - Hansen, Thomas Folkmann
A2 - Kogelman, Lisette
N1 - © The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.
PY - 2023/7/7
Y1 - 2023/7/7
AB - Polygenic risk scores (PRSs) are expected to play a critical role in precision medicine. Currently, PRS predictors are generally based on linear models using summary statistics and, more recently, individual-level data. However, these predictors mainly capture additive relationships and are limited in the data modalities they can use. We developed a deep learning framework (EIR) for PRS prediction which includes a model, genome-local-net (GLN), specifically designed for large-scale genomics data. The framework supports multi-task learning, automatic integration of other clinical and biochemical data, and model explainability. When applied to individual-level data from the UK Biobank, the GLN model demonstrated competitive performance compared to established neural network architectures, particularly for certain traits, showcasing its potential in modeling complex genetic relationships. Furthermore, the GLN model outperformed linear PRS methods for type 1 diabetes (T1D), likely due to modeling non-additive genetic effects and epistasis. This was supported by our identification of widespread non-additive genetic effects and epistasis in the context of T1D. Finally, we constructed PRS models that integrated genotype, blood, urine, and anthropometric data and found that this improved performance for 93% of the 290 diseases and disorders considered. EIR is available at https://github.com/arnor-sigurdsson/EIR.
KW - Genetic Predisposition to Disease
KW - Genome, Human
KW - Genome-Wide Association Study
KW - Genomics/methods
KW - Genotype
KW - Humans
KW - Models, Genetic
KW - Multifactorial Inheritance
KW - Polymorphism, Single Nucleotide
KW - Risk Factors
UR - http://www.scopus.com/inward/record.url?scp=85165406299&partnerID=8YFLogxK
U2 - 10.1093/nar/gkad373
DO - 10.1093/nar/gkad373
M3 - Journal article
C2 - 37224538
SN - 0305-1048
VL - 51
SP - e67
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 12
ER -