Genomic Prediction of Osteoporosis Using 426,000 Individuals from UK Biobank

Julyan Keller-Baruch1, Marie Forest2, Vincenzo Forgetta2, Audrey Durand3, John Kemp4,5, Dave Evans4,5, Joelle Pineau3, William Leslie6,7, Celia MT Greenwood1,2,8,9, J Brent Richards1,2,10

1. Department of Human Genetics, McGill University, Montréal, Québec, Canada; 2. Centre for Clinical Epidemiology, Lady Davis Institute, Jewish General Hospital, McGill University, Montréal, Québec, Canada; 3. School of Computer Science, McGill University, Montréal, Québec, Canada; 4. University of Queensland Diamantina Institute, Translational Research Institute, Brisbane, Queensland, Australia; 5. MRC Integrative Epidemiology Unit, University of Bristol, UK; 6. Department of Medicine (Endocrinology), University of Manitoba, Winnipeg, Manitoba, Canada; 7. Department of Radiology (Nuclear Medicine), University of Manitoba, Winnipeg, Manitoba, Canada; 8. Gerald Bronfman Department of Oncology, McGill University, Montréal, Québec, Canada; 9. Department of Epidemiology, Biostatistics & Occupational Health, McGill University, Montréal, Québec, Canada; 10. Department of Twin Research and Genetic Epidemiology, King's College London, London, United Kingdom

Background

Genetic prediction of clinically relevant traits has not yet been generally successful, even for highly heritable traits such as bone mineral density (BMD), the measure most often used to diagnose osteoporosis. We tested whether machine learning methods could improve prediction of estimated bone mineral density (eBMD) from genotypes.

Methods

We used the UK Biobank to identify 426,812 individuals of British descent with genome-wide genotypes and eBMD measured by calcaneal ultrasound. After dividing the cohort into separate training (N = 341,450), validation (N = 42,681) and test (N = 42,681) datasets, we first evaluated a prediction model containing only age and sex, two factors commonly used to identify people for BMD testing. Next, we undertook a genome-wide association study (GWAS) for eBMD in the training set. Using the SNPs ranked highest by GWAS P-value, we trained a machine learning algorithm, the least absolute shrinkage and selection operator (LASSO), under six different models to predict eBMD in the training set. We assessed the performance of each model in the validation set and evaluated the prediction performance of the best model in the test set. Last, we tested whether genomic pre-screening could identify individuals unlikely to have a diagnosis of osteoporosis by eBMD measurement.
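The train / validation / test workflow described above can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration and not the authors' pipeline: the genotypes and trait are synthetic, the function names (`split_indices`, `rank_by_marginal_association`, `lasso_cd`, `run_demo`) are hypothetical, the penalty `lam` is arbitrary, and the candidate models are assumed to differ only in how many top-ranked SNPs they admit (three sizes here, whereas the study compared six models whose definitions are not given in this abstract). Ranking by absolute marginal correlation stands in for ranking by GWAS P-value; for a single-SNP linear regression without covariates the two orderings coincide.

```python
import numpy as np

def split_indices(n, rng):
    """Shuffle 0..n-1 and cut it into 80% training, 10% validation, 10% test."""
    idx = rng.permutation(n)
    n_val = n_test = n // 10
    return idx[n_val + n_test:], idx[:n_val], idx[n_val:n_val + n_test]

def rank_by_marginal_association(X, y):
    """Order SNP columns by |correlation| with the trait, strongest first."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.argsort(-np.abs((Xc.T @ yc) / denom))

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + lam*||b||_1.
    X and y are assumed to be centered, so no intercept is fitted."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()                      # residual for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:          # monomorphic column: skip
                continue
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def r_squared(y, y_hat):
    """Proportion of variance in y explained by the predictions y_hat."""
    return 1.0 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

def run_demo(seed=0, n=2000, p=200, n_causal=10, lam=0.02):
    rng = np.random.default_rng(seed)
    # Synthetic additive genotypes (0/1/2) and a trait with 10 causal SNPs.
    maf = rng.uniform(0.1, 0.5, size=p)
    X = rng.binomial(2, maf, size=(n, p)).astype(float)
    beta_true = np.zeros(p)
    beta_true[:n_causal] = 0.5
    y = X @ beta_true + rng.normal(size=n)

    train, val, test = split_indices(n, rng)
    # Ranking and centering use the training set only, to avoid leakage.
    order = rank_by_marginal_association(X[train], y[train])
    mu_x, mu_y = X[train].mean(axis=0), y[train].mean()

    fits = {}
    for k in (10, 50, 200):               # hypothetical candidate models
        cols = order[:k]
        beta = lasso_cd(X[train][:, cols] - mu_x[cols], y[train] - mu_y, lam)
        pred_val = (X[val][:, cols] - mu_x[cols]) @ beta + mu_y
        fits[k] = (r_squared(y[val], pred_val), cols, beta)

    # Pick the model on validation data, then score it once on the test set.
    best_k = max(fits, key=lambda k: fits[k][0])
    val_r2, cols, beta = fits[best_k]
    pred_test = (X[test][:, cols] - mu_x[cols]) @ beta + mu_y
    return {"sizes": (len(train), len(val), len(test)), "best_k": best_k,
            "val_r2": val_r2, "test_r2": r_squared(y[test], pred_test)}

result = run_demo()
print(result)
```

Applied to 426,812 individuals, the same integer 80/10/10 split gives 341,450, 42,681 and 42,681, matching the set sizes reported above. Choosing the model on the validation set and scoring it only once on the test set keeps the reported performance free of selection bias.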

Results

Age and sex explained 5.8% of the variance in eBMD. The area under the receiver operating characteristic curve (AUC) for the diagnosis of osteoporosis using age and sex was 71.3% [95% CI: 68.7%-74%]. Using the machine learning algorithm and adding genotypic information, the variance explained increased approximately five-fold to 29.7%. The AUC for the diagnosis of osteoporosis increased to 81.3% (P = 2 × 10⁻¹⁷, compared to age and sex alone). The maximal combined sensitivity and specificity for the diagnosis of osteoporosis were 75% and 75%, respectively. Restricting the analysis to individuals aged 55 and over, or 65 and over, yielded similar prediction performance.
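For readers less familiar with these metrics, the sketch below computes the area under the receiver operating characteristic curve and a "maximal combined sensitivity and specificity" on a toy example. It rests on stated assumptions: the labels and scores are invented and unrelated to the study data, the function names are hypothetical, a higher score is taken to mean higher predicted risk (for instance, the negative of predicted eBMD), and "maximal combined" is read as the threshold that maximizes sensitivity plus specificity (Youden's J), a criterion this abstract does not spell out.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen case outscores a
    randomly chosen control, with ties counted as one half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def best_youden_threshold(labels, scores):
    """Return (threshold, sensitivity, specificity) maximizing
    sensitivity + specificity; a score >= threshold is called positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (t, sens, spec)
    return best

# Toy data only: 1 = osteoporosis, 0 = no osteoporosis.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
threshold, sensitivity, specificity = best_youden_threshold(labels, scores)
print(f"AUC={auc:.3f} threshold={threshold} "
      f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

The pairwise definition of the AUC is quadratic in sample size and is used here only for clarity; on a cohort of this scale a rank-based formulation would be the practical choice.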

Interpretation

Using genotypes in a machine learning algorithm produced a clinically relevant improvement in the identification of individuals at risk for eBMD-defined osteoporosis, and provides the opportunity to use genomic pre-screening to focus clinical screening programs.