GWAS-Based Prediction of Antidepressant Response in Major Depressive Disorder Using Machine Learning Models

Malgorzata Maciukiewicz, PhD1; Victoria S. Marshe, HBSc1,2; Arun K. Tiwari, PhD1,3; Etienne Sibille, PhD4,5; James L. Kennedy, MD PhD1,2,3; Charles F. Reynolds 3rd, MD PhD6; Eric J. Lenze, MD PhD7; Benoit H. Mulsant, MD PhD3,4; Daniel J. Müller*, MD PhD1,2,3

1. Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, Canada; 2. Institute of Medical Science, Faculty of Medicine, University of Toronto, Toronto, Canada; 3. Department of Psychiatry, Faculty of Medicine, University of Toronto, Canada; 4. Department of Pharmacology, University of Toronto, Canada; 5. Centre for Addiction and Mental Health, Campbell Family Mental Health Research Institute, Toronto, Canada; 6. Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA; 7. Healthy Mind Lab, Department of Psychiatry, Washington University, St. Louis, MO, USA

*Corresponding author

Background: Major depressive disorder (MDD) is one of the most prevalent psychiatric disorders and is commonly treated with antidepressant medications (AD). However, response to AD varies widely between patients. Gene variants and GWAS-based machine learning (ML) models may help predict treatment outcomes. In this study, we assessed the performance of published models and performed cross-trial prediction of antidepressant response using a late-life depression (LLD) sample treated with venlafaxine (IRL-Grey; NCT00892047) and a sample of adults treated with escitalopram (STAR*D; NCT00021528).

Methods: From IRL-Grey, we included n=306 participants of European ancestry with clinical and genome-wide data available. From STAR*D, we included n=832 participants of European ancestry. For consistency with the STAR*D data, we used the 17-item Hamilton Rating Scale for Depression (HRSD) to define response (a decrease of >=50% from baseline) and remission (HRSD score <=7 at the end of treatment). As in the initial analyses, we trained our models using 5-fold nested cross-validation (CV). We filtered SNPs using two approaches: 1) genome-wide logistic regression to identify variants potentially associated with response/remission (at p-value thresholds of 0.05, 0.005, and 0.0005) and 2) function-based filtering (CADD PHRED score >=10). Approaches 1) and 2) were each followed independently by LASSO regression on the pre-selected SNPs. Subsequently, classification and regression trees (CRT), support vector machines (SVM), gradient boosting machines (GBM), and logistic regression (LR) were applied to construct models, using ten-fold cross-validation. Genome-wide data were available for both samples and underwent quality control. Because different genotyping platforms were used, we ran genome-wide imputation using IMPUTE2 with phase 3 of the 1000 Genomes Project as the reference set. For external testing, we used SNPs common to IRL-Grey and STAR*D that were selected in at least 1, 2, or 3 folds of the initial CV. Association-based filtering was conducted in PLINK 1.9. For LASSO and ML model construction, we used the “glmnet” and “caret” packages for R.
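
As a rough illustration of the outcome definitions and the fold-based SNP selection described in the Methods, a minimal Python sketch might look as follows. The analysis itself used PLINK and R; the function names, SNP identifiers, and fold contents here are hypothetical and serve only to make the rules concrete.

```python
from collections import Counter


def is_responder(baseline, final):
    """Response: HRSD score decreased by at least 50% from baseline."""
    if baseline <= 0:
        raise ValueError("baseline HRSD must be positive")
    return (baseline - final) / baseline >= 0.5


def is_remitter(final):
    """Remission: HRSD score of 7 or lower at the end of treatment."""
    return final <= 7


def stable_snps(selected_per_fold, min_folds, available=None):
    """Keep SNPs selected in at least `min_folds` CV folds.

    If `available` is given (e.g. SNPs present in the external cohort),
    only SNPs in that set are kept.
    """
    # set(fold) guards against a SNP being counted twice within one fold
    counts = Counter(snp for fold in selected_per_fold for snp in set(fold))
    kept = {snp for snp, n in counts.items() if n >= min_folds}
    if available is not None:
        kept &= set(available)
    return sorted(kept)


# Hypothetical SNP IDs retained by LASSO in each of 5 folds
folds = [
    ["rs1", "rs2", "rs3"],
    ["rs2", "rs3"],
    ["rs3", "rs4"],
    ["rs3"],
    ["rs2", "rs5"],
]

print(is_responder(20, 10), is_remitter(10))
print(stable_snps(folds, min_folds=2))
print(stable_snps(folds, min_folds=1, available=["rs1", "rs3", "rs9"]))
```

Raising `min_folds` trades a larger predictor set for SNPs that are selected more consistently across folds, which is the comparison made between the 1-, 2-, and 3-fold sets.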

Results: Using HRSD-based classification, remission rates were 42.48% and 46.63% for IRL-Grey and STAR*D, respectively, whereas response rates were more balanced, at 51.31% and 51.32%. For association-based filtering, the most promising models were obtained with CRT at a threshold of p=0.005 (accuracy = 0.55, sensitivity = 0.58, and specificity = 0.54). However, on STAR*D, even the best-performing model (GBM, SNPs present in 1 or more folds in the initial CV) was non-significant (accuracy = 0.53, sensitivity = 0.55, and specificity = 0.51; p=0.175). When we applied function-based filtering (CADD scores >=10), average model performance in the internal IRL-Grey testing set was similar to that of association-based filtering (accuracy = 0.54, sensitivity = 0.58, and specificity = 0.50). Notably, GBM models achieved promising performance on the external STAR*D test set (accuracy = 0.55, sensitivity = 0.56, and specificity = 0.53; p=0.028) when we included SNPs present in two or more folds in the initial CV. Adding sex and baseline HRSD subscales (depressed mood, feelings of guilt, middle insomnia, work and activities, somatic anxiety, somatic gastrointestinal symptoms) slightly decreased performance (accuracy = 0.54, sensitivity = 0.59, and specificity = 0.50; p=0.038).
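
For readers less familiar with the reported metrics, the following Python sketch shows how accuracy, sensitivity, and specificity are conventionally derived from predicted and observed binary outcomes. It assumes responders (or remitters) are coded as 1 and treated as the positive class, which the abstract does not state; the function name and the example labels are hypothetical, and the reported p-values are not reproduced here.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of positives correctly predicted
        "specificity": tn / (tn + fp),  # share of negatives correctly predicted
    }


# Hypothetical observed and predicted outcomes for 8 participants
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(classification_metrics(observed, predicted))
```

With roughly balanced classes, as for the response outcome here, an accuracy near 0.50 is what a classifier would achieve by chance, which is why values of 0.53 to 0.55 are described as modest.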

Conclusions: Our analysis suggests that this initial integration of genetic and clinical data is promising and may yield models that can predict response. Nonetheless, innovative approaches are needed to increase model performance and utility. Our future studies will explore function-based filtering approaches and hypothesis-driven SNP sets, with the aim of obtaining biologically relevant predictors that can also inform drug mechanisms.