Exploiting SNP correlations within Random Forest for Genome-Wide Association Studies


Motivation The primary goal of genome-wide association studies (GWAS) is to discover variants that could lead, in isolation or in combination, to a particular trait or disease. Standard approaches to GWAS, however, are usually based on univariate hypothesis tests and therefore can account neither for correlations due to linkage disequilibrium nor for combinations of several markers. To discover and leverage such potential multivariate interactions, we propose in this work an extension of the Random Forest algorithm tailored for structured GWAS data.
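
As a toy illustration of why single-marker tests can miss combinations of markers, the sketch below builds a hypothetical, perfectly balanced cohort in which disease status is the exclusive-or of two binary markers. This is not the paper's data or method: the cohort, the XOR rule and the function names (`build_cohort`, `marginal_effect`, `joint_case_rates`) are all assumptions made for the example.

```python
from itertools import product


def build_cohort(copies=25):
    """Hypothetical cohort of (snp_a, snp_b, status) rows.

    Each of the four carrier combinations appears `copies` times, and the
    status is snp_a XOR snp_b (an assumed, purely illustrative rule).
    """
    cohort = []
    for snp_a, snp_b in product((0, 1), repeat=2):
        status = snp_a ^ snp_b
        cohort.extend([(snp_a, snp_b, status)] * copies)
    return cohort


def case_rate(rows):
    """Fraction of rows whose status is 1."""
    return sum(row[2] for row in rows) / len(rows)


def marginal_effect(cohort, index):
    """Case-rate difference between carriers and non-carriers of one marker."""
    carriers = [row for row in cohort if row[index] == 1]
    others = [row for row in cohort if row[index] == 0]
    return case_rate(carriers) - case_rate(others)


def joint_case_rates(cohort):
    """Case rate for every combination of the two markers."""
    rates = {}
    for snp_a, snp_b in product((0, 1), repeat=2):
        rows = [row for row in cohort if row[0] == snp_a and row[1] == snp_b]
        rates[(snp_a, snp_b)] = case_rate(rows)
    return rates


cohort = build_cohort()
print("marginal effect of marker A:", marginal_effect(cohort, 0))
print("marginal effect of marker B:", marginal_effect(cohort, 1))
print("joint case rates:", joint_case_rates(cohort))
```

In this constructed case each marker taken alone shows no difference in case rate, while the pair of markers separates cases from controls completely. Real GWAS signals are far noisier, so the example only conveys the intuition behind looking at several markers jointly.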

Results In terms of risk prediction, we show empirically on several GWAS datasets that the proposed T-Trees method significantly outperforms both the original Random Forest algorithm and baseline linear models, thereby suggesting the actual existence of multivariate non-linear effects due to the combinations of several SNPs. We also demonstrate that variable importances as derived from our method can help identify relevant loci. Finally, we highlight the strong impact that quality control procedures may have, both in terms of predictive power and loci identification.

Original article PDF


Variable importance results

Disease                        Method          QC     Version
BD   Bipolar disorder          Random Forests  WTCCC  View
                               T-Trees         WTCCC  View
                               Random Forests  QC     View
                               T-Trees         QC     View
CAD  Coronary artery disease   Random Forests  WTCCC  View
                               T-Trees         WTCCC  View
                               Random Forests  QC     View
                               T-Trees         QC     View
CD   Crohn's disease           Random Forests  WTCCC  View
                               T-Trees         WTCCC  View
                               Random Forests  QC     View
                               T-Trees         QC     View
HT   Hypertension              Random Forests  WTCCC  View
                               T-Trees         WTCCC  View
                               Random Forests  QC     View
                               T-Trees         QC     View
RA   Rheumatoid arthritis      Random Forests  WTCCC  View
                               T-Trees         WTCCC  View
                               Random Forests  QC     View
                               T-Trees         QC     View
T1D  Type 1 diabetes           Random Forests  WTCCC  View
                               T-Trees         WTCCC  View
                               Random Forests  QC     View
                               T-Trees         QC     View
T2D  Type 2 diabetes           Random Forests  WTCCC  View
                               T-Trees         WTCCC  View
                               Random Forests  QC     View
                               T-Trees         QC     View

University of Liège

Vincent Botta

vincent.botta[at]ulg.ac.be