Using combinatorial analysis, we can find additional signal in patient datasets that is invisible to GWAS and other existing genetic analysis methods[1][2][3]. This additional power is illustrated by a meta-analysis of the large-scale studies into the genetic factors underpinning COVID-19 host response, as it relates to disease susceptibility and severity.
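Before turning to those studies, a toy illustration may help show how a signal can be invisible to a test that looks at one variant at a time. The sketch below is not the method used in any of the cited studies. The cohort, its counts, and the names `cohort` and `case_rate` are hypothetical, and the XOR-like interaction is chosen purely because it makes the effect easy to see.

```python
# Hypothetical toy cohort: (variant_a, variant_b, is_case) -> number of people.
# Disease status follows an XOR-like interaction (an assumption for this
# illustration), so neither variant has any marginal effect on its own.
cohort = {
    (0, 0, False): 100,
    (0, 1, True): 100,
    (1, 0, True): 100,
    (1, 1, False): 100,
}


def case_rate(predicate):
    """Fraction of cases among people whose genotype satisfies predicate."""
    cases = sum(n for (a, b, is_case), n in cohort.items()
                if predicate(a, b) and is_case)
    total = sum(n for (a, b, _), n in cohort.items() if predicate(a, b))
    return cases / total


# Single-variant view: case rate in carriers vs non-carriers of each variant.
single_a = (case_rate(lambda a, b: a == 1), case_rate(lambda a, b: a == 0))
single_b = (case_rate(lambda a, b: b == 1), case_rate(lambda a, b: b == 0))

# Combination view: case rate when exactly one variant is present vs otherwise.
combo = (case_rate(lambda a, b: a != b), case_rate(lambda a, b: a == b))

print("variant A alone:", single_a)
print("variant B alone:", single_b)
print("A/B combination:", combo)
```

In this constructed cohort, carriers and non-carriers of each individual variant have the same case rate, so a per-variant test sees nothing, while the combined genotype separates cases from controls completely. Real data are far noisier, and real combinatorial methods must also handle the very large number of possible combinations and the resulting multiple-testing burden.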
COVID-19 surprised the medical community with the range of its symptoms. Rather than causing a pure viral infection with an inflammasome/cytokine response, the disease has had widespread effects across a range of tissues[4][5]. As data became available at the beginning of the pandemic, various groups ran GWAS analyses of very large patient populations.
A GWAS involving 1,131 severe patients and 15,434 mild controls identified two loci associated with a high risk of developing severe COVID-19: one around the ABO blood group gene and another on chromosome 3[6]. This study was then extended in a global effort that ran GWAS on genomic data from 13,641 severe disease patients and over 2 million controls, identifying four genome-wide significant loci associated with SARS-CoV-2 infection and 11 associated with severe manifestations of COVID-19[7][8].
Most of these loci correspond to previously documented genes associated with lung, autoimmune or inflammatory diseases. The symptomatology of COVID-19, however, extends much further than these findings can explain, into neurological, coagulation, cardiovascular, renal and other consequences beyond inflammation-driven disease.
In contrast, a PrecisionLife study using a combinatorial analytics approach was run several months earlier on the first COVID-19 datasets available from UK Biobank[9][10], with just a few hundred severe patients, while controlling for all the known predisposing comorbidities[11]. Working from much smaller datasets than the GWAS studies, it identified 156 loci associated with severe disease, which mapped to 68 protein-coding genes spread across a range of mechanisms.
Many novel targets were identified that are involved in key severe COVID-19 pathology and mechanisms, including production of pro-inflammatory cytokines, endothelial cell dysfunction, lipid droplets, neurodegeneration and viral susceptibility factors. The novel disease-associated mechanisms identified in this genetics-based study were replicated and validated by a subsequent combinatorial analysis of de-identified patient health records (non-genetic data comprising longitudinal diagnosis, claims and lab data) in the UnitedHealth Group COVID-19 Data Suite[12]. Several of the novel targets have since been validated in collaborative drug repurposing studies using viral plaque assays and other disease models[13][14].
[1] Mellerup E, Andreassen O, Bennike B, et al. Connection between genetic and clinical data in bipolar disorder. PLoS One. 2012;7(9):e44623. doi:10.1371/journal.pone.0044623
[2] Mellerup E, Møller GL. Combinations of Genetic Variants Occurring Exclusively in Patients. Comput Struct Biotechnol J. 2017 Mar 10;15:286-289. doi: 10.1016/j.csbj.2017.03.001.
[3] Tam V, et al. Benefits and limitations of genome-wide association studies. Nat Rev Genet. 2019.
[4] Jain U. Effect of COVID-19 on the Organs. Cureus. 2020;12(8):e9540. Published 2020 Aug 3. doi:10.7759/cureus.9540
[5] Rando HM, Bennett TD, Byrd JB, et al. Challenges in defining Long COVID: Striking differences across literature, Electronic Health Records, and patient-reported information. Preprint. medRxiv. 2021;2021.03.20.21253896. Published 2021 Mar 26. doi:10.1101/2021.03.20.21253896
[6] Shelton JF, Shastri AJ, Ye C, et al. Trans-ethnic analysis reveals genetic and non-genetic associations with COVID-19 susceptibility and severity. medRxiv 2020.09.04.20188318; doi: https://doi.org/10.1101/2020.09.04.20188318
[7] Pairo-Castineira, E., Clohisey, S., Klaric, L. et al. Genetic mechanisms of critical illness in COVID-19. Nature 591, 92–98 (2021). https://doi.org/10.1038/s41586-020-03065-y
[8] Severe Covid-19 GWAS Group, Ellinghaus D, Degenhardt F, et al. Genomewide Association Study of Severe Covid-19 with Respiratory Failure. N Engl J Med. 2020;383(16):1522-1534. doi:10.1056/NEJMoa2020283
[9] Sudlow C, Gallacher J, Allen N, Beral V, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015 Mar 31;12(3):e1001779. doi: 10.1371/journal.pmed.1001779.
[10] Armstrong J, Rudkin JK, Allen N, Crook DW, Wilson DJ, Wyllie DH, O’Connell AM. Dynamic linkage of COVID-19 test results between Public Health England’s Second Generation Surveillance System and UK Biobank. Microbial Genomics. 2020. doi:10.1099/mgen.0.000397
[11] Taylor K, Das S, Pearson M, Kozubek J, Pawlowski M, Jensen CE, Skowron Z, Møller GL, Strivens MA, Gardner SP. Analysis of Genetic Host Response Risk Factors in Severe COVID-19 Patients. medRxiv 2020.06.17.20134015; doi: https://doi.org/10.1101/2020.06.17.20134015
[12] Das S, Pearson M, Taylor K, Bouchet VA, Møller GL, Hall TO, Strivens MA, Tzeng KTH, Gardner SP. Combinatorial analysis of phenotypic and clinical risk factors associated with hospitalized COVID-19 patients (in press). medRxiv 2021.02.08.21250899; doi: https://doi.org/10.1101/2021.02.08.21250899
[13] Hofmann-Apitius M et al. The COVID-19 PHARMACOME: A method for the rational selection of drug repurposing candidates from multimodal knowledge harmonization (in press) Fraunhofer Institute for Algorithms and Scientific Computing DE
[14] Sugiyama MG, Cui H, Redka DS, Karimzadeh M, et al. Multiscale interactome analysis coupled with off-target drug predictions reveals drug repurposing candidates for human coronavirus disease. bioRxiv 2021.04.13.439274; doi: https://doi.org/10.1101/2021.04.13.439274