Combinatorial Approaches to Managing Ancestry Effects in Chronic Disease Genetics

by Jason Sardell, Ph.D. Statistical/Population Geneticist

One of the biggest challenges for precision medicine today is the poor replicability of results from genome-wide association studies (GWAS), especially between different populations. Associations between genotypes and polygenic diseases found in one study often fail to hold for individuals from populations that were not well represented in the original study cohort.

This has major implications for the reach of precision medicine and for health equity. Pharmacogenomic advances derived from current GWAS may provide little benefit to patients outside of the initial study’s main population group. Because GWAS rely primarily on datasets generated from white European and American patient populations, their relevance to other patient groups can be limited, exacerbating health disparities.

Results from GWAS can fail to apply across different populations for several reasons. Firstly, especially in chronic disease, a single diagnosis may cover more than one etiology, because such diseases are often diagnosed from an observed phenotype (e.g. breathlessness and wheezing in asthma) whose symptoms may have multiple causes. Relatedly, the genetic variants that drive disease in one population may be rare or simply absent in other populations.

Secondly, results may fail to replicate because they represent “false positive” associations resulting from unrecognized population structure in the original dataset. When disease prevalence is higher in one sub-population, genotypes that are more common in that sub-population may be indirectly associated with increased disease risk even if they have no biological effect.[1]
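The effect of unrecognized population structure can be illustrated with a small worked example. The sketch below uses entirely hypothetical numbers: two sub-populations of 1,000 people each, in which carrying a variant has no effect on disease risk, but both the variant frequency and the disease prevalence are higher in one group. The function names are illustrative and do not come from any particular GWAS toolkit.

```python
def expected_counts(n, carrier_pct, prevalence_pct):
    """Expected 2x2 counts when carrier status and disease are independent.

    Returns (carrier_cases, carrier_controls,
             noncarrier_cases, noncarrier_controls).
    Integer percentages keep the toy arithmetic exact.
    """
    carriers = n * carrier_pct // 100
    noncarriers = n - carriers
    carrier_cases = carriers * prevalence_pct // 100
    noncarrier_cases = noncarriers * prevalence_pct // 100
    return (carrier_cases, carriers - carrier_cases,
            noncarrier_cases, noncarriers - noncarrier_cases)


def odds_ratio(carrier_cases, carrier_controls,
               noncarrier_cases, noncarrier_controls):
    """Odds of disease in carriers divided by odds in non-carriers."""
    return (carrier_cases * noncarrier_controls) / (
        carrier_controls * noncarrier_cases)


# Hypothetical sub-populations: the variant is common where disease is common.
subpop_a = expected_counts(1000, carrier_pct=80, prevalence_pct=30)
subpop_b = expected_counts(1000, carrier_pct=20, prevalence_pct=10)

# Pooling the two groups, as an analysis unaware of the structure would do.
pooled = tuple(a + b for a, b in zip(subpop_a, subpop_b))

or_a = odds_ratio(*subpop_a)
or_b = odds_ratio(*subpop_b)
or_pooled = odds_ratio(*pooled)

print(f"odds ratio within A: {or_a:.2f}")
print(f"odds ratio within B: {or_b:.2f}")
print(f"odds ratio pooled:   {or_pooled:.2f}")
```

Within each sub-population the odds ratio is exactly 1, yet the pooled odds ratio is roughly 2.2, so the variant appears to double disease risk purely because of how the groups were mixed.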

Thirdly, the association between a genotype and disease can vary among populations if the phenotype is influenced by higher-order gene-gene interactions (i.e., epistasis). For example, variants of CC chemokine receptor 5 (CCR5) have different effects on HIV transmission and disease progression in African-Americans and white Caucasians, due to interactions between CCR5 haplotypes.[2] A variant may be beneficial in patients from populations where compatible haplotypes are more common, but maladaptive in patients from populations where incompatible haplotypes are at high frequency. Gene-gene interactions may be particularly important in highly polygenic chronic diseases.[3]

Finally, gene-environment interactions can cause a variant to have population-specific effects. An example comes from the fatty acid desaturase (FADS) gene family, where a variant associated with decreased heart disease in Greenlandic populations is associated with increased disease risk in European populations.[4] [5] Here the effect of the variant depends on an interaction with the prevailing diet. The FADS variant facilitates metabolism of the high amounts of polyunsaturated fatty acids in the largely seafood-based diet of traditional Greenlandic society. These compensatory effects result in health complications in individuals with agricultural-based diets.

While expanding the representation of different populations in study design is a very important priority, it will not solve all of these issues by itself. Even after the scope of GWAS is expanded to encompass populations with different heritages, it will remain challenging to use GWAS results to design precision medicine treatments for small sub-groups and individuals with complex, admixed ancestries. This will leave large pools of medical need unmet.

In contrast to GWAS, combinatorial analytics is uniquely suited for detecting interactions responsible for differences in disease risk across different populations. Traditional GWAS assume that a genotype’s effect on disease risk is the same in all individuals. The distribution of a given SNP genotype is measured in cases and in controls, and if the distributions are sufficiently skewed between those groups, the SNP is deemed to be disease-associated.
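As a rough illustration of the single-SNP comparison described above, the sketch below computes a Pearson chi-square statistic on a 2x3 table of genotype counts in cases and controls. The counts are hypothetical, and the test is one simple choice made here for illustration; production GWAS pipelines commonly use regression models with covariates instead. The p-value uses the closed form for two degrees of freedom, exp(-x/2).

```python
import math


def chi_square_2x3(cases, controls):
    """Pearson chi-square statistic for genotype counts (AA, Aa, aa)
    in cases versus controls."""
    rows = [cases, controls]
    row_totals = [sum(row) for row in rows]
    col_totals = [a + b for a, b in zip(cases, controls)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            # Expected count if genotype and disease status were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Hypothetical genotype counts for one SNP: (AA, Aa, aa).
case_counts = (300, 500, 200)
control_counts = (400, 450, 150)

stat = chi_square_2x3(case_counts, control_counts)
p_value = p_value_df2(stat)
print(f"chi-square = {stat:.2f}, p = {p_value:.2e}")
```

Note that this test treats every individual as drawn from one homogeneous population, which is exactly the assumption that breaks down when a genotype's effect differs between groups.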

Genetic variants may, however, be pleiotropic, and one trait can be a confounder for another. Some such traits are associated not with the cause of disease but with different genetic ancestries, and so are usually viewed as ‘confounders’ in GWAS that should be corrected for using covariate and conditional analysis. This means, however, that factors associated with a disease only in a minority population sub-group can be eliminated by covariate analysis, even if they may actually be causative in that cohort.

In contrast, combinatorial analytics allows the effect of a SNP on health to vary based on the genomic background of the patient by testing for associations between its presence and disease status in the context of a set of SNP genotypes.[3] This has two main advantages. Firstly, the non-linear effects of gene-gene interactions can be included. Secondly, because the case:control associations for the specific combinations are each evaluated separately, disease-associated combinations can be found that are relevant only to a subset of the population.
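The idea of evaluating combinations separately can be sketched with a toy example. The code below is not PrecisionLife's method; it is a naive exhaustive enumeration over a hypothetical dataset with two made-up SNPs (`rs_a`, `rs_b`), written only to show how a combination can be enriched in cases when neither SNP is enriched on its own.

```python
from collections import Counter
from itertools import combinations


def combination_counts(individuals, size):
    """Count how many cases and controls carry each combination of
    `size` (SNP, genotype) features."""
    case_counts, control_counts = Counter(), Counter()
    for genotypes, is_case in individuals:
        features = sorted(genotypes.items())
        for combo in combinations(features, size):
            if is_case:
                case_counts[combo] += 1
            else:
                control_counts[combo] += 1
    return case_counts, control_counts


def make_group(n, rs_a, rs_b, is_case):
    """Build n hypothetical individuals sharing the same two genotypes."""
    return [({"rs_a": rs_a, "rs_b": rs_b}, is_case) for _ in range(n)]


# Toy cohort: disease tracks the *pairing* of genotypes, not either SNP alone.
cohort = (
    make_group(40, 1, 1, True) + make_group(10, 1, 0, True)
    + make_group(10, 0, 1, True) + make_group(40, 0, 0, True)
    + make_group(10, 1, 1, False) + make_group(40, 1, 0, False)
    + make_group(40, 0, 1, False) + make_group(10, 0, 0, False)
)

single_cases, single_controls = combination_counts(cohort, size=1)
pair_cases, pair_controls = combination_counts(cohort, size=2)

one_snp = (("rs_a", 1),)
both_snps = (("rs_a", 1), ("rs_b", 1))
print("rs_a=1 alone:   ", single_cases[one_snp], "cases,",
      single_controls[one_snp], "controls")
print("rs_a=1 & rs_b=1:", pair_cases[both_snps], "cases,",
      pair_controls[both_snps], "controls")
```

In this toy cohort `rs_a=1` appears in 50 cases and 50 controls, so a single-SNP test sees nothing, while the pair `rs_a=1` with `rs_b=1` appears in 40 cases and only 10 controls. Exhaustive enumeration like this grows combinatorially with the number of SNPs and the combination size, so it is only practical for illustration.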

Combinatorial analytics can identify combinations of SNP genotypes that evolved in different sub-populations but have adverse health impacts when they co-occur in individuals with admixed ancestry. Importantly, combinatorial approaches are not restricted to genetic features[6], and can be used to identify interactions between genetic risk variants and clinical, epidemiological and environmental variables, such as diet, that also often vary between populations.

Using PrecisionLife’s combinatorial approach, we identified differences in the underlying networks associated with type-II diabetes and its complications.[7] Notably, we showed that disease risk in white British populations is influenced primarily by combinations of phenotypic and social features, while disease risk in patients of South Asian descent is more strongly correlated with combinations of genetic factors.

We were similarly able to identify the factors associated with specific types of complications in type-II diabetes, building a model that can predict the disease trajectory for an individual patient at the point of diagnosis. Similar combinatorial analysis studies offer great promise for untangling the genetic causes of disease in greater detail across different populations.

[1] Hellwege J. et al. (2017) “Population stratification in genetic association studies.” Curr Protoc Hum Genet. 95:1.22.1-1.22.23

[2] Gonzalez E. et al. (1999) “Race-specific HIV-1 disease-modifying effects associated with CCR5 haplotypes.” Proc Natl Acad Sci U S A., 96(21): 12004–12009.

[3] Gardner S. (2021) “Combinatorial analytics: An essential tool for the delivery of precision medicine and precision agriculture.” Artificial Intelligence in the Life Sciences, 1: https://doi.org/10.1016/j.ailsci.2021.100003.

[4] Fumagalli M. et al. (2015) “Greenlandic Inuit show genetic signatures of diet and climate adaptation.” Science 349:1343-1347.

[5] Ye K. et al. (2017) “Dietary adaptation of FADS genes in Europe varied across time and geography.” Nature Ecology & Evolution 1:0167.

[6] Das S. et al. (2021) “Combinatorial analysis of phenotypic and clinical risk factors associated with hospitalized COVID-19 patients.” medRxiv, doi: https://doi.org/10.1101/2021.02.08.21250899.

[7] PrecisionLife (2021) “Identification of genetic drivers of complications in type 2 diabetes.” https://demo.dpc.uk.com/wp-content/uploads/2021/01/Type-2-diabetes-Disease-Study-290121.pdf
