Elsevier

The Lancet

Volume 366, Issue 9491, 24–30 September 2005, Pages 1121-1131
Series
Genetic association studies

https://doi.org/10.1016/S0140-6736(05)67424-7

Summary

We review the rationale behind and discuss methods of design and analysis of genetic association studies. There are similarities between genetic association studies and classic epidemiological studies of environmental risk factors but there are also issues that are specific to studies of genetic risk factors such as the use of particular family-based designs, the need to account for different underlying genetic mechanisms, and the effect of population history. Association differs from linkage (covered elsewhere in this series) in that the alleles of interest will be the same across the whole population. As with other types of genetic epidemiological study, issues of design, statistical analysis, and interpretation are very important.

Section snippets

Direct association

The first of these forms of association is termed direct association, and studies of direct association target polymorphisms which are themselves putative causal variants. This type of study is the easiest to analyse and the most powerful, but the difficulty is the identification of candidate polymorphisms. A mutation in a codon which leads to an aminoacid change is a candidate causal variant. However, it is likely that many causal variants responsible for heritability of common complex

Indirect association

In the second type of association, the polymorphism is a surrogate for the causal locus and this type of association allows us to search for causal genes in indirect association studies. However, indirect associations are even weaker than the direct associations they reflect, and it will usually be necessary to type several surrounding markers to have a high chance of picking up the indirect association. Indirect association studies are more difficult to analyse, and there is still debate as to

Confounded association

The final type of association is that due to confounding by stratification and admixture (substructure) within the population. Confounding, as in the rest of epidemiology, raises the possibility both of generating false findings (positive confounding) and of obscuring true causal associations (negative confounding). However, although the problem of unobserved confounding is intractable in classic epidemiology, dictating limits on the size of causal effect that can be safely inferred from
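
As a hedged illustration of positive confounding by stratification, the sketch below pools two hypothetical subpopulations. All counts, carrier frequencies, and disease risks are invented for the example and do not come from the article. Within each subpopulation the allele has no effect on disease risk, yet the pooled odds ratio departs from 1 because carrier frequency and disease risk both differ between the subpopulations.

```python
def odds_ratio(carrier_cases, noncarrier_cases, carrier_controls, noncarrier_controls):
    """Odds ratio for carrying the allele, comparing cases with controls."""
    return (carrier_cases * noncarrier_controls) / (noncarrier_cases * carrier_controls)


def expected_counts(carriers, noncarriers, risk):
    """Expected 2x2 counts when carriers and non-carriers share the same disease risk.

    Returns (carrier cases, non-carrier cases, carrier controls, non-carrier controls).
    """
    return (
        carriers * risk,
        noncarriers * risk,
        carriers * (1 - risk),
        noncarriers * (1 - risk),
    )


# Hypothetical subpopulations (assumed values): carriers, non-carriers, disease risk.
table_a = expected_counts(1000, 9000, 0.02)  # allele rare, disease rare
table_b = expected_counts(5000, 5000, 0.10)  # allele common, disease common

# Ignoring substructure amounts to adding the two tables cell by cell.
pooled = tuple(a + b for a, b in zip(table_a, table_b))

or_a = odds_ratio(*table_a)
or_b = odds_ratio(*table_b)
or_pooled = odds_ratio(*pooled)

print(f"OR in subpopulation A: {or_a:.2f}")
print(f"OR in subpopulation B: {or_b:.2f}")
print(f"OR after pooling:      {or_pooled:.2f}")
```

The stratum-specific odds ratios are 1 by construction, so any pooled departure from 1 in this sketch is due entirely to the substructure.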

Direct association: patterns of genotype–phenotype relationship

We shall consider a diallelic locus, directly related to either a quantitative trait or to a discrete trait such as presence (prevalence), or occurrence (incidence), of a disease. Multiallelic loci lead to more complicated scenarios and generate tests with many degrees of freedom. Even in the simplest diallelic case, different patterns for the genotype–phenotype relationship must be considered. Since there are three possible genotypes, which have a natural order (1/1, 1/2, and 2/2), the

Linear dose-response modelling

In classic mendelian genetics of fully penetrant discrete traits, the description of an allele as dominant implies that the corresponding phenotype will occur irrespective of the number of copies of the allele carried. A recessive allele requires both copies to be present for the phenotype to be evident. In a diallelic system, if neither allele is dominant, 1/2 heterozygotes will display an intermediate phenotype. Fisher [16] used the term dominance in a different way to describe the related
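
One way to make these patterns concrete is to code each genotype as a number before fitting a regression. The sketch below is an illustration only: the model names and the choice of allele 2 as the allele being counted are assumptions made for the example, not definitions taken from the article.

```python
# Numeric codes for the number of copies of allele 2 (0, 1, or 2).
# "linear" counts copies of allele 2 (a per-allele dose-response);
# "dominant" treats one copy as equivalent to two;
# "recessive" requires both copies for any effect.
CODINGS = {
    "linear": {0: 0, 1: 1, 2: 2},
    "dominant": {0: 0, 1: 1, 2: 1},
    "recessive": {0: 0, 1: 0, 2: 1},
}


def copies_of_allele_2(genotype):
    """Count copies of allele 2 in a genotype written as '1/1', '1/2', or '2/2'."""
    return genotype.split("/").count("2")


def encode(genotypes, model):
    """Translate genotype labels into the numeric covariate for the chosen model."""
    coding = CODINGS[model]
    return [coding[copies_of_allele_2(g)] for g in genotypes]


genotypes = ["1/1", "1/2", "2/2"]
for model in CODINGS:
    print(model, encode(genotypes, model))
```

Under the linear coding the heterozygote sits midway between the two homozygotes, which corresponds to the intermediate phenotype described above.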

Epistasis

The general issue of dominance relates to the extent to which the joint effect of two alleles at a single autosomal locus might be different from the sum (or product in a multiplicative model) of the effects anticipated for each allele independently. A related issue is the degree to which the combined effect of alleles at two or more loci can reasonably be modelled by the individual locus contributions. The fact that inheritance of some traits could only be explained by joint action of two
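
The contrast between the sum and the product can be sketched numerically. The example below assumes effects are expressed as relative risks against a common baseline of 1; the function names and the relative risks of 2 and 3 are hypothetical. A joint effect that differs from the value expected under the chosen scale would be read as interaction on that scale.

```python
def expected_joint_additive(rr_first, rr_second):
    """Expected joint relative risk if the excess risks (rr - 1) simply add."""
    return 1 + (rr_first - 1) + (rr_second - 1)


def expected_joint_multiplicative(rr_first, rr_second):
    """Expected joint relative risk if the relative risks multiply."""
    return rr_first * rr_second


# Hypothetical relative risks for the two factors (alleles or loci).
rr_first, rr_second = 2.0, 3.0
additive = expected_joint_additive(rr_first, rr_second)
multiplicative = expected_joint_multiplicative(rr_first, rr_second)

print(f"Expected joint relative risk, additive scale:       {additive}")
print(f"Expected joint relative risk, multiplicative scale: {multiplicative}")
```

Because the two scales give different expectations, the same data can show interaction on one scale and none on the other.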

Indirect association: patterns of linkage disequilibrium

The mapping of susceptibility genes for common complex disorders and genes for other common traits by the indirect method depends on the existence of association, at the population level, between the causal variants and nearby markers. Such association, because of the proximity of loci on the genome, is termed linkage disequilibrium. (Some use this term to describe any population-wide association between loci, whether due to proximity or to another reason such as population stratification and
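
Association between two diallelic loci is commonly summarised by the coefficient D and its rescalings D′ and r². The sketch below computes these standard measures from four haplotype frequencies. The allele labels (A/a and B/b) and the frequencies are hypothetical, and the choice of these three measures is an assumption about what is useful to show rather than a statement of what the article covers.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r^2) for two diallelic loci from haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    p_A = p_AB + p_Ab  # frequency of allele A at the first locus
    p_B = p_AB + p_aB  # frequency of allele B at the second locus
    if not (0 < p_A < 1 and 0 < p_B < 1):
        raise ValueError("both loci must be polymorphic")

    # D is the excess of the AB haplotype over its expectation under independence.
    d = p_AB - p_A * p_B

    # D' rescales D by the largest magnitude possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max

    # r^2 is the squared correlation between the alleles at the two loci.
    r_squared = d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r_squared


# Hypothetical haplotype frequencies, for illustration only.
d, d_prime, r_squared = ld_measures(0.4, 0.1, 0.1, 0.4)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r_squared:.3f}")
```

When the loci are independent at the population level, D is zero and so are both rescaled measures.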

Study designs

Familiar epidemiological designs such as population-based case-control or cohort designs [19, 52] are often used for genetic association studies, and the data are analysed in much the same way, with risk factors such as smoking and obesity replaced by the presence or absence of a particular genetic polymorphism. Risk can be considered in terms of either a predisposing allele or genotype, or in terms of multiple categories of disease risk such as the risks associated with different alleles at

Statistical analysis

The analysis of data depends crucially on the study design. In the simplest case, familiar methods such as logistic regression, χ² tests of association, and odds ratios may be suitable. At a single marker, the issue arises as to whether to analyse on the basis of allele counts or genotype counts. Suppose we have case and control data for a single diallelic genetic locus (table 4). A simple χ² test for independence has 2 degrees of freedom. Two odds ratios can be calculated: af/be (for genotype
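
The genotype-based and allele-based analyses can be compared on a small worked example. The counts below are invented and are not those of table 4. The baseline genotype for the odds ratios (1/1) and all function names are assumptions made for the illustration. The sketch uses only the standard library, relying on the closed-form tail probabilities of the χ² distribution with 1 and 2 degrees of freedom.

```python
import math


def pearson_chi2(table):
    """Pearson chi-squared statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def chi2_p_value(statistic, df):
    """Upper-tail probability; closed forms exist for 1 and 2 degrees of freedom."""
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        return math.exp(-statistic / 2)
    raise ValueError("only 1 or 2 degrees of freedom are supported in this sketch")


# Hypothetical genotype counts, ordered 1/1, 1/2, 2/2.
cases = [30, 50, 20]
controls = [50, 40, 10]

# Genotype-based test: 2 x 3 table, 2 degrees of freedom.
genotype_chi2 = pearson_chi2([cases, controls])
genotype_p = chi2_p_value(genotype_chi2, df=2)

# Odds ratios for each genotype relative to the 1/1 baseline.
or_heterozygote = (cases[1] * controls[0]) / (cases[0] * controls[1])
or_homozygote = (cases[2] * controls[0]) / (cases[0] * controls[2])


def allele_counts(genotype_counts):
    """Convert genotype counts into [allele 1, allele 2] counts."""
    n11, n12, n22 = genotype_counts
    return [2 * n11 + n12, n12 + 2 * n22]


# Allele-based test: 2 x 2 table, 1 degree of freedom.
allele_chi2 = pearson_chi2([allele_counts(cases), allele_counts(controls)])
allele_p = chi2_p_value(allele_chi2, df=1)

print(f"Genotype test: chi2 = {genotype_chi2:.3f}, p = {genotype_p:.4f}")
print(f"OR 1/2 vs 1/1 = {or_heterozygote:.2f}, OR 2/2 vs 1/1 = {or_homozygote:.2f}")
print(f"Allele test:   chi2 = {allele_chi2:.3f}, p = {allele_p:.4f}")
```

The allele-based test doubles the apparent sample size by treating each chromosome as an independent observation, which is justified only when genotypes are in Hardy-Weinberg equilibrium.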

Significance and importance

The standards of statistical proof that have become acceptable in the general biomedical literature are not appropriate for genetic association studies. Something akin to a multiple testing problem pervades the discipline, although there has been no clear consensus about how it should be dealt with. Approaches such as the Bonferroni correction are not appropriate because it is not the number of tests in any one investigation that is important. Rather it is that the vast majority of loci tested
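
The role of the prior probability can be illustrated with a direct application of Bayes' theorem. The sketch below is not the authors' proposal. The prior of 1 in 1000, the power of 0.8, and the two significance thresholds are hypothetical values chosen only to show how the chance that a "significant" result reflects a true association depends on how few tested loci are truly associated.

```python
def posterior_true_association(prior, power, alpha):
    """Probability that a significant result reflects a true association.

    prior: proportion of tested loci that are truly associated (assumed).
    power: probability of a significant result at a truly associated locus.
    alpha: significance threshold, i.e. the false-positive rate at a null locus.
    """
    for name, value in (("prior", prior), ("power", power), ("alpha", alpha)):
        if not 0 < value <= 1:
            raise ValueError(f"{name} must lie in (0, 1]")
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs: 1 in 1000 tested loci truly associated, 80% power.
conventional = posterior_true_association(prior=0.001, power=0.8, alpha=0.05)
stringent = posterior_true_association(prior=0.001, power=0.8, alpha=0.00005)

print(f"alpha = 0.05:    posterior probability = {conventional:.3f}")
print(f"alpha = 0.00005: posterior probability = {stringent:.3f}")
```

Under these assumed inputs, most results significant at the conventional 5% level would be false positives, whereas a much stricter threshold reverses the balance.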

References (110)

  • C Weinberg

    Studying parents and grandparents to assess genetic contributions to early-onset disease

    Am J Hum Genet

    (2003)
  • LR Cardon et al.

    Population stratification and spurious allelic association

    Lancet

    (2003)
  • C Weinberg et al.

    A log-linear approach to case-parent-triad data: assessing effects of disease genes that act either directly or through maternal effects and that may be subject to parental imprinting

    Am J Hum Genet

    (1998)
  • C Weinberg

    Methods for detection of parent-of-origin effects in genetic studies of case-parents triads

    Am J Hum Genet

    (1999)
  • NJ Camp

    Genomewide transmission/disequilibrium testing: consideration of the genotypic relative risks at disease loci

    Am J Hum Genet

    (1997)
  • M Knapp

    A note on power approximations for the transmission/disequilibrium test

    Am J Hum Genet

    (1999)
  • R McGinnis

    General equations for Pt, Ps, and the power of the TDT and the affected-sib-pair test

    Am J Hum Genet

    (2000)
  • D Clayton

    A generalization of the transmission/disequilibrium test for uncertain-haplotype transmission

    Am J Hum Genet

    (1999)
  • F Dudbridge et al.

    Unbiased application of the transmission/disequilibrium test to multilocus haplotypes

    Am J Hum Genet

    (2000)
  • H Cordell et al.

    A unified stepwise regression procedure for evaluating the relative effects of polymorphisms within a gene using case/control or family data: application to HLA in type 1 diabetes

    Am J Hum Genet

    (2002)
  • E Martin et al.

    A test for linkage and association in general pedigrees: the pedigree disequilibrium test

    Am J Hum Genet

    (2000)
  • S Lake et al.

    Family-based tests of association in the presence of linkage

    Am J Hum Genet

    (2000)
  • K Lunetta et al.

    Family-based tests of association and linkage that use unaffected sibs, covariates, and interactions

    Am J Hum Genet

    (2000)
  • G Abecasis et al.

    A general test of association for quantitative traits in nuclear families

    Am J Hum Genet

    (2000)
  • L Barcellos et al.

    Association mapping of disease loci, by use of a pooled DNA genomic screen

    Am J Hum Genet

    (1997)
  • H Seltman et al.

    Transmission/disequilibrium test meets measured haplotype analysis: family-based association analysis guided by evolution of haplotypes

    Am J Hum Genet

    (2001)
  • R Fan et al.

    Genome association studies of complex diseases by case-control designs

    Am J Hum Genet

    (2003)
  • M Stephens et al.

    A new statistical method for haplotype reconstruction from population data

    Am J Hum Genet

    (2001)
  • D Fallin et al.

    Accuracy of haplotype frequency estimation for biallelic loci, via the expectation-maximization algorithm for unphased diploid genotype data

    Am J Hum Genet

    (2000)
  • M Epstein et al.

    Inference on haplotype effects in case-control studies using unphased genotype data

    Am J Hum Genet

    (2003)
  • K Livak et al.

    Towards fully automated genome-wide polymorphism screening

    Nat Genet

    (1995)
  • N Risch et al.

    The future of genetic studies of complex human diseases

    Science

    (1996)
  • The International HapMap project

    Nature

    (2003)
  • Hattersley AT, McCarthy MI. What makes a good genetic association study? Lancet (in...
  • G Taubes

    Epidemiology faces its limits

    Science

    (1996)
  • S Wacholder et al.

    Counterpoint: bias from population stratification is not a major threat to the validity of conclusions from epidemiological studies of common polymorphisms and cancer

    Cancer Epidemiol Biomarkers Prev

    (2002)
  • I Dahlman et al.

    Parameters for reliable results in genetic association studies in common disease

    Nat Genet

    (2002)
  • B Devlin et al.

    Genomic control for association studies

    Biometrics

    (1999)
  • SA Bacanu et al.

    Association studies for quantitative traits in structured populations

    Genet Epidemiol

    (2002)
  • R Fisher

    The correlation between relatives on the supposition of Mendelian inheritance

    Trans R Soc Edin

    (1918)
  • M Chiano et al.

    Genotype relative risks under ordered restriction

    Genet Epidemiol

    (1998)
  • S Wright

    The relative importance of heredity and environment in determining the piebald pattern of guinea-pigs

    Proc Natl Acad Sci USA

    (1920)
  • N Breslow et al.

    Statistical Methods in Cancer Research. Volume I—The Analysis of Case-Control Studies. IARC Scientific Publications

    (1980)
  • D Clayton et al.

    Statistical Models in Epidemiology

    (1993)
  • P Sasieni

    From genotypes to genes: doubling the sample size

    Biometrics

    (1997)
  • W Bateson

    Mendel's principles of heredity

    (1909)
  • P Phillips

    The language of gene interaction

    Genetics

    (1998)
  • H Cordell et al.

    Statistical modeling of interlocus interactions in a complex disease: Rejection of the multiplicative model of epistasis in type 1 diabetes

    Genetics

    (2001)
  • H Cordell

    Epistasis: what it means, what it doesn't mean, and statistical methods to detect it in humans

    Hum Mol Genet

    (2002)
  • J Siemiatycki et al.

    Biological models and statistical interactions: an example from multistage carcinogenesis

    Int J Epidemiol

    (1981)