Call For Papers: 2nd International Symposium on Animal Functional Genomics

Detection of SNP epistasis effects of quantitative traits using an extended Kempthorne model

Abstract

Epistasis effects (gene interactions) have been increasingly recognized as important genetic factors underlying complex traits. The existence of a large number of single nucleotide polymorphisms (SNPs) provides opportunities and challenges to screen DNA variations affecting complex traits using a candidate gene analysis. In this article, four types of epistasis effects of two candidate gene SNPs with Hardy-Weinberg disequilibrium (HWD) and linkage disequilibrium (LD) are considered: additive × additive, additive × dominance, dominance × additive, and dominance × dominance. The Kempthorne genetic model was chosen for its appealing genetic interpretations of the epistasis effects. The method in this study consists of extension of Kempthorne's definitions of 35 individual genetic effects to allow HWD and LD, genetic contrasts of the 35 extended individual genetic effects to define the 4 epistasis effects, and a linear model method for testing epistasis effects. Formulas to predict statistical power (as a function of contrast heritability, sample size, and type I error) and sample size (as a function of contrast heritability, type I error, and type II error) for detecting each epistasis effect were derived, and the theoretical predictions agreed well with simulation studies. The accuracy in estimating each epistasis effect and rates of false positives in the absence of all or three epistasis effects were evaluated using simulations. The method for epistasis testing can be a useful tool to understand the exact mode of epistasis, to assemble genome-wide SNPs into an epistasis network, and to assemble all SNP effects affecting a phenotype using pairwise epistasis tests.

the significance of epistasis in complex traits has been well recognized (3, 12, 16, 17). It has been hypothesized that epistasis is ubiquitous in determining susceptibility to common human diseases. Observations supporting this hypothesis include deviations from Mendelian ratios, molecular interactions in gene regulation and biochemical and metabolic systems, nonrepeatable positive effects of single polymorphisms, and gene interaction effects commonly found when properly investigated (16). Large epistasis networks found in yeast (19, 21) also point to the ubiquitous nature of epistasis. The presence of a large number of single nucleotide polymorphisms (SNPs) provides opportunities and challenges to identify DNA variations associated with complex traits using SNPs as candidate genes. A number of studies on SNP-disease association have been reported (2, 7, 8, 20, 24), but most current candidate gene studies focus on single gene effects. Ongoing studies of geneotyping up to 500,000 SNPs on 2,000 individuals have been reported (1). Such large-scale SNP studies would be capable of detecting certain epistasis effects.1

1The 2nd International Symposium on Animal Functional Genomics was held May 16–19, 2006, at Michigan State University in East Lansing, MI, and was organized by Jeanne Burton of Michigan State University and Guilherme J. M. Rosa of University of Wisconsin, Madison, WI (see meeting report by Drs. Burton and Rosa, Physiol Genomics 28: 1-4, 2006).

For two bi-allelic genes unaffected by the environment, a total of 512 disease models are possible, but the number of nonredundant models is in the range of 50–102 (12). For two bi-allelic genes such as two SNP loci with quantitative measures, epistasis effects are typically modeled by a linear partition of the nine possible genotypic values in quantitative genetics. Fisher (6) partitioned the nine genotypic values into single gene effects (additive and dominance effects) and epistasis effects assuming Hardy-Weinberg equilibrium (HWE) and linkage equilibrium (LE). Also assuming HWE and LE, Cockerham (5) and Kempthorne (10) partitioned Fisher's epistasis effect into four components, additive × additive, additive × dominance, dominance × additive, and dominance × dominance epistasis effect with the genetic interpretation of allele × allele, allele × genotype, genotype × allele, and genotype × genotype interactions, respectively. This partitioning can be used as a tool for identifying the exact mode of a gene interaction effect. Although these four types of epistasis effects could be further partitioned, such a more detailed partitioning may not have practical value, because nine genotypic values allow only nine independent parameters (common mean and 8 genetic effects) to be estimated and eight independent genetic effects to be tested using linear model methods. A statistical model that includes the main effect of each locus and the interaction between the two loci can be used to detect the gene interaction between the two loci, but this type of epistasis detection does not offer a mechanism to test each of the four epistasis effects under the Cockerham and Kempthorne models. Cockerham's model uses orthogonal contrasts of genotypic values, whereas Kempthorne's model uses the deviation of “genetic combination” from the sum of individual genetic factors. For example, an additive × additive effect (combination effect of 2 alleles, each from a different locus) is a deviation of the gametic mean from the two single allelic means. This is clearly a measure for the gene combination effect and hence has a more straightforward genetic interpretation of the epistasis effect. A genetic model (4) without the derivations of Cockerham and Kempthorne could also be used for testing SNP epistasis effects, but this model yields the same epistasis results as the Cockerham and Kempthorne models under the assumptions of HWE, LE, and equal allele frequencies (see Supplemental Materials; online version of article contains Supplemental Materials). With LD, an epistasis model defining each epistasis effect as a weighted average of genotypic values has been described (9). This model could easily be extended to account for HWD. Among the above epistasis models, the Kempthorne model has more straightforward genetic interpretations of epistasis effects and can be easily extended to allow HWD and LD.

In this article, we extend the Kempthorne model for genetic effects of quantitative traits to allow HWD and LD, define contrasts of the extended Kempthorne genetic effects for testing the four epistasis effects of two SNP loci, develop a linear model procedure for testing each epistasis effect, and present analytical and simulation results on statistical power and sample size required for detecting each type of epistasis effect. The accuracy in estimating each epistasis effect, the rate of false-positive results in the absence of the true epistasis effects, and the rate of potential confounding between the four types of epistasis effects are evaluated using simulations. The methods in this article should provide a useful tool for genome-wide pairwise testing of SNP epistasis where HWD and LD may exist, and for the assembly of total SNP effects or an epistasis network of a phenotype.

MATERIALS AND METHODS

The development of SNP epistasis testing methods in this article consists of three steps: extend the Kempthorne model for genetic effects to allow HWD and LD, construct genetic and statistical contrast for testing each epistasis effect, and evaluate the performance of the resulting method for testing epistasis with HWD and LD. Two bi-allelic linked loci, locus 1 with A and a alleles and locus 2 with B and b, are assumed to affect a complex trait of quantitative nature. Each locus is allowed to have HWD. For these two bi-allelic loci, nine genotypes are possible. Let gijkl and pijkl denote the genotypic value and genotypic probability of individuals possessing genotype ij at locus 1 and kl at locus 2, where i and j denote alleles A and a of locus 1, and k and l denote alleles B and b of locus 2 (Table 1). Under the Kempthorne model, each genetic effect is defined as a deviation of genetic combination effect from the sum of individual genetic factors in the genetic combination. Such definitions of gene effects require the calculation of 27 means of genotypic values: 1 population mean and 26 marginal means, which include 4 allelic means, 6 single locus genotypic means, 4 gametic means, 6 allele-genotype means, and 6 genotype-allele means.

Table 1. Genotypic values (gijkl) and frequencies (pijkl) for 2 bi-allelic loci

BBBbbb
AAgAABB, pAABBgAABb, pAABbgAAbb, pAAbb
AagAaBB, pAaBBgAaBb, pAaBbgAabb, pAabb
aagaaBB, paaBBgaaBb, paaBbgaabb, paabb

To extend the Kempthorne definitions of genetic effects to allow HWD and LD, the calculation of the population mean requires the use of the observed genotypic frequencies (pijkl), and the calculations of the 26 marginal means require the use of a series of conditional probabilities based on the observed genotypic frequencies rather than using allele frequencies under HWE and LE, as shown in the Supplemental Material. Let μ = the mean genotypic value in the population, μi = the marginal mean of genotypic values for individuals with allele i (i = A, a), μk = the marginal mean of genotypic values for individuals with allele k (k = B, b), μij = the marginal mean of genotypic values for individuals with genotype ij at the first locus (ij = AA, Aa, aa), μik = the marginal mean of genotypic values for individuals with allele i at locus 1 and allele k at locus 2 (i = A, a; k = B, b), μikl = the marginal mean of genotypic values for individuals with allele i at locus 1 and genotype kl at locus 2 (i = A, a; kk = BB, Bb, bb), and μijk = the marginal mean of genotypic values for individuals with genotype ij at locus 1 and allele k at locus 2 (ij = AA, Aa, aa; k = B, b). With the use of 27 means, typical expressions of 35 possible individual genetic effects of two bi-allelic loci using Kempthorne's definitions can be expressed as follows

(1.1)
(1.2)
(1.3)
(1.4)
(1.5)
(1.6)
(1.7)
(1.8)
where ai = additive effect of allele i at locus 1 (i = A, a), ak = additive effect of allele k at locus 2 (k = B, b), dij = dominance effect of genotype ij at locus 1, dkl = dominance effect of genotype kl at locus 2, (aa)ik = additive × additive epistasis effect of genotypes with alleles ik, (ad)ikl = additive × dominance epistasis effect of genotypes with alleles ikl, (da)ijk = dominance × additive epistasis effect of genotypes with alleles ijk, and (dd)ijkl = dominance × dominance epistasis effect of ijkl genotype. The mathematical expressions of genetic effects of Eqs. 1.11.8 do not change with or without HWD and LD. The only change required to allow HWD and LD is in the calculations of the 27 means. With the definitions of Eqs. 1.11.8, the Kempthorne model of a genotypic value for two loci is
(2)
With the extended Kempthorne genetic effects, genetic contrasts for testing the eight individual genetic effects of two loci allowing HWD and LD can be defined. The resulting contrasts for epistasis testing will be evaluated for its performance, including statistical power, rate of false positives, and accuracy in estimating each epistasis effect. Analytical formulas will be derived to predict the statistical power and sample size requirement for detecting epistasis effects, and the theoretical results will be compared with simulation results. Two types of false-positive results will be considered, namely false positives in the absence of any epistasis effects and false positives in the presence of one type of epistasis effect. The former is to evaluate whether the tests have more false positives than expected for random reasons, and the latter is to evaluate whether any epistasis effect has more false positives due to confounding with other epistasis effects. The rates of these two types of false-positive results will be evaluated using simulations. The accuracy in estimating each epistasis effect will be measured by the mean square error (MSE) from simulation results, where MSE is the sum of the variance and squared bias of the epistasis estimates.

RESULTS

Contrasts of gene effects allowing HWD and LD.

On the basis of the definitions of genetic effects by Eqs. 1.11.8 with the extension to allow HWD and LD, eight genetic contrasts can be defined to test for four single gene effects and four epistasis effects as follows

(3.1)
(3.2)
(3.3)
(3.4)
(3.5)
(3.6)
(3.7)
(3.8)
where αi = additive effect (gene substitution effect) of locus i (i = 1,2), di = dominance effect of locus i (i = 1,2), iaa = additive × additive effect, iad = additive × dominance effect, ida = dominance × additive effect, idd = dominance × dominance effect, kj = contrast vector of β (j = 2,…9), β = column vector of the population mean and the 35 genetic effects represented by Eqs. 1.11.8, g = (gAABB, gAABb, gAAbb, gAaBB, gAaBb, gAabb, gaaBB, gaaBb, gaabb)′, and si is a function of conditional probabilities (Supplemental Material). Under the null hypothesis of no gene combination effect, each of the four epistasis contrasts of Eqs. 3.53.8 is expected to be zero. Eqs. 3.13.8 are orthogonal comparisons of the 35 individual genetic effects represented by Eqs. 1.11.8. Therefore, the test for each genetic effect is independent of the other genetic effects.

Linear model method for epistasis testing.

To test the significance of each genetic contrast, a statistical contrast needs to be used in place of the genetic contrast. Each statistical contrast is obtained by replacing g with the estimate of g (ĝ) using the following procedure. A phenotypic value is modeled as the summation of a genotypic value (gijkl) of the two genes underlying the complex trait and a random residual (e) following N(0,σ) distribution, i.e., phenotypic observation m with gijkl genotypic value is modeled as yijklm = gijkl + eijklm, or y = Xg + e in matrix notations, where y is the column vector of phenotypic values, X is the design matrix, and e is the vector of random residuals. Note that the above linear model can be extended to include nongenetic effects such as age and gender so that the genetic effects can be tested without the influence of those nongenetic effects. This flexibility of accounting for nongenetic effects is an advantage of linear model methods over frequency-based methods in genetic analysis. The normal equations in matrix notations are X′Xg = X′y, and the least squares estimate of g is given by ĝ = (X′X)−1X′y. Then, from Eqs. 3.53.8, the four marker contrasts for testing epistasis effects can be expressed as

(4)
The epistasis estimator represented by Eq. 4 is unbiased, i.e.,
(5)
The null and alternative hypotheses for testing each epistasis effect are Hx0: ix = 0 and Hx1: ix ≠ 0, respectively, where x = aa, ad, da, dd. It is readily seen from Eq. 5 that Hx0 is equivalent to the hypothesis Lx = 0. The test statistic for testing Hx0 is
(6)
where s2=(yXĝ)′(yXĝ)/(nk). Under Hx0, this statistic has a Student's t-distribution with degrees of freedom nk (18), where n is the number of observations and k is the rank of X. To test the overall epistasis effects is to test the general null hypothesis that no epistasis effect exists, i.e., H0: iaa = 0, iad = 0, ida = 0, and idd = 0. The alternative hypothesis is that at least one form of epistasis effect exists. It is easily seen from Eq. 5 that H0 is equivalent to the hypothesis SEg = 0, where SE = (s6, s7, s8, s9)′. The F-test statistic is
(7)
Under the null hypothesis H0: SEg = 0, the F-statistic given by Eq. 7 follows an F-distribution with degrees of freedom 4 and nk (18). This F-test can also be used to test all the genetic effects, including single gene effects or a combination of the genetic effects by modifying Eq. 7.

Statistical power and sample size.

Analytic formulas for predicting statistical power and sample size determination for a given set of statistical and genetic parameters were derived (Eqs. C5–C18 in Supplemental Material) using an approach similar to that used in Refs. 13, 14, and 15. Table 2 shows the sample size requirement for each selected level of epistasis effect assuming a 95% statistical power and a 5% type I error rate. The results show that detecting d × d effect requires a considerably larger sample size than detecting the other epistasis effects. The reason is that the d × d variance is small even when the size of the d × d effect is the same as that of a × a. Under the assumptions of HWE, LE, and equal allele frequencies, σ= 2σ= 2σ= 4σ when |iaa| = |iad| = |ida| = |idd| based on the results of Cockerham (5) and Kempthorne (10), where σ= a × a variance, σ= a × d variance, σ = d × a variance, and σDD2 = d × d variance. Consequently, the contrast heritability of idd, which is the fraction of the phenotypic variance accounted for by the d × d contrast variance, is ∼0.25 of the contrast heritability of iaa. For the same contrast heritability, the sample size required is the same for detecting each of the four epistasis effects. The sample size numbers in Table 2 are merely illustrative, and the actual sample size for a given set of data and threshold values of statistical power and type I error could vary significantly. In general, statistical power is an increasing function of the effect size, sample size, and type I error, while sample size is a decreasing function of type I error, type II error, and effect size.

Table 2. Sample size required to achieve 95% power with a type I error of 5%

Contrast Heritability (a1 = a2 = d1 = d2 = iaa = iad = ida = idd)*a×a, naaa×d, nadd×a, ndad×d, ndd
Haa2Had2Hda2Hdd2
0.00130.00150.00140.000511,54510,89111,69928,299
0.00500.00590.00540.00202,7762,6192,8136,805
0.01000.01190.01080.00401,3021,2281,3193,191
0.01500.01780.01620.00608107648211,986
0.02000.02380.02160.00805655335721,384
0.02500.02970.02700.01004173944231,022
0.03000.03570.03240.0120319301323781
0.03500.04160.03780.0140249235252609
0.04000.04760.04320.0160196185199480
0.04500.05350.04860.0180155146157380
0.05000.05940.05400.0200122115124300

*Indicates that all 8 effects are assumed to be of the same size in defining each heritability. Haa2 values of 0.00123≈0.0500 correspond to 0.10∼0.6304 of iaay, i.e., the size of iaa is about 0.10∼0.6304 phenotypic standard deviations.

Evaluation of statistical power, accuracy of estimating epistasis effects, and rates of false positives, using simulations.

The simulated phenotypic value of each individual is obtained as the summation of the individual quantitative trait locus (QTL) genotypic value and a random residual following N(0,1) distribution. Each simulation generated 10,000 repeats. Statistical powers observed from the simulated data agreed well with the predicted powers (Figs. 1 and 2). When more than one type of epistasis effect exists, the joint testing of the overall epistasis effect had the highest statistical power (predicted QTL and observed QTL in Figs. 1 and 2). Among the four types of epistasis effects involving two loci, the effect with largest contrast heritability (or contrast variance) is the easiest to detect. However, predicting the epistasis effect with the largest variance or contrast variance is not as straightforward as under the assumptions of HWE and LE, where the rank of statistical power is in the order of a × a, a × d (or d × a), and d × d, because the a × a effect has the largest contrast variance or heritability, i.e., H= 2H= 2H= 4Hdd2. Under HWD and LD, this order may not hold because of the existence of covariances between the eight genetic effects (Eq. C3 in Supplemental Materials and Ref. 9). For example, the a × d effect in Fig. 1 actually has slightly higher statistical power than the a × a effect. Figure 1 and all the other figures show that the d × d effect is the most difficult to detect. It is important to note that good statistical power and accuracy for d × d are possible if the d × d effect is sufficiently large. For example, when the d × d contrast heritability is ∼2% of the phenotypic variance (corresponding to 5% a × a contrast heritability; Table 2), the statistical power is nearly as good as for the other three epistasis effects (Fig. 2). In terms of accuracy of estimating epistasis effects measured by MSE, estimates for a × a and a × d (or d × a) had similar accuracy, while estimates for d × d had the worst accuracy (Fig. 3). Again, the poor accuracy in estimating the d × d effect was due to the fact that d × d had the smallest variance under the assumption of equal effect sizes for all epistasis effects. As sample size increases, all estimates tend to have similar accuracies (Fig. 3).

Fig. 1.

Fig. 1.Observed (dotted lines) and predicted (solid lines) statistical power as a function of the population size using the new method for detecting epistasis effects, allowing Hardy-Weinberg and linkage disequilibria [α = 0.0034 and H= 0.025, where α is the threshold significance level for “suggestive linkage” (11) and H is a × a contrast heritability]. QTL, quantitative trait locus.


Fig. 2.

Fig. 2.Observed (dotted lines) and predicted (solid lines) statistical power as a function of H using the new method for detecting epistasis effects, allowing Hardy-Weinberg and linkage disequilibria (α = 0.0034, n = 500).


Fig. 3.

Fig. 3.Mean square error (MSE) of the epistasis estimations as a function of the population size for the same data as in Figs. 1 and 2 (α = 0.0034, H= 0.025).


To evaluate the rate of false positives in the absence of any epistasis effect, the phenotypic value was defined as the random residual following N(0,1) distribution, so that the phenotypic value has no genetic effect. With 10,000 repeats, the observed rate of false positives for any of the four epistasis effects was virtually identical to the cutoff type I error used as the nominal significance level. This shows that the method of testing does not have inflated type I errors over those for random reasons expected by the nominal significance levels. To evaluate the rate of false positives due to confounding among the four types of epistasis effects in the presence of one type of epistasis effect, the phenotypic value was defined as the sum of a genotypic value and a random residual following N(0,1) distribution, where the genotypic value contains only one type of epistasis effect and is obtained by Eq. C2 in Supplemental Materials. With 10,000 repeats, the observed rate of false positives for any of the three epistasis effects not present in the phenotypic value was virtually identical to the cutoff type I error used as the nominal significance level. This demonstrated that the method of testing did not have inflated type I errors due to confounding among the eight genetic effects over those expected for random reasons, as expected by Eqs. 3.13.8.

Comparison with methods requiring HWE and LE.

The method based on the extended Kempthorne model was compared with two methods that have been used in human epistasis testing (17): Cockerham's model (5) under the assumptions of HWE and LE and the genetic model of Cheverud (4). Theoretical analysis showed that the Cheverud model yields identical epistasis results as the Cockerham model under HWE and LE with equal allele frequencies (Supplemental Materials). For this reason, the Cheverud model will be considered as a method requiring HWE and LE in the following discussions. Both theoretical analysis and data simulation showed that the extended Kempthorne model had identical results for testing epistasis effects under HWE and LE with equal allele frequencies as the other two methods requiring HWE and LE (Supplemental Materials). However, application of methods requiring HWE and LE to data with HWD and LD generally had three problems: decreased statistical power, decreased accuracy in parameter estimation (except for a × a effect), and confounding between epistasis effects (or increased rate of false positives). Comparison of Fig. 1 with Fig. 4 shows that the two methods requiring HWE and LE had considerably lower statistical power than our method based on the extended Kempthorne model for detecting each of the a × a, a × d, d × a, and d × d epistasis effects when applied to data with HWD and LD. In terms of accuracy in estimating epistasis effects, the two methods requiring HWE and LE had larger MSEs (lower accuracies) than our method for a × d, d × a, and d × d epistasis effects, particularly when the sample size was small (Figs. 3 and 5). For detecting a × a effect, the methods requiring HWE and LE in fact had smaller MSEs. Confounding or increased rate of false positives is another problem when applying methods requiring HWE and LE to data with HWD and LD. Unlike other statistical problems such as power and accuracy problems that tend to diminish as sample size increases, the problem of confounding or false positives of the methods requiring HWE and LE becomes more serious as sample size increases (Figs. 6 and 7). The most serious confounding was that between d × d and a × a effects (Fig. 6), where d × d is the only genetic effect present in the phenotypic value but a high frequency of significant a × a effect was also observed. Confounding between a × d and d × a effects was also observed (Fig. 7) but was less serious than those in Fig. 6.

Fig. 4.

Fig. 4.Observed statistical power as a function of the population size using methods requiring Hardy-Weinberg equilibrium (HWE) and linkage equilibrium (LE) for the same data as in Fig. 1 (α = 0.0034, H= 0.025). Decreased statistical power was observed for every type of epistasis effect.


Fig. 5.

Fig. 5.MSE of the epistasis estimations as a function of the population size using methods requiring HWE and LE for the same data with HWD and LD as in Fig. 3 (α = 0.0034, H= 0.025). Decreased accuracy (increased MSE) in estimating epistasis effect was observed for a × d, d × a, and d × d epistasis effect, and such a decrease in accuracy becomes worse as sample size decreases.


Fig. 6.

Fig. 6.Observed statistical power as a function of the population size using methods requiring HWE and LE for detecting epistasis effects, using data with HWD and LD, when d × d epistasis effect is the only genetic effect of phenotypic value (α = 0.0034, H= 0.05). Severe conditioning between d × d and a × a, or a high frequency of false a × a effect, was observed when methods requiring HWE and LE were applied to data with HWD and LD. The confounding problem becomes more serious as sample size increases.


Fig. 7.

Fig. 7.Observed statistical power as a function of the population size using methods requiring HWE and LE for detecting epistasis effects, using data with HWD and LD, when a × d epistasis effect is the only genetic effect of the phenotypic value (α = 0.0034, H= 0.05). Confounding between a × d and d × a, or a high frequency of false d × a effect, was observed when methods requiring HWE and LE were applied to data with HWD and LD. The confounding problem becomes more serious as sample size increases.


Computer implementation.

A computer program, epiSNP, has been developed to implement the epistasis testing method in this article using a two-step least squares analysis similar to that of Wolfinger et al. (23). The first step corrects the phenotypic values to remove nongenetic systematic effects such as gender, treatment, and living conditions, and the second step conducts a significance test for SNP effects using the corrected phenotypic values as the observations. The two-step analysis estimates nongenetic systematic effect only once and offers considerable computational efficiency when the number of SNPs is large. The computer program is freely available at http://animalgene.umn.edu.

DISCUSSION

The pairwise epistasis testing method in this article can be used to assemble interactive SNPs over all autosomes and to assemble all SNP effects affecting a phenotype. The X chromosome can be included for pairwise analysis, but only female individuals can be used. This is because each female has two copies of the X chromosome, so that the analysis of the X chromosome in females is the same as that for autosomes. Similarly, the Z chromosome in birds can be included, but only male individuals can be used in the analysis. The final SNP assembly may contain single gene effects and epistasis effects of SNPs affecting the trait. This pairwise strategy could be used to identify higher order epistasis effects that can be placed in one “epistasis network” analogous to establishing a linkage group, i.e., if locus A interacts with locus B and locus B interacts with locus C, then A, B, and C are within one epistasis network. The pairwise analysis also can be used to assemble all SNP effects as the total SNP effect on a phenotype and is more practical than a higher order analysis. We caution that pairwise analysis does not offer the exact estimate of any higher order epistasis effect. Difficulties associated with a higher order analysis include mathematical complexity and computational difficulty of the method as well as sample size requirement. As the number of loci increases, the mathematical tediousness of the methodology is expected to grow rapidly but should still be manageable for a small number of loci. Potential computation difficulty is an issue that should be noted. The total number of pairwise tests is n(n − 1)/2, which is about 125 billion for 500,000 SNPs. Testing three loci at a time will increase the number of tests to more than 20,000 trillion. This magnitude of increase in the number of tests could easily reach the point where the numerically intensive computations exceed the capacities of the most advanced computing facilities. Another challenge of higher order analysis is in the sample size requirement. The sample size required to add each locus in the higher order epistasis analysis may need to be increased substantially to ensure that each subclass has a sufficient number of observations.

GRANTS

This research was supported in part by funding from the National Research Initiative Competitive Grants Program/United States Department of Agriculture (grant no. 03275) and the Agriculture Experimental Station of the University of Minnesota (project MN-16-043).

We thank an anonymous reviewer for the thorough reading of the manuscript and thoughtful suggestions that improved the manuscript.

REFERENCES

  • 1 Affymetrix. Microarray Bulletin. April 2005.
    Google Scholar
  • 2 Cambien F, Poirier O, Nicaud V, Herrmann SM, Mallet C, Ricard S, Behague I, Hallet V, Blanc H, Loukaci V, Thillet J, Evans A, Ruidavets JB, Arveiler D, Luc G, Tiret L. Sequence diversity in 36 candidate genes for cardiovascular disorders. Am J Hum Genet 65: 183–191, 1999.
    Crossref | PubMed | ISI | Google Scholar
  • 3 Carlborg O, Haley CS. Epistasis: too often neglected in complex trait studies? Nat Rev Genet 5: 618–625, 2004.
    Crossref | PubMed | ISI | Google Scholar
  • 4 Cheverud JM. Detecting epistasis among quantitative trait loci. In: Epistasis and the Evolutionary Process, edited by Wolf J, Brodie E III, and Wade M. New York: Oxford Univ. Press, 2000, p. 58–81.
    Google Scholar
  • 5 Cockerham CC. An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics 859–882: 1954.
    PubMed | ISI | Google Scholar
  • 6 Fisher RA. The correlation between relatives on the supposition of Mendelian inheritance. Trans Roy Soc Edinburgh 52: 399–433, 1918.
    Google Scholar
  • 7 Freudenberg-Hua Y, Freudenberg J, Kluck N, Cichon S, Propping P, Nothen MM. Single nucleotide variation analysis in 65 candidate genes for CNS disorders in a representative sample of the European population. Genome Res 13: 2271–2276, 2003.
    Crossref | PubMed | ISI | Google Scholar
  • 8 Haga H, Yamada R, Ohnishi Y, Nakamura Y, Tanaka T. Gene-based SNP discovery as part of the Japanese Millennium Genome Project: identification of 190,562 genetic variations in the human genome. Single-nucleotide polymorphism. J Hum Genet 47: 605–610, 2002.
    Crossref | PubMed | ISI | Google Scholar
  • 9 Kao CH, Zeng ZB. Modeling epistasis of quantitative trait loci using Cockerham's model. Genetics 160: 1243–1261, 2002.
    PubMed | ISI | Google Scholar
  • 10 Kempthorne O. The correlation between relatives in a random mating population. Proc R Soc Lond B Biol Sci 143: 102–113, 1954.
    PubMed | Google Scholar
  • 11 Lander E, Kruglyak L. Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat Genet 11: 241–247, 1995.
    Crossref | PubMed | ISI | Google Scholar
  • 12 Li W, Reich J. A complete enumeration and classification of two-locus disease models. Hum Hered 50: 334–349, 2000.
    Crossref | PubMed | ISI | Google Scholar
  • 13 London N, Da Y. Statistical power and sample size requirement for QTL detection. In: Proceedings of the 8th World Congress on Genetics Applied to Livestock Production, edited by Valente BD, Rossi de Morais O, and Ventura RV. Belo Horizonte, Brazil: Instituto Pruciência, 2006, vol. 22, article no. 18.
    Google Scholar
  • 14 Mao Y, Da Y. Statistical power and sample size for QTL detection using flanking markers under the f-2 design. In: Proceedings of the 8th World Congress on Genetics Applied to Livestock Production, edited by Valente BD, Rossi de Morais O, and Ventura RV. Belo Horizonte, Brazil: Instituto Pruciência, 2006, vol. 23, article no. 19.
    Google Scholar
  • 15 Mao Y, Da Y. Statistical power for detecting epistasis QTL effects under the F-2 design. Genet Sel Evol 37: 129–150, 2005.
    Crossref | PubMed | ISI | Google Scholar
  • 16 Moore JH. The ubiquitous nature of epistasis in determining susceptibility to common human diseases. Hum Hered 56: 73–82, 2003.
    Crossref | PubMed | ISI | Google Scholar
  • 17 Purcell S, Sham PC. Epistasis in quantitative trait locus linkage analysis: interaction or main effect? Behav Genet 34: 143–152, 2004.
    Crossref | PubMed | ISI | Google Scholar
  • 18 Searle SR. Linear Models. New York: John Wiley & Sons, 1997, p. 112, 200.
    Google Scholar
  • 19 Segre D, Deluna A, Church GM, Kishony R. Modular epistasis in yeast metabolism. Nat Genet 37: 77–83, 2004.
    PubMed | ISI | Google Scholar
  • 20 Stephens JC, Schneider JA, Tanguay DA, Choi J, Acharya T, Stanley SE, Jiang R, Messer CJ, Chew A, Han JH, Duan J, Carr JL, Lee MS, Koshy B, Kumar AM, Zhang G, Newell WR, Windemuth A, Xu C, Kalbfleisch TS, Shaner SL, Arnold K, Schulz V, Drysdale CM, Nandabalan K, Judson RS, Ruano G, Vovis GF. Haplotype variation and linkage disequilibrium in 313 human genes. Science 293: 489–493, 2001.
    Crossref | PubMed | ISI | Google Scholar
  • 21 Tong AH, Lesage G, Bader GD, Ding H, Xu H, Xin X, Young J, Berriz GF, Brost RL, Chang M, Chen Y, Cheng X, Chua G, Friesen H, Goldberg DS, Haynes J, Humphries C, He G, Hussein S, Ke L, Krogan N, Li Z, Levinson JN, Lu H, Menard P, Munyana C, Parsons AB, Ryan O, Tonikian R, Roberts T, Sdicu AM, Shapiro J, Sheikh B, Suter B, Wong SL, Zhang LV, Zhu H, Burd CG, Munro S, Sander C, Rine J, Greenblatt J, Peter M, Bretscher A, Bell G, Roth FP, Brown GW, Andrews B, Bussey H, Boone C. Global mapping of the yeast genetic interaction network. Science 303: 808–813, 2004.
    Crossref | PubMed | ISI | Google Scholar
  • 22 Weir BS. Genetic Data Analysis II. Sunderland, MA: Sinauer Associates, 1996.
    Google Scholar
  • 23 Wolfinger RD, Gibson G, Wolfinger ED, Bennett L, Hamadeh H, Bushel P, Afshari C, Paules RS. Assessing gene significance from cDNA microarray expression data via mixed models. J Comput Biol 8: 625–637, 2001.
    Crossref | PubMed | ISI | Google Scholar
  • 24 Zhu X, Yan D, Cooper RS, Luke A, Ikeda MA, Chang YP, Weder A, Chakravarti A. Linkage disequilibrium and haplotype diversity in the genes of the renin-angiotensin system: findings from the family blood pressure program. Genome Res 13: 173–181, 2003.
    Crossref | PubMed | ISI | Google Scholar

Supplemental data