## Abstract

We develop expressions for the power to detect associations between parental genotypes and offspring phenotypes for quantitative traits. Three different “indirect” experimental designs are considered: full-sib, half-sib, and full-sib–half-sib families. We compare the power of these designs to detect genotype–phenotype associations relative to the common, “direct,” approach of genotyping and phenotyping the same individuals. When heritability is low, the indirect designs can outperform the direct method. However, the extra power comes at a cost due to an increased phenotyping effort. By developing expressions for optimal experimental designs given the cost of phenotyping relative to genotyping, we show how the extra costs associated with phenotyping a large number of individuals will influence experimental design decisions. Our results suggest that indirect association studies can be a powerful means of detecting allelic associations in outbred populations of species for which genotyping and phenotyping the same individuals is impractical and for life history and behavioral traits that are heavily influenced by environmental variance and therefore best measured on groups of individuals. Indirect association studies are likely to be favored only on purely economical grounds, however, when phenotyping is substantially less expensive than genotyping. A web-based application implementing our expressions has been developed to aid in the design of indirect association studies.

PREDICTIONS of the evolutionary dynamics of quantitative traits are highly sensitive to the genetic architecture assumed to underlie them (Barton and Turelli 1989). Accordingly, in recent times, a large amount of empirical attention has been paid to the dissection of quantitative traits at the genomic level (Barton and Keightley 2002). Classical methods for trait dissection such as quantitative trait locus (QTL) analysis by linkage rely upon linkage disequilibrium between marker loci and functional polymorphisms within pedigrees and, although useful for determining whether a particular genomic region may affect a trait, suffer undesirable properties such as narrow sampling of naturally occurring allelic variation and a limited scope for distinguishing physical linkage from pleiotropy (Mackay 2001). As these weaknesses all concern critical aspects of genetic architecture (Hansen 2006), they must be overcome if we are to better understand the genetic architecture of quantitative traits. A common alternative to analysis via linkage is to test for direct association between single-nucleotide polymorphisms (SNPs) in candidate gene regions (or the entire genome) and quantitative traits. For example, when linkage disequilibrium is weak, an association study may be more useful for distinguishing linkage from pleiotropy (*e.g*., Carbone *et al*. 2006).

Typically, association studies are conducted on a sample of individuals that are both phenotyped and genotyped (Long and Langley 1999); we refer to this as the *direct* approach. Although requiring large sample sizes, direct approaches have identified candidate polymorphisms contributing to quantitative trait variation in a wide range of organisms including humans (Lettre *et al*. 2008; Weedon *et al*. 2008) and in model organisms such as Drosophila (Dworkin *et al*. 2003, 2005; Long *et al*. 1998, 2000; Robin *et al*. 2002), Arabidopsis (Aranzana *et al*. 2005), and mice (Liu *et al*. 2006). However, independent replication of significant results appears vital to eliminate false positives and population-specific effects (*e.g*., Macdonald and Long 2004; Gruber *et al*. 2007). The direct approach is suitable when a candidate gene's contribution to naturally occurring phenotypic variance is of interest (via the use of field-sampled individuals) and for traits that can be measured with minimal error, such as morphology. However, many fitness-related traits such as behavioral and life-history traits can be heavily influenced by environmental variance (Houle 1992), making them difficult to measure on single field-caught individuals. Such traits are often impossible to measure under field conditions and are more reliably measured on groups of individuals.

For some organisms, performing association studies in a panel of inbred lines (ILs) or doubled haploid lines (DHLs) can be one solution to the requirement of phenotyping multiple individuals and/or traits per genotype (*e.g*., Dworkin *et al*. 2003). The IL approach has some disadvantages: the genetic structure of an IL panel does not accurately reflect that of an outbred population, the variance of quantitative traits is altered by inbreeding (Hill and Caballero 1992; Van Buskirk and Willi 2006), and for many species inbreeding is impractical. Clearly, flexible approaches are required that can be implemented in outbred natural populations.

One potential solution is an *indirect* approach whereby a set of parents are genotyped and multiple progeny are phenotyped. Associations are then tested between parental genotype and progeny mean phenotype. This approach was recently used to detect associations in natural populations of Drosophila (Weeks *et al*. 2002; Kennington *et al*. 2007; Rako *et al*. 2007). Indirect methods have also been suggested and applied before in the case of linkage analysis, *i.e*., the association between phenotypes and identity-by-descent (IBD) within pedigrees, in particular in livestock (Weller *et al*. 1990; Van Der Beek *et al*. 1995) and experimental populations (Hill 1998). The motivations for indirect approaches in linkage studies are similar to those for association studies. First, it may be impossible to measure the phenotype or genotype in certain individuals, for example, for sex-limited traits or when it is necessary to kill individuals to be genotyped before their phenotype can be measured. Second, due to the increased sample size per genotype, measuring phenotypes on relatives can increase the precision of estimating QTL effects, thereby increasing the power of detection. However, the extra precision can come at a cost due to incomplete linkage between marker and QTL (Van Der Beek *et al*. 1995). Moreover, linkage analyses require very large sample sizes, lack power to detect QTL that explain only a few percent of population variance, and have poor map resolution (Visscher and Goddard 2004).

While the practical utility of indirect approaches is obvious, within the context of association studies, what remains less clear is the cost one pays in statistical power by indirectly associating parental genotypes with offspring phenotypes. Past investigations of the power of indirect methods have focused on the case of linkage rather than association (Weller *et al*. 1990; Van Der Beek *et al*. 1995). Here, we derive analytical expressions for the power to detect associations between candidate polymorphisms and quantitative traits, in four breeding designs that involve genotyping different configurations of parents and phenotyping their offspring, comparing the performance of each with the direct approach of genotyping and phenotyping the same set of individuals. We then derive expressions for the optimal experimental designs given the relative costs of genotyping and phenotyping. Our results demonstrate that these designs can provide levels of power equal to, and in some cases better than, the direct approach, indicating that indirect association may be a potentially powerful tool for detecting quantitative trait nucleotides (QTNs) in outbred populations.

## MATERIALS AND METHODS

We assume random mating, no segregation distortion, and a general model of gene action that may or may not include dominance. For the direct approach, we consider a number of unrelated individuals with a genotype and a phenotype. For the indirect approach, we consider a balanced design of *k* dams per sire and *m* progeny per dam. We assume a phenotypic standard deviation of unity and that a proportion, *q*^{2}, of the phenotypic variance is due to additive genetic effects at the QTL. This quantity is sometimes called the (additive) QTL heritability (*e.g*., Almasy and Blangero 1998). For a biallelic QTL in Hardy–Weinberg equilibrium, with allele frequency of *p* and additive effect of *a* and dominance deviation of *d*, *q*^{2} = 2*p*(1 − *p*)[*a + d*(2*p* − 1)]^{2} (Falconer and Mackay 1996). In practice, the specific values of *a*, *d*, and *p* do not matter but their combination does, because power of detection using an additive model for analysis depends on *q*^{2} (Neimann-Sorensen and Robertson 1961; Lynch and Walsh 1998). We note that although dominance can be present in this model (*i.e*., *d* ≠ 0), for reasons outlined below (see *Modeling dominance effects*), we model only the power to detect the additive genetic effect at the QTL. For the direct approach, the model for a phenotype is

$$y = \mu + g + e, \tag{1}$$

with *g* the QTL effect and *e* the residual, and var(*y*) = var(*g*) + var(*e*) = *q*^{2} + (1 − *q*^{2}). For the indirect approach, the model is

$$y = \mu + s + m + g + e, \tag{2}$$

with *s* and *m* the sire and dam effects and var(*y*) = *s*^{2} + *m*^{2} + *q*^{2} + (1 − *s*^{2} − *m*^{2} − *q*^{2}) (Falconer and Mackay 1996), where *s*^{2} and *m*^{2} denote the variances of the sire and dam effects (the dam effect *m* is distinct from the number of progeny per dam). The (intraclass) correlation between full-sibs (FS) is $t_{FS} = s^2 + m^2 + \tfrac{1}{2}q^2$ and that between half-sibs (HS) is $t_{HS} = s^2 + \tfrac{1}{4}q^2$ (via extension from Equations 18.33a,b in Lynch and Walsh 1998, p. 573). If all resemblance between relatives is due to additive genetic factors, then $t_{FS} = \tfrac{1}{2}h^2$ and $t_{HS} = \tfrac{1}{4}h^2$, where *h*^{2} is the narrow-sense heritability that includes the effect of the QTL.

#### Power calculations:

We test associations by regressing the individual (direct approach) or family mean phenotype (indirect approach) on the individual or expected genotype, which is coded by an indicator variable (*x*) as the number of “A” alleles for a biallelic locus with alleles “A” and “B.” For individual genotypes (*e.g*., sires), *x* can have the value of 0, 1, or 2. The variance of *x* in the population is 2*p*(1 − *p*), with *p* the frequency of allele A. For the expected mean genotype in the progeny, *x* can have values 0, 0.5, 1, 1.5, and 2 for full-sib families, reflecting the different types of parental matings possible. For example, *E*(*x*) in the progeny is 2 if both parents have genotype AA.

We take an analytical approach to the analysis of power, a key feature of which is that our derivations focus on the noncentrality parameter (NCP) of the test of association rather than statistical power (1 − β) *per se*. The NCP of a particular design can be thought of as the amount of variation attributable to the model treatment effects (Lynch and Walsh 1998), which in our case is the additive effect of the polymorphism. There are several advantages to the NCP-based approach. First, unlike statistical power, the NCP does not depend upon any arbitrary choice of type 1 (α) error threshold, which is often contingent upon the type of study being conducted. Second, NCPs scale in a linear fashion with sample size, whereas power does not, making it far simpler to recalculate power for any variation in sample size without having to recalculate the NCP itself. The calculation of power itself remains straightforward under this approach. Once an NCP has been calculated, it can be used with the desired critical *P-*value to calculate statistical power.

The test statistic for association is the square of a simple *t*-test (*i.e*., an *F*-test). We assume that the sample size is large enough (*N* ≥ 60, Severo and Zelen 1960) so that the test statistic is approximately distributed as a central χ^{2} with 1 d.f. under the null hypothesis of no association. Under the alternative hypothesis, the test statistic is distributed as a noncentral χ^{2} (Searle 1971) with an NCP of λ. If α and β are the type-I and type-II error rates, then the power (= 1 − β) to detect an association is

$$1 - \beta = \Pr\left(Z > z_{(1-\alpha/2)} - \sqrt{\lambda}\right) + \Pr\left(Z < -z_{(1-\alpha/2)} - \sqrt{\lambda}\right), \tag{3}$$

with *Z* a standard normal variate and *z*_{(1−α/2)} the threshold of a normal distribution corresponding to a type-I error rate of α [Pr(*Z* < *z*_{(α)}) = α and Pr(*Z* > *z*_{(1−α)}) = α] (*e.g*., Lynch and Walsh 1998, p. 870). The second term is negligible unless λ is very small, so the relationship between type-I error rate, power, and the NCP is

$$\lambda \approx \left(z_{(1-\alpha/2)} + z_{(1-\beta)}\right)^2. \tag{4}$$
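The conversion between an NCP and power can be sketched with the Python standard library. This is an illustration rather than the authors' implementation: the function names are hypothetical, the test is assumed to be a two-sided 1-d.f. test, and the inverse calculation assumes λ ≈ (*z*_{(1−α/2)} + *z*_{(1−β)})^{2}, which ignores the negligible lower rejection tail.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and s.d. 1


def power_from_ncp(ncp: float, alpha: float = 0.05) -> float:
    """Power of a 1-d.f. test whose statistic is noncentral chi-square with NCP `ncp`."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # z_(1 - alpha/2)
    shift = sqrt(ncp)
    # (Z + sqrt(ncp))^2 exceeds z_crit^2 in either the upper or the lower tail.
    upper = 1 - _STD_NORMAL.cdf(z_crit - shift)
    lower = _STD_NORMAL.cdf(-z_crit - shift)
    return upper + lower


def ncp_for_power(power: float, alpha: float = 0.05) -> float:
    """Approximate NCP needed for a target power (lower tail ignored)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)  # z_(1 - beta)
    return (z_alpha + z_beta) ** 2


if __name__ == "__main__":
    needed = ncp_for_power(0.8, alpha=0.05)
    print(f"NCP for 80% power at alpha = 0.05: {needed:.2f}")
    print(f"Power recovered from that NCP: {power_from_ncp(needed):.3f}")
```

Because the NCP is linear in the number of families, dividing the required NCP by the NCP per family gives an approximate number of families for a design.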

##### General expression for the NCP:

When linear regression is used to estimate the effect of the marker on the quantitative trait, *y* = μ + *bx* + *e*, the general form of the NCP is $\lambda = b^2/\operatorname{var}(\hat{b})$ (Lynch and Walsh 1998, p. 881). From standard regression theory, $\operatorname{var}(\hat{b}) = \operatorname{var}(e)/\sum_i (x_i - \bar{x})^2$ (Kendall and Stuart 1977). Treating the *x* as random, $\sum_i (x_i - \bar{x})^2 \approx N\operatorname{var}(x)$ when *N* is large (Visscher and Hopper 2001; Visscher and Duffy 2006). Hence, λ = *Nb*^{2}var(*x*)/var(*e*) = *Nb*^{2}var(*x*)/[var(*y*) − *b*^{2}var(*x*)], since *b*^{2}var(*x*) is the variance removed by the regression. In our regression on the genotype indicator variable, this quantity is ∼*cq*^{2}, with *c* the proportion of the QTL variance detected for a given design. A general expression of the NCP across our designs is then

$$\lambda = \frac{Ncq^2}{\operatorname{var}(y) - cq^2}. \tag{5}$$

For example, for the direct approach, var(*y*) = 1 and *c* = 1. For the indirect approach, both *c* and var(*y*) are < 1 and their ratio determines the efficiency of the design. If *cq*^{2} is small (*i.e*., assuming that 0 < *q*^{2} < ∼0.05) relative to var(*y*), then λ ≈ *Ncq*^{2}/var(*y*).
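A short sketch of the general NCP calculation follows, assuming the form λ = *Ncq*^{2}/[var(*y*) − *cq*^{2}] given by the derivation above and its small-*q*^{2} approximation. The function names and the example values are illustrative only.

```python
def ncp_exact(n_units: float, c: float, q2: float, var_y: float) -> float:
    """NCP = N * c * q2 / (var_y - c * q2), with N the number of regression units."""
    explained = c * q2  # variance removed by the regression
    if not 0 <= explained < var_y:
        raise ValueError("c * q2 must be non-negative and smaller than var_y")
    return n_units * explained / (var_y - explained)


def ncp_approx(n_units: float, c: float, q2: float, var_y: float) -> float:
    """Approximation for small c * q2 relative to var_y."""
    return n_units * c * q2 / var_y


if __name__ == "__main__":
    # Direct approach: c = 1 and var(y) = 1; 1000 individuals, q2 = 0.01 (arbitrary values).
    print(ncp_exact(1000, 1.0, 0.01, 1.0), ncp_approx(1000, 1.0, 0.01, 1.0))
```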

#### Direct method:

##### Individual genotypes and individual phenotypes:

The model for detecting an association is simply *y* = μ + *bx* + *e*, with *x* the indicator variable for the genotype. If there are 2*N* individuals phenotyped and genotyped, then

$$\lambda_{direct} = \frac{2Nq^2}{1 - q^2}. \tag{6}$$

This is a standard expression for the NCP of association between a SNP and a quantitative trait (Lynch and Walsh 1998). The NCP per genotyped individual is ∼*q*^{2}. This is also the NCP per phenotyped individual.

#### Indirect designs:

##### Full-sib families (FS)–sire and dam genotyped and full-sib progeny phenotyped:

The model for the family mean (*Y*) is *Y* = μ + *b* *E*(*x*_{o}) + *e*_{n}, with *E*(*x*_{o}) the expected genotype indicator variable in the progeny. The variance of the family mean is var(*Y*) = [(1 − *t*_{FS})/*m* + *t*_{FS}], with *t*_{FS} the intraclass correlation of full-sibs (appendix a, Equation A2). *E*(*x*_{o}) is simply the average of the parents,

$$E(x_o) = \tfrac{1}{2}(x_s + x_d), \tag{7}$$

with $x_s$ and $x_d$ the genotype indicator variables of the sire and dam. The regression of *Y* on *E*(*x*_{o}) is *b* = *a*, and therefore *c*, the proportion of QTL variance detected, is $\tfrac{1}{2}$ and var(*y*) is [(1 − *t*_{FS})/*m* + *t*_{FS}]. The NCP for the test of association is, per full-sib family,

$$\lambda_{FS} = \frac{\tfrac{1}{2}q^2}{[(1 - t_{FS})/m + t_{FS}] - \tfrac{1}{2}q^2}. \tag{8}$$

The NCP per genotyped individual is $\approx \tfrac{1}{4}q^2/[(1 - t_{FS})/m + t_{FS}]$. The NCP per phenotyped individual is $\approx \tfrac{1}{2}q^2/[(1 - t_{FS}) + m\,t_{FS}]$. For the limiting case of *m* = 1, the NCP per genotyped individual is one-quarter that of the direct approach and the NCP per phenotyped individual is one-half that of the direct approach because the ratio of genotypes to phenotypes is 2.
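The full-sib quantities can be sketched as below, assuming *c* = 1/2 and a family-mean variance of (1 − *t*_{FS})/*m* + *t*_{FS}, with two genotyped parents and *m* phenotyped progeny per family. The function names are hypothetical, and the per-individual values use the small-*q*^{2} approximation.

```python
def var_fs_mean(m: int, t_fs: float) -> float:
    """Variance of the mean of m full-sibs when the phenotypic variance is 1."""
    return (1 - t_fs) / m + t_fs


def ncp_fs_family(m: int, t_fs: float, q2: float, exact: bool = True) -> float:
    """NCP contributed by one full-sib family (c = 1/2)."""
    c = 0.5
    v = var_fs_mean(m, t_fs)
    return c * q2 / (v - c * q2) if exact else c * q2 / v


def ncp_fs_per_genotype(m: int, t_fs: float, q2: float) -> float:
    """Approximate NCP per genotyped parent (two parents per family)."""
    return ncp_fs_family(m, t_fs, q2, exact=False) / 2


def ncp_fs_per_phenotype(m: int, t_fs: float, q2: float) -> float:
    """Approximate NCP per phenotyped progeny (m progeny per family)."""
    return ncp_fs_family(m, t_fs, q2, exact=False) / m


if __name__ == "__main__":
    q2, t_fs = 0.02, 0.1  # arbitrary illustrative values
    for m in (1, 4, 10, 50):
        print(m, ncp_fs_per_genotype(m, t_fs, q2) / q2, ncp_fs_per_phenotype(m, t_fs, q2) / q2)
```

The printed ratios are relative to the direct approach, whose approximate NCP per individual is *q*^{2}.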

##### Full- and half-sib families (HS and FSHS1 designs)—sires genotyped, full- and half-sib offspring phenotyped:

If a sire is mated to *k* dams and each dam has *m* progeny (*i.e*., *n* = *km*), then the variance of the progeny average phenotype is (appendix a, Equation A1)

$$\operatorname{var}(Y) = t_{HS} + \frac{t_{FS} - t_{HS}}{k} + \frac{1 - t_{FS}}{km}. \tag{9}$$

Only one-quarter of the QTL variance is detected by using the expected genotype in the progeny, so $c = \tfrac{1}{4}$. The noncentrality parameter per sire family (and per genotyped individual) is

$$\lambda_{sire} = \frac{\tfrac{1}{4}q^2}{[t_{HS} + (t_{FS} - t_{HS})/k + (1 - t_{FS})/(km)] - \tfrac{1}{4}q^2}, \tag{10}$$

and the NCP of an experiment with *N* sires is *N*λ_{sire}. A special case of Equation 10 is the classical HS design in which sires are mated to *k* dams with a single progeny per dam (*m* = 1 and *k* = *n*). The NCP per genotyped sire for this design is

$$\lambda_{HS} = \frac{\tfrac{1}{4}q^2}{[(1 - t_{HS})/n + t_{HS}] - \tfrac{1}{4}q^2}. \tag{11}$$

The expressions for the NCP in the full-sib and the half-sib case are very similar, the difference being the value of *c* (= $\tfrac{1}{2}$ for the full-sib design and $\tfrac{1}{4}$ for the half-sib design) and the variance of the family mean.
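The variance of a sire-family mean can be checked numerically. The sketch below assumes a closed form *t*_{HS} + (*t*_{FS} − *t*_{HS})/*k* + (1 − *t*_{FS})/(*km*) and compares it with a brute-force average of the progeny correlation matrix (1 on the diagonal, *t*_{FS} within dams, *t*_{HS} across dams). Appendix Equation A1 is not reproduced in this section, so the closed form and the function names here are assumptions.

```python
def var_sire_mean(k: int, m: int, t_fs: float, t_hs: float) -> float:
    """Closed-form variance of the mean of k*m progeny of one sire (unit phenotypic variance)."""
    return t_hs + (t_fs - t_hs) / k + (1 - t_fs) / (k * m)


def var_sire_mean_bruteforce(k: int, m: int, t_fs: float, t_hs: float) -> float:
    """Average all pairwise correlations among the k*m progeny of one sire."""
    dam_of = [dam for dam in range(k) for _ in range(m)]
    n = len(dam_of)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                total += 1.0          # variance of a single phenotype
            elif dam_of[i] == dam_of[j]:
                total += t_fs         # full-sibs share the dam
            else:
                total += t_hs         # half-sibs share only the sire
    return total / n ** 2


def ncp_sire_family(k: int, m: int, t_fs: float, t_hs: float, q2: float) -> float:
    """NCP per genotyped sire with c = 1/4."""
    c = 0.25
    return c * q2 / (var_sire_mean(k, m, t_fs, t_hs) - c * q2)


if __name__ == "__main__":
    print(var_sire_mean(5, 2, 0.1, 0.05), var_sire_mean_bruteforce(5, 2, 0.1, 0.05))
```

Setting *k* = 1 recovers the full-sib family-mean variance, and setting *m* = 1 recovers the classical half-sib case.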

##### Full- and half-sib families (FSHS2 design)—sires and dams genotyped, full- and half-sib offspring phenotyped:

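When dams are nested within sires and all parents are genotyped, the contributions of dams and sires to the NCP can be treated separately, taking account of the data structure. For a sire family with *k* dams, the contribution from each of the dams is (from Equation 8 but with $c = \tfrac{1}{4}$)

$$\lambda_{dams} = \frac{\tfrac{1}{4}q^2}{[(1 - t_{FS})/m + t_{FS}] - \tfrac{1}{4}q^2}. \tag{12}$$

The contribution from the half-sibs is (Equation 10)

$$\lambda_{sire} = \frac{\tfrac{1}{4}q^2}{[t_{HS} + (t_{FS} - t_{HS})/k + (1 - t_{FS})/(km)] - \tfrac{1}{4}q^2}. \tag{13}$$

And the total NCP per sire family is

$$\lambda_{FSHS2} = \lambda_{sire} + k\lambda_{dams}. \tag{14}$$

When *k* = 1, the expression is equivalent, to the order of the small-*q*^{2} approximation, to that for the full-sib design (Equation 8). When *m* = 1 (and therefore *k* = *n*), *i.e*., a half-sib design but all dams genotyped, the NCP is $\tfrac{1}{4}q^2/\{[(1 - t_{HS})/n + t_{HS}] - \tfrac{1}{4}q^2\} + n\,\tfrac{1}{4}q^2/(1 - \tfrac{1}{4}q^2)$, the two terms corresponding to the contribution of the half-sib family (Equation 11) and *n* parent-offspring pairs, respectively.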

If all dams are genotyped, then there are *k* + 1 genotypes per sire family and *km* phenotypes per family. Hence, the NCPs per genotype and per phenotype are (λ_{sire} + *k*λ_{dams})/(*k* + 1) and [λ_{sire}/(*km*) + λ_{dams}/*m*], respectively.

#### Validation of equations:

We performed a simulation study to verify our analytical results for the indirect design (Equations 8, 10, 11, and 14). Phenotypes for progeny were simulated as *y* = μ + *s* + *m* + *g* + *e*, for a given QTL heritability and a model in which all family resemblance was due to additive genetic factors. Input parameters were the numbers of sires, dams, and progeny, the proportion of variance explained by the QTL, the sire and dam intraclass correlations, the degree of dominance, and the allele frequency at the QTL. Sire and dam genotypes were simulated by sampling alleles from the population, using the binomial distribution (assuming Hardy–Weinberg equilibrium). Genotypes of progeny were simulated by sampling from the parental gametes, assuming Mendelian inheritance. Sire, dam, and residual random effects were simulated from a normal distribution with the appropriate standard deviation. Phenotypic observations on progeny were calculated by summation of the individual-specific terms (*i.e*., sire, dam, QTL, and residual). Data were analyzed using linear regression of the progeny means on SNP genotype. A total of 10,000 replicates were run for many combinations of parameters, and the average test statistic was recorded. The mean test statistic was found to be extremely close to our predictions (results not shown) and therefore simulations were not pursued further.
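A much-reduced Monte Carlo check of this kind can be sketched as follows. It is not the authors' simulation program: it covers only the classical half-sib design with a purely additive QTL, uses arbitrary parameter values, and assumes the prediction λ = *N*(¼*q*^{2})/[var(*Y*) − ¼*q*^{2}] with var(*Y*) = (1 − *t*_{HS})/*n* + *t*_{HS} and *t*_{HS} = *h*^{2}/4. The mean of a noncentral χ^{2} with 1 d.f. is 1 + λ, which is the quantity compared.

```python
import random
from statistics import mean


def predicted_mean_stat(n_sires: int, n_prog: int, q2: float, h2: float) -> float:
    """1 + NCP for the half-sib design (c = 1/4, t_HS = h2 / 4)."""
    t_hs = h2 / 4
    var_mean = (1 - t_hs) / n_prog + t_hs
    return 1 + n_sires * 0.25 * q2 / (var_mean - 0.25 * q2)


def simulate_hs_mean_stat(n_sires=200, n_prog=10, q2=0.02, h2=0.2,
                          p=0.5, reps=200, seed=1) -> float:
    """Average squared t-statistic from regressing progeny means on sire genotype."""
    rng = random.Random(seed)
    a = (q2 / (2 * p * (1 - p))) ** 0.5           # additive effect giving QTL variance q2
    sd_sire = (0.25 * (h2 - q2)) ** 0.5           # polygenic sire effect
    sd_resid = (1 - 0.25 * (h2 - q2) - q2) ** 0.5  # everything else within sire
    stats = []
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n_sires):
            x_sire = (rng.random() < p) + (rng.random() < p)  # 0, 1 or 2 A alleles
            s = rng.gauss(0, sd_sire)
            total = 0.0
            for _ in range(n_prog):
                # One allele from the sire, one from a random (ungenotyped) dam.
                x_prog = (rng.random() < x_sire / 2) + (rng.random() < p)
                total += s + a * (x_prog - 2 * p) + rng.gauss(0, sd_resid)
            xs.append(x_sire)
            ys.append(total / n_prog)
        mx, my = mean(xs), mean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        syy = sum((y - my) ** 2 for y in ys)
        slope = sxy / sxx
        resid_var = (syy - slope * sxy) / (n_sires - 2)
        stats.append(slope ** 2 * sxx / resid_var)  # squared t-statistic
    return mean(stats)


if __name__ == "__main__":
    print("simulated:", simulate_hs_mean_stat())
    print("predicted:", predicted_mean_stat(200, 10, 0.02, 0.2))
```

With a finite number of sires the statistic is an *F* rather than a χ^{2} variate, so a small upward deviation from 1 + λ is expected.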

#### Modeling dominance effects:

Our coding for the indicator variable *x* reflects the expected value of the transmitted allele from the parent and therefore models the additive effects of alleles. An alternative parameterization is to code the expected dosage of the *a* and *d* effects in the progeny, given the observed genotype in the parents and the allele frequency in the population. For example, for parents with genotypes BB, AB, and AA, the expected mean values in the progeny due to the QTL are {−(1 − *p*)*a* + *pd*}, {½(2*p* − 1)*a* + ½*d*}, and {*pa* + (1 − *p*)*d*}, respectively (Falconer and Mackay 1996). However, the coefficients for *a* and *d* are linear combinations of each other, so additive and dominance effects cannot be separated when genotypes are observed on a single parent and progeny are phenotyped, for example, in the case of half-sib designs. Therefore, only an allele substitution effect can be estimated, as coded, for example, by 0, ½, and 1 for these genotypes. This makes sense because twice the expected value of the mean progeny phenotype is, when deviated from the population mean, the additive breeding value of the parent.
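The confounding of *a* and *d* under single-parent genotyping can be illustrated by enumerating progeny genotypes. The sketch assumes random mating, genotypic values of −*a*, *d*, and *a* for BB, AB, and AA, and *p* the frequency of allele A; the function name and the numerical values are illustrative.

```python
def progeny_mean_given_parent(n_a_parent: int, p: float, a: float, d: float) -> float:
    """Expected QTL value of progeny from a parent carrying n_a_parent A alleles and a random mate."""
    values = {0: -a, 1: d, 2: a}      # genotypic values for BB, AB, AA
    t = n_a_parent / 2                # probability the parent transmits A
    expected = 0.0
    for parent_allele, pr_parent in ((1, t), (0, 1 - t)):
        for mate_allele, pr_mate in ((1, p), (0, 1 - p)):
            expected += pr_parent * pr_mate * values[parent_allele + mate_allele]
    return expected


if __name__ == "__main__":
    p, a, d = 0.3, 1.0, 0.5
    bb, ab, aa = (progeny_mean_given_parent(g, p, a, d) for g in (0, 1, 2))
    # The heterozygous parent's progeny mean lies exactly midway between the two
    # homozygotes, so the three means fall on a straight line in allele count.
    print(bb, ab, aa, (bb + aa) / 2)
```

Because the three progeny means are collinear in parental allele count for any *a*, *d*, and *p*, a single slope describes them and a separate dominance term is not identifiable.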

There are therefore qualitative differences in the ability to model dominance effects between the different indirect designs. As it is possible to fit a model parameter for dominance only when all parents are genotyped, we confine our results to the 1-d.f. model that detects only additive effects. A possible consequence of this is that when designing a study, one may make design decisions on the assumption that all genetic effects are additive. Although the loss in additive genetic variance due to dominance is simple to estimate for a given genetic variance due to a QTL, it is important to verify that the indirect designs perform in line with these theoretical expectations when dominance is nonzero. We considered this issue using simulations and results are presented in appendix b.

#### Efficiency of designs:

The direct and indirect designs can differ greatly in the number of genotyped and phenotyped individuals. To compare the efficiency of the different designs when the cost of genotyping and phenotyping varies, we express the power (NCP) as a proportion of the cost of the experiment. If the cost per genotype is 1.0 unit and the relative cost of phenotyping to genotyping is *C*_{p}, then the total cost is

$$\text{Cost} = N_g + C_p N_p, \tag{15}$$

with *N*_{g} and *N*_{p} the number of individuals genotyped and phenotyped, respectively. For a given design we can express the cost-corrected noncentrality parameter (CNCP) as the NCP per “QTL heritability” and per dollar. That is,

$$\text{CNCP} = \frac{\lambda}{q^2 (N_g + C_p N_p)}. \tag{16}$$

Taking the approximate expressions for the NCP (*i.e*., assuming that 0 < *q*^{2} < ∼0.05), the *q*^{2} drops out of the equation. The resulting CNCPs are then simple expressions of all these parameters.

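For the direct approach,

$$\text{CNCP}_{direct} = \frac{1}{1 + C_p}. \tag{17}$$

For the classic HS design,

$$\text{CNCP}_{HS} = \frac{\tfrac{1}{4}}{[(1 - t_{HS})/n + t_{HS}]\,(1 + nC_p)}. \tag{18}$$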

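For a given value of the heritability (and therefore *t*_{HS} if all family resemblance is due to additive genetic factors), the optimum value of *n* is

$$n_{opt} = \sqrt{\frac{1 - t_{HS}}{C_p\, t_{HS}}}. \tag{19}$$

Similarly for the FS design,

$$\text{CNCP}_{FS} = \frac{\tfrac{1}{2}}{[(1 - t_{FS})/m + t_{FS}]\,(2 + mC_p)}, \tag{20}$$

and the optimum value of *m* is

$$m_{opt} = \sqrt{\frac{2(1 - t_{FS})}{C_p\, t_{FS}}}. \tag{21}$$

For the nested design with only sires genotyped (FSHS1),

$$\text{CNCP}_{FSHS1} = \frac{\tfrac{1}{4}}{[t_{HS} + (t_{FS} - t_{HS})/k + (1 - t_{FS})/(km)]\,(1 + kmC_p)}. \tag{22}$$

If *m* is fixed, then the optimal value for *k* is $k_{opt} = \sqrt{[m(t_{FS} - t_{HS}) + (1 - t_{FS})]/(m^2 C_p\, t_{HS})}$.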
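The cost-corrected comparison can be explored numerically. The sketch below assumes the approximate NCPs, a unit genotyping cost, and the following per-family costs: one genotype plus *n* phenotypes for the half-sib design and two genotypes plus *m* phenotypes for the full-sib design. The closed-form optima used are those obtained by maximizing these expressions and are checked against an integer grid search; all function names are hypothetical.

```python
from math import sqrt


def cncp_direct(c_p: float) -> float:
    """Approximate NCP per unit q2 and unit cost, direct design."""
    return 1 / (1 + c_p)


def cncp_hs(n: int, t_hs: float, c_p: float) -> float:
    """Half-sib design: c = 1/4, cost 1 + n * c_p per sire family."""
    return 0.25 / (((1 - t_hs) / n + t_hs) * (1 + n * c_p))


def cncp_fs(m: int, t_fs: float, c_p: float) -> float:
    """Full-sib design: c = 1/2, cost 2 + m * c_p per family."""
    return 0.5 / (((1 - t_fs) / m + t_fs) * (2 + m * c_p))


def optimal_n_hs(t_hs: float, c_p: float) -> float:
    return sqrt((1 - t_hs) / (c_p * t_hs))


def optimal_m_fs(t_fs: float, c_p: float) -> float:
    return sqrt(2 * (1 - t_fs) / (c_p * t_fs))


if __name__ == "__main__":
    h2, c_p = 0.2, 0.1                 # arbitrary illustrative values
    t_fs, t_hs = h2 / 2, h2 / 4        # all resemblance assumed additive
    best_n = max(range(1, 500), key=lambda n: cncp_hs(n, t_hs, c_p))
    best_m = max(range(1, 500), key=lambda m: cncp_fs(m, t_fs, c_p))
    print("HS:", optimal_n_hs(t_hs, c_p), best_n, cncp_hs(best_n, t_hs, c_p))
    print("FS:", optimal_m_fs(t_fs, c_p), best_m, cncp_fs(best_m, t_fs, c_p))
    print("direct:", cncp_direct(c_p))
```

Family sizes must be integers in practice, so the analytical optimum is best read as a guide to the neighbouring integer values.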

## RESULTS

#### Statistical power:

##### Common genotyping effort:

It can be seen from the per genotype and per phenotype approximations (Table 1) that the powers of the indirect approaches relative to that of the direct approach are simple functions of two factors: first, the degree to which phenotypes within families are correlated (*i.e*., the intraclass correlations), which, in the absence of nongenetic and nonadditive genetic causes of family resemblance, is the narrow-sense heritability of the trait, and second, the numbers of progeny phenotyped per family when using an indirect design. There is no difference in the relative power of direct and indirect approaches as the QTN effect size varies. Thus, we explore performance of the indirect approaches due to variation in the narrow-sense heritability and progeny phenotyping effort, for an arbitrary QTN effect size while holding genotyping effort constant.

The power for three indirect designs (FS, HS, and FSHS1) and that for the corresponding direct design are plotted for different progeny numbers and narrow-sense heritabilities in Figure 1. For all three designs there are combinations of progeny number and heritability for which the power exceeds that of the corresponding direct design, with the indirect approach performing better as heritability declines (Figure 1, A–C). We investigated the FSHS2 design for a wide range of parameters but found that genotyping both sires and dams is not efficient because the additional information from the progeny of the dams does not compensate for the increase in the number of genotypes. Thus we do not consider this design further.

Due to its effect on the phenotypic correlation among full- and half-sibs, heritability has a strong impact on the power of the indirect designs. While there is little difference in performance of the different indirect designs when heritability is low, all methods suffer reduced power when heritability is high. This is because the variance in progeny mean is relatively large, a factor that is particularly important for the performance of the full-sib design. Equating (6) and (8) for a particular value of *t*_{FS} gives the number of progeny that need to be phenotyped to give equal power of the direct and indirect designs. That number is *m* = 4(1 − *t*_{FS})/(1 − 4*t*_{FS}). Thus, for small values of *t*_{FS}, four progeny per full-sib family need to be phenotyped to have equal power. Relative to the direct approach in which 2*N* individuals are phenotyped this is an increase of a factor of 2. For large values of *t*_{FS} (*t*_{FS} > 0.25, *e.g*., *h*^{2} > ½) there is no number of phenotyped progeny that can compensate for the loss of information. Thus, for highly heritable traits there will always be a higher genotyping effort for the full-sib design compared with the direct method. This limitation is not as severe for the half-sib case due to a lower expected correlation between half-sibs over a much wider range of heritabilities.
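The break-even family size quoted above is easy to tabulate. This sketch uses the expression *m* = 4(1 − *t*_{FS})/(1 − 4*t*_{FS}) from the text and returns `None` when *t*_{FS} ≥ 0.25, where no finite family size suffices; the function name is illustrative.

```python
from typing import Optional


def fs_progeny_for_equal_power(t_fs: float) -> Optional[float]:
    """Progeny per full-sib family needed to match the direct design per genotyped individual."""
    if not 0 <= t_fs < 1:
        raise ValueError("t_fs must lie in [0, 1)")
    if t_fs >= 0.25:
        return None  # the family-mean variance can never fall to 1/4
    return 4 * (1 - t_fs) / (1 - 4 * t_fs)


if __name__ == "__main__":
    for h2 in (0.0, 0.1, 0.2, 0.4, 0.6):
        # t_FS = h2 / 2 when all family resemblance is additive genetic
        print(h2, fs_progeny_for_equal_power(h2 / 2))
```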

The FSHS1 design requires a choice of both the number of dams, *k*, and the number of offspring per dam, *m*. For a given value of *km* it appears that there are marginal gains in power by increasing the number of dams rather than increasing the number of offspring per dam. For example, Figure 1C illustrates this point with *km* = 10. With sires mated to five dams and two full-sib progeny phenotyped per dam, power was always higher compared with the alternative situation in which a sire is mated to two dams and five progeny are phenotyped.

##### Common phenotyping effort:

The indirect designs can result in a large variation in the number of phenotyped progeny; thus we also analyzed the performance of the designs on a per phenotype basis. Under no circumstances is power better than that of the direct approach for a common phenotyping effort (Figure 2). The per phenotype NCP of each indirect design tends to plateau well below the corresponding NCP for the direct design. On a per phenotype basis, the full-sib design always performs better than the half-sib designs regardless of whether a nested design is used or not.

#### Relative efficiency and economy:

Because cost is often the major limiting factor in an association study and the indirect designs can lead to a very large phenotyping effort compared with the direct method, we also modeled how the cost of phenotyping relative to genotyping affects the efficiency of different designs. Figure 3 illustrates the CNCPs for the two simplest cases, the full-sib and classic half-sib designs with each optimized for *m* and *n*, respectively, using (21) and (19). The full-sib design provides a much larger contribution to power per unit expenditure when *C*_{p} = 0.1, which represents a 10-fold lower cost of phenotyping relative to genotyping. For all other cases considered (equal costs, and genotyping 10 times cheaper than phenotyping), the direct design is favored on economic grounds.

## DISCUSSION

We have provided analytical solutions for the power to detect genotype–quantitative trait associations using an indirect approach, in which parents are genotyped and different configurations of offspring are phenotyped in an outbred population. Inspired by the possibility that indirect approaches may represent a flexible and cost-effective strategy for traits that are difficult to measure on single individuals, our goal was to determine whether an indirect approach could provide power equal to or better than the direct approach of genotyping and phenotyping the same individuals. Upon finding that there are regions of parameter space for which the indirect approaches perform well, we explored how the cost of phenotyping relative to genotyping affects the efficiency of the designs.

#### Power considerations:

It is immediately apparent from the per genotype noncentrality parameter expressions and their approximations (Table 1) that the QTN heritability, *q*^{2}, does not affect the power of indirect approaches relative to the direct method. Thus our discussion concerning the performance of different indirect designs has generality across the entire range of effect sizes. However, the power of indirect approaches is heavily influenced by the narrow-sense heritability of the trait, decreasing in all designs as heritability increases. When heritability is high, the contribution of environmental variance to trait values is low. Thus the benefit achieved via the indirect design, due to a decrease in the contribution of environmental variance to family means, is also diminished. By contrast, heritability has the opposite effect on the power of indirect linkage analysis, in full- and half-sib families (Van Der Beek *et al*. 1995). The likely source of this discrepancy is that, for linkage analysis, the power depends on the proportion of within-family variance explained by the QTL, so for a fixed QTL effect size, a larger heritability implies a larger proportion of within-family variance explained and therefore more power (Visscher and Hopper 2001).

Although any “best” design is likely to depend on the reproductive biology of the organism studied, heritability, and budget, four guiding principles emerge from our analyses. First, half-sib designs are more powerful on a per genotype basis than full-sib designs. Second, when using the FSHS1 design, it is always better to increase the number of dams rather than the number of offspring per dam. Because there are more independent half-sib families in this design, the power contribution to the test statistic comes from half- rather than full-sibs (see Equation 14). Third, on a per phenotype basis no indirect approach can match the power of the direct approach. Finally, there may be little to be gained from genotyping both dams and sires in a full-sib–half-sib design. The poor performance of the FSHS2 design is partially a consequence of the fact that, within a group of half-sibs, the genotyping effort allocated to the dams delivers little power gain compared with the alternative of genotyping more independent sires from the population.

One key difference between our approach and other considerations of the power of association studies is that we have assumed that the causal SNP is genotyped. For example, Long and Langley (1999) investigated the power of direct association studies to detect QTL in outbred populations as a function of sample size, the QTL heritability, and recombination rate between the causal and genotyped variants that are in linkage disequilibrium with the causal variant. Our power calculations can be easily adjusted for imperfect linkage disequilibrium by multiplying the noncentrality parameter of the test statistic by *r*^{2}, the squared correlation between alleles at the causal and genotyped variant (Hill and Robertson 1968). In other words, if there is not perfect linkage disequilibrium between the causal and the genotyped variant, then the experimental sample size needs to increase by a factor of 1/*r*^{2} to achieve the same power.
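The effect of incomplete linkage disequilibrium on sample size can be sketched as a simple rescaling. The helper below is hypothetical: it assumes the NCP is linear in the number of experimental units and that a marker in LD with the causal variant scales the NCP per unit by *r*^{2}.

```python
def required_units(target_ncp: float, ncp_per_unit: float, r2: float = 1.0) -> float:
    """Number of units (individuals or families) needed to reach target_ncp at a given r2."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must lie in (0, 1]")
    if ncp_per_unit <= 0:
        raise ValueError("ncp_per_unit must be positive")
    return target_ncp / (ncp_per_unit * r2)


if __name__ == "__main__":
    # Arbitrary illustration: a target NCP of 10 with an NCP of 0.01 per unit.
    for r2 in (1.0, 0.8, 0.5):
        print(r2, required_units(10.0, 0.01, r2))
```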

#### Considerations due to dominance:

In our analyses we have fitted an additive model and therefore our power calculations are relevant for the proportion of additive genetic variance due to the QTN. The predictable degree to which parental genotype reflects progeny mean phenotype is altered by dominance. Dominance has been shown to influence the power of tests of association and linkage and in general terms can either increase or decrease power depending upon the magnitude of dominance variance relative to the cost of fitting an extra model parameter to account for it (Sham *et al*. 2000). Although it has been argued from theoretical grounds and empirical observations that dominance effects may be generally relatively weak (Hill *et al*. 2008), in the indirect case, as we have considered here, we model only the ability to detect additive effects, so it is likely that power will be compromised by dominance, an issue we consider below.

We first considered whether the decrease in power caused by dominance is any worse than when using the equivalent direct method (*i.e*., a 1-d.f. model). From theory, power loss due to dominance can be substantial for both designs when dominance is strong (*a = d*) and the common allele is dominant (Appendix B). Simulations suggest, however, that the direct and indirect approaches do not differ systematically in their sensitivity to this effect (Appendix B). With the direct approach one also has the option of fitting a model with a specific coding variable for *d* (*i.e*., a 2-d.f. model), an option available only for indirect designs in which both parents are genotyped. For the indirect FS design, dominance can be separated from additive effects because the progeny genotype can be predicted, in some cases without error. For example, AA × AA matings always give AA progeny and AA × BB matings always give AB progeny. Progeny from AB × AB matings have the same expectation for *a* as progeny from AA × BB matings but differ in their expectation for *d* (1/2 and 1, respectively). In practice, however, the power to detect dominance will be lower than that of the direct approach for two reasons. First, the variance of the indicator variable for *d* is lower for the indirect case, in a manner that depends upon allele frequency. For the direct approach this is Var(*x*_{d,direct}) = *H*(1 − *H*), where *H* is the heterozygosity (= 2*p*(1 − *p*)); for the indirect full-sib case the corresponding variance is smaller. Second, when allele frequencies are nonsymmetrical, the coding variables for *a* and *d* become correlated with each other. This significantly reduces the amount of dominance that is recoverable after fitting an additive term. Thus, although it is theoretically possible to recover some information on dominance using the full-sib design, the amount is likely to be small unless moderate-frequency SNPs are being tested. In summary, it appears that the main difference in power between the direct and indirect methods due to dominance will be a function of the limited ability to fit a 2-d.f. model in some cases rather than any systematic difference between the approaches in their ability to detect additive effects.
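The comparison of indicator variances can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes Hardy-Weinberg proportions and random mating among parents, and it takes the indirect full-sib indicator for *d* to be the expected fraction of AB progeny given the two parental genotypes (that is, a very large family). All function names are hypothetical. Under these assumptions the enumeration reduces to *H*(2 − 3*H*)/4, which is always smaller than the direct value *H*(1 − *H*); this closed form is derived here and may differ from the expression used in the original text.

```python
from itertools import product


def genotype_freqs(p):
    """Hardy-Weinberg frequencies keyed by number of B alleles (0, 1, 2)."""
    q = 1.0 - p
    return {0: q * q, 1: 2.0 * p * q, 2: p * p}


def expected_het_fraction(g1, g2):
    """Expected fraction of AB progeny from parents carrying g1 and g2 B alleles."""
    t1, t2 = g1 / 2.0, g2 / 2.0  # probability that each parent transmits B
    return t1 * (1.0 - t2) + (1.0 - t1) * t2


def var_d_direct(p):
    """Variance of the heterozygote indicator when individuals are genotyped directly."""
    h = 2.0 * p * (1.0 - p)
    return h * (1.0 - h)


def var_d_indirect_fs(p):
    """Variance across matings of the expected heterozygote fraction in progeny."""
    freqs = genotype_freqs(p)
    mean = 0.0
    mean_sq = 0.0
    for g1, g2 in product(freqs, repeat=2):
        weight = freqs[g1] * freqs[g2]  # random mating: independent parents
        x = expected_het_fraction(g1, g2)
        mean += weight * x
        mean_sq += weight * x * x
    return mean_sq - mean * mean


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, round(var_d_direct(p), 4), round(var_d_indirect_fs(p), 4))
```

The AB × AB and AA × BB cases discussed in the text correspond to `expected_het_fraction(1, 1)` and `expected_het_fraction(0, 2)`.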

#### Relative efficiency and economy:

As power relative to the direct approach is a function of the phenotyping effort, the indirect designs can require considerable phenotyping effort to gain equal or better power than the direct approach. Thus, by developing expressions for the noncentrality parameter per dollar (CNCP) and optimal sizes for the three simplest indirect designs, we were able to investigate how variability in the relative costs of genotyping and phenotyping may influence design choice. Only when phenotyping is cheaper than genotyping (*i.e*., *C*_{p} < 1) is it possible for the indirect approach to be more economical. In fact, phenotyping may often have to be considerably cheaper than genotyping. In the limit of *C*_{p} → 0, *i.e*., very cheap phenotyping relative to genotyping, the comparison between designs is for the same number of genotypes (see Table 1). For the case considered in Figure 3, where heritability is 0.2, CNCP(HS) → 1/*h*^{2} = 5 and CNCP(FS) → 1/(2*h*^{2}) = 2.5, both for a large (strictly infinite) number of progeny.
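The two limiting values quoted for Figure 3 can be checked with a few lines of arithmetic. This is only a sketch of the limits stated in the text (1/*h*^{2} for HS and 1/(2*h*^{2}) for FS as *C*_{p} → 0 with a very large number of progeny); it does not implement the full CNCP expressions, and the function names are hypothetical.

```python
def cncp_limit_hs(h2):
    """Limiting CNCP for the half-sib design as C_p -> 0 (value quoted in the text)."""
    return 1.0 / h2


def cncp_limit_fs(h2):
    """Limiting CNCP for the full-sib design as C_p -> 0 (value quoted in the text)."""
    return 1.0 / (2.0 * h2)


if __name__ == "__main__":
    for h2 in (0.1, 0.2, 0.4):
        print(h2, cncp_limit_hs(h2), cncp_limit_fs(h2))
```

Both limits grow as heritability falls, in line with the observation that indirect designs are most attractive for traits of low heritability.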

#### Conclusion:

We have developed exact expressions and approximations for the statistical power of tests of association, in which parental genotypes are associated with progeny mean phenotypes, and compared their performance with a direct approach in which associations are tested between genotypes and phenotypes from the same individuals. For situations in which both direct and indirect approaches are feasible, our results suggest that the indirect approaches are more powerful when traits have low heritability but are more economical to implement only when genotyping costs far outweigh phenotyping costs. For studies in which the specific trait or study organism precludes a direct association study, the indirect approach nonetheless remains a viable option. We have implemented power calculations for both the direct method and the indirect designs considered here on a web-based application, power calculator for indirect association studies (PIAS), which can be accessed by the wider community (http://www.chenowethlab.org/pias/index.html).

## APPENDIX A: DERIVATION OF VARIANCE AMONG FAMILY MEANS

From Falconer and Mackay (1996, p. 167), the variance of sire means is given byBut assuming *V*_{P} = 1,Hence,(A1)

For the full-sib case, *k* = 1 and there is no contribution from half-sibs; thus, *t*_{HS} = 0. Hence,(A2)
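As a point of reference for the full-sib case, the textbook result for the variance of the mean of *n* family members with intraclass correlation *t* is *V*_{P}[1 + (*n* − 1)*t*]/*n*. The sketch below checks that formula against a direct sum over the covariance matrix of family members. It is offered as an illustration of this standard result and is not necessarily the form in which (A2) was written; the function names, and the symbol *n* for the number of progeny per family, are assumptions.

```python
def var_family_mean(n, t, vp=1.0):
    """Variance of the mean of n relatives with intraclass correlation t."""
    return vp * (1.0 + (n - 1) * t) / n


def var_family_mean_from_cov(n, t, vp=1.0):
    """Same quantity from first principles: sum of the n x n covariance matrix / n^2."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            # vp on the diagonal, t * vp between distinct family members
            total += vp if i == j else t * vp
    return total / (n * n)


if __name__ == "__main__":
    # Illustrative values only: 10 progeny per family, intraclass correlation 0.1
    print(var_family_mean(10, 0.1), var_family_mean_from_cov(10, 0.1))
```

With *V*_{P} = 1 the variance of family means falls toward *t* as family size increases, which is why extra phenotyping effort continues to add information in the indirect designs.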

## APPENDIX B: ASSOCIATION WITH DOMINANCE

For a standard quantitative genetic model, with mean values of −*a*, *d*, and *a* for genotypes AA, AB, and BB, using *k* = *d*/*a* as the degree of dominance, *p* as the allele frequency of allele B, and *H* as the heterozygosity (2*p*(1 − *p*)), the additive and dominance variances at the locus are

$$\mathrm{var}(A) = H a^{2}\,[1 + k(1 - 2p)]^{2} \tag{B1}$$

$$\mathrm{var}(D) = (H k a)^{2} \tag{B2}$$

(Falconer and Mackay 1996; Lynch and Walsh 1998). Let the phenotypic variance be 1 and the total proportion of variance due to the QTL be *Q*^{2} = var(*A*) + var(*D*). As before, the proportion of phenotypic variance due to additive variance at the QTL is *q*^{2} = var(*A*). For a direct design, the NCP for an additive (1-d.f.) model is ∼*Nq*^{2}/(1 − *q*^{2}) and the NCP for a 2-d.f. model is ∼*NQ*^{2}/(1 − *Q*^{2}) (*e.g*., Sham *et al*. 2000). Some examples, keeping the total proportion of variance due to the QTL constant (*Q*^{2} = 0.05), are given in Table A1.

Only if there is strong dominance and the common allele is dominant (*e.g*., *p* = 0.1, *k* = −1) is there substantial loss in power by fitting an additive model. If *Q*^{2} is small, then the ratio of the NCP for fitting *a* to that for fitting *a* + *d* is ∼*q*^{2}/*Q*^{2}. This ratio can be expressed as

$$\frac{q^{2}}{Q^{2}} = \frac{[1 + k(1 - 2p)]^{2}}{[1 + k(1 - 2p)]^{2} + H k^{2}} \tag{B3}$$

and depends on *k* and *p*.
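The dependence of this ratio on *k* and *p* can be explored with a short calculation. The sketch below uses the standard single-locus expressions for the coding given in this appendix (genotypic values −*a*, *d*, *a* for AA, AB, BB, with *p* the frequency of B): var(*A*) = *H*[*a* + *d*(1 − 2*p*)]^{2} and var(*D*) = (*Hd*)^{2}. The function names are hypothetical, and the sample size of 100 with *Q*^{2} = 0.05 is chosen only for illustration.

```python
def variance_components(p, a, k):
    """Additive and dominance variance at a biallelic locus (p = frequency of B)."""
    h = 2.0 * p * (1.0 - p)
    d = k * a
    alpha = a + d * (1.0 - 2.0 * p)  # average effect of allele substitution
    return h * alpha ** 2, (h * d) ** 2


def additive_fraction(p, k):
    """Ratio q^2 / Q^2: share of the QTL variance captured by an additive model."""
    h = 2.0 * p * (1.0 - p)
    lead = (1.0 + k * (1.0 - 2.0 * p)) ** 2
    return lead / (lead + h * k * k)


def ncp_direct(n, r2):
    """Approximate NCP for a direct design explaining a proportion r2 of variance."""
    return n * r2 / (1.0 - r2)


if __name__ == "__main__":
    n, big_q2 = 100, 0.05  # illustrative values
    for p, k in ((0.1, -1.0), (0.1, 1.0), (0.5, 1.0), (0.5, 0.0)):
        frac = additive_fraction(p, k)
        q2 = frac * big_q2
        print(p, k, round(frac, 3), round(ncp_direct(n, q2), 3),
              round(ncp_direct(n, big_q2), 3))
```

For *p* = 0.1 and *k* = −1 the additive fraction is 2/11, so most of the QTL variance is missed by a 1-d.f. model, whereas with *k* = 0 the fraction is 1 for any allele frequency.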

In the indirect model we detect only a fraction of *q*^{2} when we fit an additive model; thus, apart from this reduction in the amount of *q*^{2} detected, there is no theoretical reason to expect the performance of a 1-d.f. test to differ from that of the direct approach. We tested this prediction using simulations.

We simulated a quantitative trait according to a model in which the trait was influenced by both additive and dominance effects. Phenotypes for progeny were simulated as *y* = μ + *s* + *m* + *g* + *e*, for a given QTL heritability and a model in which all family resemblance was due to both additive and dominance effects (given by a value of *k*). Data were analyzed using linear regression of the progeny means on SNP genotype. A total of 10,000 replicates were run for each combination of parameters, and the average test statistic was recorded. We considered both the FS and the classic HS designs with parameter values similar to those in Figure 1, A and B: that is, a sample of 100 genotypes, total QTL variance of 0.05, and trait heritability of 0.4. We considered three allele frequencies (*p* = 0.1, 0.5, and 0.7) for the cases of no (*k* = 0), partial (*k* = 0.5), and complete (*k* = 1) dominance. We then compared predicted (theory) test statistics with simulated values for each set of parameter values. For both designs, simulated results were very similar to theory and in no case did theoretical test statistics significantly exceed simulated ones (Tables A2 and A3).

## Acknowledgments

We thank Bill Hill, Ary Hoffmann, and Emma Hine for comments on the manuscript. S.F.C. is supported by the Australian Research Council, and P.M.V. is supported by the Australian National Health and Medical Research Council.

## Footnotes

Communicating editor: A. D. Long

- Received November 27, 2008.
- Accepted December 9, 2008.

- Copyright © 2009 by the Genetics Society of America