## Abstract

For multiallelic loci, standard measures of linkage disequilibrium provide an incomplete description of the correlation of variation at two loci, especially when there are *different* numbers of alleles at the two loci. We have developed a complementary pair of *conditional asymmetric* linkage disequilibrium (ALD) measures. Since these measures do not assume symmetry, they more accurately describe the correlation between two loci and can identify heterogeneity in genetic variation not captured by other symmetric measures. For biallelic loci the ALD measures are symmetric and equivalent to the correlation coefficient *r*. The ALD measures are particularly relevant for disease-association studies to identify cases in which an analysis can be stratified by one or more loci. A stratified analysis can aid in detecting primary disease-predisposing genes and additional disease genes in a genetic region. The ALD measures are also informative for detecting selection acting independently on loci in high linkage disequilibrium or on specific amino acids within genes. For SNP data, the ALD statistics provide a measure of linkage disequilibrium on the same scale for comparisons among SNPs, among SNPs and more polymorphic loci, among haplotype blocks of SNPs, and for fine mapping of disease genes. The ALD measures, combined with haplotype-specific homozygosity, will be increasingly useful as next-generation sequencing methods identify additional allelic variation throughout the genome.

- linkage disequilibrium (LD)
- correlation coefficient *r*
- multiallelic LD *W*_{n}
- asymmetric LD (ALD) *W*_{A/B} and *W*_{B/A}
- conditional or stratified analyses

The definition of the linkage disequilibrium (LD) parameter *D*_{ij} of nonrandom association between a pair of alleles *A*_{i} and *B*_{j} at two loci (*A* and *B*) is straightforward and unequivocal. It is the difference between the observed (or estimated) haplotype (chromosomal or gametic) frequency (*f*_{ij}) and that expected under random association of the two allele frequencies (*p*_{Ai} and *p*_{Bj}): *D*_{ij} = *f*_{ij} − *p*_{Ai}*p*_{Bj}. While this is the base of all other measures of LD, defining the *strength* of any observed nonrandom association is complicated by the fact that the maximum value *D*_{ij} can take is a function of the observed allele frequencies. A number of normalized measures to reflect the strength of LD have been proposed, both for *bi*- and *multi*allelic data (Hedrick 1987; Lewontin 1988). However, since these are all a single summary of multidimensional data, no proposed measure of the strength of LD can be perfect, although each may have strengths and weaknesses with respect to the question being addressed.
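As a minimal sketch of the definition above, the following Python computes *D*_{ij} from a table of haplotype frequencies and the largest |*D*_{ij}| attainable for given allele frequencies (the usual bound underlying the normalized *D*′ measure). The function names `allele_freqs`, `ld_matrix`, and `d_max`, and the 2 × 2 frequency table, are illustrative assumptions and are not taken from the original analysis.

```python
def allele_freqs(f):
    """Marginal allele frequencies of a haplotype-frequency table f[i][j]."""
    p_a = [sum(row) for row in f]          # p_Ai: row sums
    p_b = [sum(col) for col in zip(*f)]    # p_Bj: column sums
    return p_a, p_b


def ld_matrix(f):
    """D_ij = f_ij - p_Ai * p_Bj for every pair of alleles."""
    p_a, p_b = allele_freqs(f)
    return [[f[i][j] - p_a[i] * p_b[j] for j in range(len(p_b))]
            for i in range(len(p_a))]


def d_max(d_ij, p_ai, p_bj):
    """Largest |D_ij| attainable for the given allele frequencies."""
    if d_ij < 0:
        return min(p_ai * p_bj, (1 - p_ai) * (1 - p_bj))
    return min(p_ai * (1 - p_bj), (1 - p_ai) * p_bj)


# Hypothetical biallelic example (frequencies are made up for illustration).
f = [[0.4, 0.1],
     [0.1, 0.4]]
p_a, p_b = allele_freqs(f)
d = ld_matrix(f)
print(round(d[0][0], 4))                         # D_11 -> 0.15
print(round(d_max(d[0][0], p_a[0], p_b[0]), 4))  # bound on |D_11| -> 0.25
```

Changing the allele frequencies while holding *D*_{11} fixed changes the bound, which is why the raw *D*_{ij} values are not comparable across locus pairs.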

The two most common measures of the strength of LD are: (1) the normalized measure of the individual LD values (Lewontin 1964), *D*_{ij}′ = *D*_{ij}/*D*_{max} (see Supporting Information, File S1 for details) and (2) the correlation coefficient *r* for *biallelic* data, which is most often reported as *r*^{2} = *D*_{ij}^{2}/(*p*_{A1}*p*_{A2}*p*_{B1}*p*_{B2}). Hedrick (1987) extended the *D*′ measure for multiallelic data as a weighted average over all alleles at each locus of the individual normalized LD values: *D*′ = Σ_{i}Σ_{j} *p*_{Ai}*p*_{Bj}|*D*_{ij}′|. The multiallelic extension of the *r*^{2} measure is

$$W_n = \left[\frac{\sum_{i=1}^{k_A}\sum_{j=1}^{k_B} D_{ij}^{2}/(p_{Ai}\,p_{Bj})}{\min(k_A - 1,\; k_B - 1)}\right]^{1/2},$$

where *k*_{A} and *k*_{B} indicate the number of alleles at each locus. It is also known as Cramer’s *V* statistic (Cramer 1946), defined on the contingency table relating two categorical variables, and is a reexpression of the χ^{2} statistic, normalized to be between zero and one (Hill 1975; Hedrick 1987; Single *et al.* 2007, 2011). With *N* individuals (2*N* alleles/haplotypes), (2*N*)(*W*_{n}^{2}) min(*k*_{A} − 1, *k*_{B} − 1) has, under the null hypothesis of no LD, an approximate χ^{2} distribution with (*k*_{A} − 1)(*k*_{B} − 1) degrees of freedom and can be used to test for significant LD between two loci.

For *biallelic* data, *D*′ = 1 whenever one or more of the four possible haplotypes are *not* observed, irrespective of the expected frequencies. In contrast, *r* directly measures the correlation coefficient of the biallelic variation at two loci. Specifically, *r* = 1 only when the allelic variations at the two loci show 100% correlation, *i.e.*, when both loci have equal allele frequencies and only two complementary haplotypes are observed. This correlation property is of interest to many research questions. For example, if two loci show associations with a disease but *r* is close to or equal to one (*i.e.*, nearly complete allelic association), then there is little or no variation that can be assessed by a stratified analysis for risk heterogeneity between two potentially disease-predisposing genetic variants.
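The measures discussed in the two preceding paragraphs can be sketched in a few lines of Python. This is a hedged illustration only: the helper names (`marginals`, `w_n`, `ld_chi_square`), the frequency table, and the sample size of 100 individuals are assumptions made for the example. The table has one of the four haplotypes absent, so *D*′ = 1 while *r* (which equals *W*_{n} for biallelic data) is well below one.

```python
from math import sqrt


def marginals(f):
    """Allele frequencies at locus A (rows) and locus B (columns)."""
    return [sum(r) for r in f], [sum(c) for c in zip(*f)]


def w_n(f):
    """Multiallelic correlation measure W_n (Cramer's V) from frequencies."""
    p_a, p_b = marginals(f)
    total = 0.0
    for i, pa in enumerate(p_a):
        for j, pb in enumerate(p_b):
            if pa > 0 and pb > 0:
                d_ij = f[i][j] - pa * pb
                total += d_ij ** 2 / (pa * pb)
    return sqrt(total / min(len(p_a) - 1, len(p_b) - 1))


def ld_chi_square(f, n_individuals):
    """Statistic 2N * W_n^2 * min(k_A-1, k_B-1) and its degrees of freedom."""
    k_a, k_b = len(f), len(f[0])
    stat = 2 * n_individuals * w_n(f) ** 2 * min(k_a - 1, k_b - 1)
    return stat, (k_a - 1) * (k_b - 1)


# Hypothetical biallelic table with the A2B1 haplotype absent.
f = [[0.3, 0.2],
     [0.0, 0.5]]
p_a, p_b = marginals(f)
d = f[0][0] - p_a[0] * p_b[0]
d_prime = d / min(p_a[0] * p_b[1], p_a[1] * p_b[0])   # D > 0 in this table
r = d / sqrt(p_a[0] * p_a[1] * p_b[0] * p_b[1])
print(round(d_prime, 3), round(r, 3), round(w_n(f), 3))   # 1.0 0.655 0.655
print(ld_chi_square(f, n_individuals=100))
```

The returned statistic would be compared with a χ^{2} distribution on the returned degrees of freedom; that lookup is left out to keep the sketch free of external dependencies.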

Due to these inherent differences between the properties of the *D*′ measure and the correlation measure *r*, we focus on the correlation measure and its multiallelic extension *W*_{n}. We developed the pair of *conditional asymmetric* LD (ALD) measures, *W*_{A/B} and *W*_{B/A}, to complement the *W*_{n} measure especially when there are *different* numbers of alleles at the two loci. Differing numbers of alleles lead to cases where *W*_{n} is equal or close to one while one of the two ALD measures is substantially less than one.

Other conditional LD measures have been proposed (Nei and Li 1980; Chakravarti *et al.* 1984; Hudson 1985; Guo 1997). Nei and Li (1980) developed a statistic that quantifies the association between alleles at a marker locus and a disease locus for studies where individuals are not randomly sampled from a single population, but sampling intensity varies within (disease) categories (Kaplan and Weir 1992; Maiste and Weir 1992). See File S1 for additional detail. In contrast to the above, the ALD measures introduced below are defined for a randomly ascertained sample from a demographically defined population or control group.

When there are *different numbers of alleles* at the two loci, the direct correlation property discussed above for the *r* measure *is not* retained by its multiallelic extension *W*_{n}. Consider *example 1* with *two* and *three* alleles at the first and second loci, with *f*_{11} = 0.3, *f*_{22} = 0.5, *f*_{23} = 0.2 for *A*_{1}*B*_{1}, *A*_{2}*B*_{2}, and *A*_{2}*B*_{3} haplotypes. *W*_{n} = 1; however, there is variation at the *B* locus on haplotypes containing the *A*_{2} allele. Thus, there is not 100% correlation, and there *never* can be with differing numbers of alleles at the two loci. In this example the two ALD measures (defined below) reflect that while there is no variation of *A* locus alleles on any of the haplotypes conditioned on the *B* locus alleles (*W*_{A/B} = 1), there is variation in the *B*_{2} and *B*_{3} alleles on haplotypes carrying *A*_{2} (*W*_{B/A} = 0.73). The ALD measures directly indicate that with appropriate sample size, stratification analyses could be carried out for certain comparisons. In contrast, a naive interpretation of the fact that *W*_{n} = 1 could result in passing over these data for conditional or stratified haplotype analyses of risk heterogeneity (Thomson *et al.* 2008).
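The numbers quoted for example 1 can be reproduced with a short Python sketch. It assumes the squared ALD is the overall conditional homozygosity normalized by its attainable range, (*F*_{A/B} − *F*_{A})/(1 − *F*_{A}) with *F*_{A/B} = Σ_{i}Σ_{j} *f*_{ij}^{2}/*p*_{Bj}, which is consistent with the values reported in the text; the function name `ld_summary` is hypothetical.

```python
from math import sqrt


def ld_summary(f):
    """Return (W_n, W_A/B, W_B/A) for a haplotype-frequency table f[i][j].

    Rows index alleles at locus A, columns alleles at locus B; every allele
    is assumed to have positive frequency.
    """
    p_a = [sum(row) for row in f]
    p_b = [sum(col) for col in zip(*f)]
    cells = [(i, j) for i in range(len(p_a)) for j in range(len(p_b))]
    # Sum of D_ij^2 / (p_Ai p_Bj), the quantity normalized in W_n
    chi = sum((f[i][j] - p_a[i] * p_b[j]) ** 2 / (p_a[i] * p_b[j])
              for i, j in cells)
    w_n = sqrt(chi / min(len(p_a) - 1, len(p_b) - 1))
    hom_a = sum(p * p for p in p_a)                       # F_A
    hom_b = sum(p * p for p in p_b)                       # F_B
    f_ab = sum(f[i][j] ** 2 / p_b[j] for i, j in cells)   # F_A/B
    f_ba = sum(f[i][j] ** 2 / p_a[i] for i, j in cells)   # F_B/A
    # max(0.0, ...) guards against tiny negative values from round-off
    w_ab = sqrt(max(0.0, (f_ab - hom_a) / (1 - hom_a)))
    w_ba = sqrt(max(0.0, (f_ba - hom_b) / (1 - hom_b)))
    return w_n, w_ab, w_ba


example_1 = [[0.3, 0.0, 0.0],    # A1 occurs only with B1
             [0.0, 0.5, 0.2]]    # A2 occurs with B2 and B3
w_n, w_ab, w_ba = ld_summary(example_1)
print(round(w_n, 2), round(w_ab, 2), round(w_ba, 2))   # 1.0 1.0 0.73
```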

The definition of the ALD measures begins with the homozygosity (*F*) and heterozygosity (*H*) values expected under Hardy–Weinberg proportions (HWP) at a single locus (see Table 1). While there are other measures of association and LD that are based on allelic diversity statistics (see File S1 for details), these measures are all symmetric (Ohta 1980; Maruyama 1982; Hedrick and Thomson 1986; Hedrick 1987). The composite LD measure of Wu *et al.* (2008) is designed to test interaction between two unlinked loci.

The conditional two-locus extensions of *F* and *H*, called haplotype-specific homozygosity (*HSF*) and haplotype-specific heterozygosity (*HSH*), measure the level of genetic variation at locus A on haplotypes with a specific allele on the B locus (and vice versa), *i.e.*, *F*_{A/Bj} and *F*_{B/Ai} (see Table 1). We developed the *HSF* and *HSH* measures (Malkki *et al.* 2005) to ascertain informative microsatellites (MSATs) in HLA transplantation and disease studies. The complementary pair of conditional ALD measures is defined by normalizing an extension of the HSF measure across all haplotypes.
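Because Table 1 is not reproduced here, the following Python sketch rests on two stated assumptions: the individual HSF is the homozygosity of the conditional allele frequencies, *F*_{A/Bj} = Σ_{i}(*f*_{ij}/*p*_{Bj})^{2}, and HSH = 1 − HSF, by analogy with single-locus *F* and *H* under HWP. The helper names `hsf` and `transpose` are hypothetical. Applied to example 1, it shows that all remaining variation sits at the *B* locus on *A*_{2} haplotypes.

```python
def transpose(f):
    """Swap the roles of the two loci."""
    return [list(col) for col in zip(*f)]


def hsf(f):
    """Haplotype-specific homozygosity of the row locus for each column allele.

    For a table with A alleles in rows and B alleles in columns this returns
    [F_A/B1, F_A/B2, ...]; every column allele must have positive frequency.
    """
    values = []
    for col in zip(*f):                       # one column per conditioned allele
        p_cond = sum(col)                     # frequency of that allele
        values.append(sum((x / p_cond) ** 2 for x in col))
    return values


example_1 = [[0.3, 0.0, 0.0],
             [0.0, 0.5, 0.2]]
f_a_given_b = hsf(example_1)                  # condition on each B allele
f_b_given_a = hsf(transpose(example_1))       # condition on each A allele
hsh_b_given_a = [1 - v for v in f_b_given_a]  # haplotype-specific heterozygosity

print([round(v, 3) for v in f_a_given_b])     # no A variation given any B allele
print([round(v, 3) for v in f_b_given_a])
print([round(v, 3) for v in hsh_b_given_a])
```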

## Materials and Methods

### Definition of the asymmetric LD measures

There are two conditional ALD measures, depending on which locus is conditioned upon. For simplicity, we often describe the measure in detail conditioning on the *B* locus. The derivation of the complementary measure, conditioning on the *A* locus, is given by swapping the roles of loci *A* and *B*.

The *individual HSF* values (Table 1) are combined as a weighted average over all alleles at the conditioned locus to obtain the two *overall* haplotype-specific homozygosity measures: *F*_{A/B} and *F*_{B/A} (Table 1 and see *Appendix* for alternate expressions). The maximum value *F*_{A/B} can take is 1.0, when each *B* allele occurs with only one *A* allele.

*W*_{A/B}^{2} (the square of the ALD measure) is obtained by normalizing the *overall weighted HSF* value based on the range of possible values that it can achieve (Table 1):

$$W_{A/B}^{2} = \frac{F_{A/B} - F_A}{1 - F_A}.$$

For biallelic data at both loci, *W*_{A/B}^{2} = *W*_{B/A}^{2} = *r*^{2} (see *Appendix*).
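In practice the ALD values are estimated from phased haplotype data rather than from a frequency table. A minimal sketch under stated assumptions: the input is a list of (A allele, B allele) pairs, the weights in the overall HSF are the frequencies of the alleles at the conditioned locus, and the function name `ald_from_haplotypes` and the allele labels are hypothetical. The sample below is constructed to match the frequencies of example 1.

```python
from collections import Counter
from math import sqrt


def ald_from_haplotypes(haplotypes):
    """Estimate (W_A/B, W_B/A) from phased (A allele, B allele) pairs.

    Assumes both loci are polymorphic in the sample (otherwise 1 - F is zero).
    """
    n = len(haplotypes)
    hap = Counter(haplotypes)
    a = Counter(x for x, _ in haplotypes)
    b = Counter(y for _, y in haplotypes)
    hom_a = sum((c / n) ** 2 for c in a.values())   # F_A
    hom_b = sum((c / n) ** 2 for c in b.values())   # F_B
    # Overall HSF: sum over observed haplotypes of f_ij^2 / p(conditioned allele)
    f_ab = sum((c / n) ** 2 / (b[y] / n) for (x, y), c in hap.items())
    f_ba = sum((c / n) ** 2 / (a[x] / n) for (x, y), c in hap.items())
    # Normalize by the attainable range [F, 1]; guard against round-off
    w_ab = sqrt(max(0.0, (f_ab - hom_a) / (1 - hom_a)))
    w_ba = sqrt(max(0.0, (f_ba - hom_b) / (1 - hom_b)))
    return w_ab, w_ba


# Hypothetical sample of 100 haplotypes with the frequencies of example 1.
sample = [("A1", "B1")] * 30 + [("A2", "B2")] * 50 + [("A2", "B3")] * 20
w_ab, w_ba = ald_from_haplotypes(sample)
print(round(w_ab, 2), round(w_ba, 2))   # 1.0 0.73
```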

Once we deviate from having two alleles at both loci, the two ALD measures are only equal in certain specific cases (see below). For biallelic data the correlation coefficient is given by *r*; for multiallelic data *W*_{n} and the ALD measures, *W*_{A/B} and *W*_{B/A}, give the appropriate correlation coefficients.

Other factors being equal, the ALD increases with stronger LD between the two loci. The ALD values are also influenced by the number of alleles at each locus. Specifically, for multiallelic loci with *unequal* numbers of alleles, *e.g.*, *k*_{A} < *k*_{B} (with *k*_{A} ≥ 2), in the extreme case each *B*_{j} allele will occur with only one *A*_{i} allele and *W*_{A/B} = 1 (indicating no variation at the A locus on any haplotype containing a specific *B*_{j} allele) and also *W*_{n} = 1 (mirroring this effect). However, *W*_{B/A} < 1 reflects the required variation, given the inequality of allele numbers, at the B locus on some or all haplotypes containing a specific *A*_{i} allele (see special case e, below).

### Special cases

a. *Biallelic* loci with *two* haplotypes of the four possible, *e.g.*, *A*_{1}*B*_{1} and *A*_{2}*B*_{2} (hence *p*_{A1} = *p*_{B1}). LD is maximal with *D* = *p*_{A1}*p*_{A2}, and there is symmetry in all measures: *D*′ = 1 and *r* = *W*_{n} = *W*_{A/B} = *W*_{B/A} = 1.

b. *Biallelic* loci with *three* haplotypes of the four possible, *e.g.*, *A*_{1}*B*_{1}, *A*_{1}*B*_{2}, and *A*_{2}*B*_{2}. Here *p*_{A1} > *p*_{B1}, and LD is maximal (*D* = *p*_{A2}*p*_{B1}): *D*′ = 1, but *r* (= *W*_{n} = *W*_{A/B} = *W*_{B/A}) < 1. This reflects that the allele frequencies at the two loci are not 100% correlated.

c. Multiallelic loci with *equal* number of alleles (*i.e.*, *k*_{A} = *k*_{B} = *k*) and only symmetric haplotypes (*i.e.*, *f*_{ii} > 0, for all *i* = 1, 2, …, *k*, and *f*_{ij} = 0 otherwise). As above for the biallelic case a, there is complete symmetry and 100% correlation of allele frequencies at the two loci: *D*′ = 1, and *W*_{n} = *W*_{A/B} = *W*_{B/A} = 1. An example with three alleles at both loci is *f*_{11} = 0.5, *f*_{22} = 0.3, *f*_{33} = 0.2, with all other *f*_{ij} = 0. There is no variation of A locus alleles on any of the haplotypes conditioned on the B locus alleles, and vice versa.

d. The same as c above, except that one or more of *f*_{ij} > 0 for *i* ≠ *j*: *W*_{n} < 1, *W*_{A/B} < 1, *W*_{B/A} < 1.

e. Multiallelic loci with *unequal* number of alleles (*e.g.*, *k*_{A} < *k*_{B}), with each *B*_{j} allele occurring with only one *A*_{i} allele (see example 1 in the Introduction). While *W*_{n} = *W*_{A/B} = 1, *W*_{B/A} < 1.

f. One locus biallelic and the other multiallelic (*e.g.*, *k*_{A} = 2, *k*_{B} > 2): *W*_{n} = *W*_{A/B} ≠ *W*_{B/A}. In a variety of cases examined, *W*_{B/A} < *W*_{A/B}, but we have no proof that this is always the case.

See File S1 for proofs of special cases c–f.
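Special cases c through f can be checked numerically. The sketch below is illustrative only: the function name `measures` is hypothetical, the tables for cases d and f are made-up frequencies, and the ALD expression used is the normalized overall HSF, (*F*_{A/B} − *F*_{A})/(1 − *F*_{A}). A numerical check of this kind does not replace the proofs in File S1.

```python
from math import sqrt


def measures(f):
    """Return (W_n, W_A/B, W_B/A) for a haplotype-frequency table f[i][j]."""
    p_a = [sum(r) for r in f]
    p_b = [sum(c) for c in zip(*f)]
    cells = [(i, j) for i in range(len(p_a)) for j in range(len(p_b))]
    chi = sum((f[i][j] - p_a[i] * p_b[j]) ** 2 / (p_a[i] * p_b[j])
              for i, j in cells)
    f_ab = sum(f[i][j] ** 2 / p_b[j] for i, j in cells)
    f_ba = sum(f[i][j] ** 2 / p_a[i] for i, j in cells)
    hom_a = sum(p * p for p in p_a)
    hom_b = sum(p * p for p in p_b)
    w_n = sqrt(chi / min(len(p_a) - 1, len(p_b) - 1))
    w_ab = sqrt(max(0.0, (f_ab - hom_a) / (1 - hom_a)))
    w_ba = sqrt(max(0.0, (f_ba - hom_b) / (1 - hom_b)))
    return w_n, w_ab, w_ba


cases = {
    "c": [[0.5, 0.0, 0.0], [0.0, 0.3, 0.0], [0.0, 0.0, 0.2]],  # from the text
    "d": [[0.4, 0.1, 0.0], [0.0, 0.3, 0.0], [0.0, 0.0, 0.2]],  # hypothetical
    "e": [[0.3, 0.0, 0.0], [0.0, 0.5, 0.2]],                   # example 1
    "f": [[0.2, 0.1, 0.1], [0.1, 0.3, 0.2]],                   # hypothetical
}
results = {label: measures(table) for label, table in cases.items()}
for label, values in results.items():
    print(label, [round(w, 3) for w in values])
```

In case f the first two printed values coincide (*W*_{n} = *W*_{A/B}) while the third is smaller, in line with the pattern described above.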

## Results

### HLA classical loci

We applied the ALD measures to data for the polymorphic HLA classical genes (Wilson 2010): class I (*A*, *C*, and *B*) and class II (*DRB1*, *DQA1*, *DQB1*, and *DPB1*). Figure 1 and Figure 2 respectively show the standard overall LD measure *W*_{n} and the ALD measures *W*_{A/B} and *W*_{B/A}. The *W*_{n} measure assumes/forces symmetry (as does the overall *D*′ measure, not shown) even though with more than two alleles per locus, differing numbers of alleles at each locus, and different levels of LD between loci this is not the case.

The ALD values show considerable heterogeneity. For example (with numbers of alleles for each locus given in parentheses), the ALD for *DRB1*(40) conditioning on *DQA1*(9) is *W*_{DRB1/DQA1} = 0.58; *i.e.*, the overall variation for *DRB1* is relatively high given specific *DQA1* alleles. In contrast, the ALD for *DQA1* conditioning on *DRB1* is *W*_{DQA1/DRB1} = 0.95; *i.e.*, the overall variation for *DQA1* is relatively low given specific *DRB1* alleles. This reflects both the smaller number of alleles at *DQA1* compared to *DRB1* and the high LD between the two loci (most *DRB1* alleles occur with only one *DQA1* allele, but not vice versa). Similarly with the *B*(61) and *C*(29) loci, *W*_{B/C} = 0.65, and *W*_{C/B} = 0.84. In both these examples the standard (symmetric) overall pairwise LD values are intermediate to the ALD values: *W*_{n} = 0.87 and 0.73 for the *DRB1*:*DQA1* and *C*:*B* locus pairs, respectively. In almost all comparisons, if the number of alleles *k*_{X} > *k*_{Y} then *W*_{X/Y} < *W*_{Y/X}. An exception is with the *A*(33) and *C*(29) loci, *i.e.*, *k*_{A} > *k*_{C}, but *W*_{A/C} (0.41) > *W*_{C/A} (0.40).

### SNP and HLA data

HLA and SNP data from de Bakker *et al.* (2006) characterized patterns of LD among highly polymorphic HLA genes and a large number of SNP sites. The extensive LD across the extended HLA region (∼8 Mb) makes the identification of additional non-HLA genomic effects on disease difficult to assess. The SNP sites used here were selected on the basis of their ability to identify or tag specific alleles at each of the HLA classical loci (*i.e.*, tag-SNPs for HLA alleles). We chose this example, with a subset of the HLA and SNP data in the class II region, to highlight the properties of the ALD measures and what distinguishes them from the symmetric *r* and *W*_{n} measures.

Figure 3 and Figure 4 show plots of the *W*_{n} and ALD measures for 90 unrelated individuals with European ancestry from the Centre d’Etude du Polymorphisme Humain (CEPH) collection (CEU) obtained from the Tagger/MHC webpage. The ALD measures (Figure 4) provide a visualization of the tag-SNP properties that is not captured by the symmetric *W*_{n} measure. Looking down the column for any one of the HLA loci (*i.e.*, conditioning on an HLA locus), one can see the particular SNPs that tag specific HLA alleles. These show up as a dark column in the figure. However, conditioning on any given SNP does not show this pattern of high LD. In contrast with the figure for *W*_{n}, there are no dark *rows* of high LD for the ALD measures, indicating that the ALD measures capture the different degree of overall association for each individual SNP.

Note that the information displayed in Figure 3 and Figure 4 captures different aspects of LD from the results reported in the de Bakker *et al.* (2006) article, as we present *overall* LD between each pair of loci. The *r*^{2} values reported in their article represent the squared correlation between a given SNP and presence/absence of each particular HLA allele (*e.g.*, A*0101 *vs.* other). The tag-SNPs were chosen such that this *r*^{2} value is 1.0 (or nearly so) for a specific HLA allele, not for the overall locus. The values in Figure 3 and Figure 4 represent *overall* LD combining over all alleles at both loci.
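The distinction between an allele-specific *r*^{2} and overall LD can be illustrated with a toy table. Everything in this sketch is an assumption made for illustration: the frequencies are invented (they are not the CEU data), the HLA locus is given four hypothetical alleles, and the function names `collapse`, `r_squared`, and `ald_row_given_col` are not from the original analysis. SNP allele 2 is constructed to tag the first HLA allele perfectly.

```python
from math import sqrt


def collapse(f, j):
    """Recode the multiallelic column locus as allele j versus all others."""
    return [[row[j], sum(row) - row[j]] for row in f]


def r_squared(f2):
    """r^2 for a 2x2 haplotype-frequency table."""
    p_a1 = sum(f2[0])
    p_b1 = f2[0][0] + f2[1][0]
    d = f2[0][0] - p_a1 * p_b1
    return d * d / (p_a1 * (1 - p_a1) * p_b1 * (1 - p_b1))


def ald_row_given_col(f):
    """W_{row|col}: ALD for the row locus conditioning on the column locus."""
    p_row = [sum(r) for r in f]
    p_col = [sum(c) for c in zip(*f)]
    hom = sum(p * p for p in p_row)
    hsf = sum(f[i][j] ** 2 / p_col[j]
              for i in range(len(p_row)) for j in range(len(p_col)))
    return sqrt(max(0.0, (hsf - hom) / (1 - hom)))


# Hypothetical frequencies: rows = SNP alleles 1, 2; columns = four HLA alleles.
f = [[0.0, 0.4, 0.3, 0.2],
     [0.1, 0.0, 0.0, 0.0]]
f_t = [list(c) for c in zip(*f)]            # rows = HLA alleles, columns = SNP

allele_r2 = [r_squared(collapse(f, j)) for j in range(4)]
w_snp_given_hla = ald_row_given_col(f)
w_hla_given_snp = ald_row_given_col(f_t)
print([round(x, 3) for x in allele_r2])     # allele-specific r^2, one per HLA allele
print(round(w_snp_given_hla, 3), round(w_hla_given_snp, 3))
```

The first HLA allele has an allele-specific *r*^{2} of one and *W*_{SNP|HLA} is one, yet *W*_{HLA|SNP} is low because substantial HLA variation remains on haplotypes carrying SNP allele 1.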

For example, the SNP rs4988889 is listed as a tag-SNP in the CEU population for the HLA-DQB1*02:01 *allele* in Table S3 of de Bakker *et al.* (2006), with an *r*^{2} (symmetric) value of 0.958. It does not show up as a tag-SNP for any other HLA allele in their Table S3. In Table 2, one can see that the values for *W*_{HLA|SNP} and *W*_{SNP|HLA} are quite different (0.4083 *vs.* 0.9788). The rs7743506 SNP is listed in de Bakker *et al.* (2006) as a tag-SNP for three class II alleles, each with an *r*^{2} value of 1.0: HLA-*DQA1**04:01, HLA-*DQB1**04:02, and HLA-*DRB1**08:01. Thus, allele 2 for this SNP is completely correlated with the presence of each of these three class II HLA alleles. This 100% correlation is captured by the ALD measure (*W*_{SNP|HLA} = 1.0), while the low values for *W*_{HLA|SNP} for each of the three class II loci indicate that there is a large amount of variability remaining at the HLA loci after conditioning on this SNP. Note that for the examples in Table 2, *W*_{SNP|HLA} is equal to *W*_{n}. This is an example of special case f above.

### HLA disease association data

The HLA class II *DRB1* gene is strongly associated with juvenile idiopathic arthritis (oligoarticular-persistent) (JIA-OP), with a hierarchy of predisposing through intermediate (“neutral”) to protective effects (Hollenbach *et al.* 2010; Thomson *et al.* 2010). Amino-acid position 13 (AA13) of *DRB1* shows the strongest single AA association with JIA-OP. This association is also stronger than other potentially biologically relevant combinations of AAs defined under sequence feature variant-type (SFVT) analysis (Karp *et al.* 2010; Thomson *et al.* 2010). AA13 is also identified as potentially causative in disease using an extension of Salamon’s unique combinations algorithm (Salamon *et al.* 1996; Thomson *et al.* 2010). The overall AA LD (*W*_{n}) patterns are quite complex for each of the classical HLA loci, with *DRB1* control data for JIA-OP shown in Figure 5. AA13 shows high LD via the *W*_{n} measure with quite a few other AAs (note only AAs 9–38 within exon 2 are shown). However, ALD analyses show additional variation that can be tested via conditional analyses (Figure 6).

For illustration, we consider the block of high LD AAs 11(6), 12(2), and 13(6) (the number of “alleles,” or different AA residues segregating, at each AA site is given in parentheses). AA 10(2) and AA 12 are 100% correlated apart from a very rare allele, and hence AA 10 is not considered here. The ALD values indicate which pairs of AAs may allow for stratification and conditional analyses. For example (see Figure 5 and Figure 6), with AAs 11 and 12, *W*_{n} = 1, and while *W*_{12/11} = 1, *W*_{11/12} = 0.64, and hence some stratification analyses can be carried out (this is also an illustration of special case f above). Table 3 shows the results of specific tests of risk heterogeneity: variation at AA 13 is significantly associated with disease on haplotypes with AA 11 and AA 12. In contrast, AA 11 does not show heterogeneity on haplotypes with AA 13. This does not exclude a role for AA 11, nor AA 12, in disease predisposition, but the conditional analyses do show a potential role for AA 13 in being directly involved in disease risk.
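The logic of a stratified test of risk heterogeneity can be sketched as follows. The text does not state which test statistic underlies Table 3, so the Pearson χ^{2} used here is an assumption, and the haplotype counts, residue labels ("x", "p", "q", "r"), and function names are entirely hypothetical. The sketch restricts attention to haplotypes carrying a given residue at one site and then compares cases and controls for residues at a second site.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat, (len(row_tot) - 1) * (len(col_tot) - 1)


def stratified_table(haplotypes, site_fixed, residue, site_tested):
    """Case/control counts at site_tested on haplotypes with `residue` at site_fixed.

    Each haplotype is a (status, {site: residue}) pair, status "case" or "control".
    """
    counts = {}
    for status, residues in haplotypes:
        if residues[site_fixed] == residue:
            key = residues[site_tested]
            counts.setdefault(key, {"case": 0, "control": 0})[status] += 1
    keys = sorted(counts)
    return keys, [[counts[k]["case"] for k in keys],
                  [counts[k]["control"] for k in keys]]


# Hypothetical data: residue "x" at site 11 occurs with "p" or "q" at site 13.
haps = ([("case", {11: "x", 13: "p"})] * 40
        + [("control", {11: "x", 13: "p"})] * 20
        + [("case", {11: "x", 13: "q"})] * 10
        + [("control", {11: "x", 13: "q"})] * 30
        + [("control", {11: "y", 13: "r"})] * 25)   # excluded by the stratification

keys, table = stratified_table(haps, site_fixed=11, residue="x", site_tested=13)
stat, df = chi_square(table)
print(keys, table, round(stat, 2), df)
```

A stratum in which only one residue is seen at the tested site yields a single-column table with zero degrees of freedom, which corresponds to an ALD of one in that direction and leaves nothing to test.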

### Selection on HLA–DRB1 amino acids

A role for balancing selection maintaining much of the extensive variation at the HLA classical loci is well established (Meyer and Thomson 2001; Meyer *et al.* 2006). In particular, application of the Ewens–Watterson (EW) neutrality test of allele-frequency distributions at the classical HLA loci has revealed the action of balancing selection in maintaining diversity at the HLA-*A*, -*C*, -*B*, *DRB1*, *DQA1*, and *DQB1* loci (Salamon *et al.* 1999; Lancaster 2006; Solberg *et al.* 2008). Allele frequency distributions at these loci are generally more even than expected under neutral conditions. The distributions of *DPB1* alleles do not show evidence of balancing selection (Salamon *et al.* 1999; Begovich *et al.* 2001; Lancaster 2006; Tsai and Thomson 2007; Solberg *et al.* 2008). However, extension of the EW test to the AA level has shown evidence for balancing selection acting on some AAs for all the classical HLA loci, including *DPB1* (Salamon *et al.* 1999; Valdes *et al.* 1999; Lancaster 2006).

At both the allele and AA levels, the statistic used for the above analyses is the mean across populations of the normalized deviate *F*_{nd} of the homozygosity statistic *F* (Salamon *et al.* 1999). Balancing selection results in significantly negative *F*_{nd} values compared to neutral expectations, whereas directional selection, along with certain demographic events, leads to significant positive values. An observation of interest from previous studies is that pairs of AAs that show high LD may nonetheless show quite different *F*_{nd} values (Salamon *et al.* 1999; Lancaster 2006). To illustrate this point in the context of ALD measures applied to the JIA-OP *DRB1* control data, consider AA positions 37 and 38, which have a moderately high *W*_{n} value of 0.71 (Figure 5). However, the ALD values are quite disparate (*W*_{37/38} = 0.18 and *W*_{38/37} = 0.82) (Figure 6), and explain how the observed *F*_{nd} values can show different evolutionary histories with significant evidence for balancing selection for AA 37 and possible directional selection for AA 38 (Figure 7). This pattern is not unique to this particular population. Similar patterns of this differential selection can be seen in meta-analyses across several populations (see Figure S1 for *F*_{nd} values across 57 populations for DRB1 data (Lancaster 2006)). For these data, *P*-values for deviation from neutral expectations in the direction of balancing selection are 2.5 × 10^{−24} and 0.11 for AAs 37 and 38, respectively (Lancaster 2006).

## Discussion

From analyses of allele and haplotype data in disease-association studies, HLA researchers have long recognized that high pairwise LD (*W*_{n}) between two loci has limited our ability in some cases to distinguish the primary disease gene or genes. It is also well known that there are instances, particularly with differing numbers of alleles at two loci, where the *W*_{n} value does not accurately reflect our ability to perform stratified or conditional analyses to identify disease-risk heterogeneity. With multiallelic data, the ALD measures presented here are more appropriate and informative than the *W*_{n} measure. For example, with type 1 diabetes (T1D), *DRB1–DQB1* haplotypes carrying the *DRB1**04:01 allele can be subdivided by the *DQB1**03:02 (predisposing) and *03:01 (protective) alleles. This approach, termed for HLA studies “within serogroup comparisons” (based on a specific variant in the first field, or serotype, of the *DQB1* allele name, and comparing AA variation related to disease risk in the second field) focuses on a smaller number of AAs to compare. In this case the analysis of *DRB1–DQB1* haplotypes is stratified on *DQB1* based on the presence of *DRB1**04:01. This led to identification of AA 57 of *DQB1* in T1D risk. In fact, for T1D both *DRB1* and *DQB1* are directly involved in disease risk, with confirmation coming from cross-ethnic studies (Thomson *et al.* 2007, 2008, 2011; Erlich *et al.* 2008).

Another example of stratification on a particular site aiding in the identification of additional effects comes from a SNP in the *PTPN22* gene. In a study of rheumatoid arthritis, Begovich *et al.* (2004) demonstrated an association with the minor allele of the R620W missense SNP (*rs2476601*) in *PTPN22*. In a follow-up study, similar to the above HLA study on T1D, Carlton *et al.* (2005) used AA analyses of closely related haplotypes of SNPs to show a direct role of R620W in risk heterogeneity. With stratification of the data by R620W, the role in disease risk of at least one additional SNP in *PTPN22* was identified.

The ALD measures were initially developed to aid two separate lines of research for AA variation at classical HLA genes: to determine the actual disease-predisposing AAs in disease-association studies and to identify which AA sites are independently subject to selection in population studies. The major problem encountered in both research areas is the high level and complex patterns of LD between many AA sites, combined with more than two (and up to six) distinct AAs (“alleles”), seen at many sites. When evidence of strong balancing selection is seen at a number of AA sites (Salamon *et al.* 1999; Valdes *et al.* 1999; Lancaster 2006), how does one determine which AA sites could potentially show independent evolution *vs.* correlation due to high LD? Similarly with disease-association studies of individual AAs and biologically relevant sequence features (SFs) and their variant types (VTs) (Karp *et al.* 2010; Thomson *et al.* 2010), how can one distinguish between potentially causal effects *vs.* those due to LD? These AA-level analyses showed that there are cases with different numbers of “alleles” (AAs or SFVTs) at two loci where *W*_{n} = 1; nonetheless a stratified analysis could be applied to potentially distinguish disease predisposing variants. Also in population studies there are cases of two AA sites with *W*_{n} ≈ 1, which show variation that appears to be under different selection pressures (Salamon *et al.* 1999; Lancaster 2006). The ALD measures can help provide additional insight in these situations.

The ALD measures are applicable to the study of any genetic variation, and the fact that they are measured on the same scale as the well-documented correlation measure *r* enhances their comparability and interpretation. They will be increasingly useful as next-generation sequencing methods identify more allelic variation, including nonbiallelic SNPs, insertion/deletion polymorphisms, and copy-number variants. Currently, these nonbiallelic SNP sites are often excluded from analyses. Linkage disequilibrium analyses among SNPs and among polymorphic genes are typically handled separately and polymorphic genes are often recoded as a set of dichotomous indicator variables (presence/absence of each allele) to simplify analyses at the expense of interpretation. The ALD statistics provide a measure of linkage disequilibrium that is on the same scale for comparisons among SNPs, among SNPs and more polymorphic loci, among haplotype blocks of SNPs, and for fine mapping of disease genes. The ALD measures are especially useful when there is asymmetry in the number of alleles at each locus, and it is suspected that even with very high *W*_{n} values, some haplotypes will allow for a stratified analysis. The ALD values, combined with the HSF values (Table 1), give us a numeric evaluation of the variation available for stratification analyses. It can be challenging to conduct several analyses, synthesizing results from various combinations and types of genetic variants as risk factors. The ALD measures form a base for such studies, along with consideration of other complementary summary measures of the strength and structure of LD in multiallelic data.

## Acknowledgments

We thank Diogo Meyer, Montgomery Slatkin, and two anonymous reviewers for their helpful comments. We also thank Alex Lancaster for the use of his thesis data. This work was supported in part by National Institutes of Health (NIH) Contract HHSN272201200028C (G.T. and R.M.S.), NIH grant MH096262 (G.T.), and a 2013-14 REACH grant from the University of Vermont (R.M.S.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. Data used in this paper can be found at the tagger/MHC webpage (http://www.broadinstitute.org/mpg/tagger/mhc.html) and at the Immunology Database and Analysis Portal (ImmPort - immport.niaid.nih.gov - SDY26 and SDY313).

## Appendix: Alternate Expressions for *ALD* Statistics

#### Alternate expressions for *F*_{A/B} and *F*_{B/A} for multiallelic data

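The two overall HSF measures can also be expressed using haplotype and allele frequencies (first line below), or as a deviation from the single-locus homozygosity *F*_{A} = Σ_{i} *p*_{Ai}^{2} (second line below) using individual LD (*D*_{ij}) values and allele frequencies:

$$\begin{aligned} F_{A/B} &= \sum_{i}\sum_{j} \frac{f_{ij}^{2}}{p_{Bj}} \\ &= F_A + \sum_{i}\sum_{j} \frac{D_{ij}^{2}}{p_{Bj}}. \end{aligned}$$

Similarly,

$$F_{B/A} = \sum_{i}\sum_{j} \frac{f_{ij}^{2}}{p_{Ai}} = F_B + \sum_{i}\sum_{j} \frac{D_{ij}^{2}}{p_{Ai}}.$$

It follows that *F*_{A/B} ≥ *F*_{A} with equality only when all *D*_{ij} = 0 (a “Wahlund” effect).

#### Alternate expressions for *F*_{A/B} and *F*_{B/A} for biallelic data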

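If both loci are biallelic, with *D* = |*D*_{ij}|:

$$F_{A/B} = F_A + \frac{2D^{2}}{p_{B1}\,p_{B2}}.$$

Similarly,

$$F_{B/A} = F_B + \frac{2D^{2}}{p_{A1}\,p_{A2}}.$$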

#### Alternate expressions for *W*_{A/B}^{2} and *W*_{B/A}^{2} for multiallelic data

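*W*_{A/B}^{2} and *W*_{B/A}^{2} (Table 1) can also be expressed using haplotype and allele frequencies or using individual LD (*D*_{ij}) values and allele frequencies:

$$W_{A/B}^{2} = \frac{\sum_{i}\sum_{j} f_{ij}^{2}/p_{Bj} - \sum_{i} p_{Ai}^{2}}{1 - \sum_{i} p_{Ai}^{2}} = \frac{\sum_{i}\sum_{j} D_{ij}^{2}/p_{Bj}}{1 - \sum_{i} p_{Ai}^{2}},$$

$$W_{B/A}^{2} = \frac{\sum_{i}\sum_{j} f_{ij}^{2}/p_{Ai} - \sum_{j} p_{Bj}^{2}}{1 - \sum_{j} p_{Bj}^{2}} = \frac{\sum_{i}\sum_{j} D_{ij}^{2}/p_{Ai}}{1 - \sum_{j} p_{Bj}^{2}}.$$

#### Alternate expressions for *W*_{A/B}^{2} and *W*_{B/A}^{2} for biallelic data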

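If both loci are biallelic:

$$W_{A/B}^{2} = \frac{2D^{2}/(p_{B1}\,p_{B2})}{2\,p_{A1}\,p_{A2}} = \frac{D^{2}}{p_{A1}\,p_{A2}\,p_{B1}\,p_{B2}} = r^{2}.$$

Similarly, *W*_{B/A}^{2} = *r*^{2}.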
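The identities in this Appendix can be checked numerically on arbitrary frequency tables. The sketch below is a sanity check under stated assumptions rather than a proof: it tests that Σ_{i}Σ_{j} *f*_{ij}^{2}/*p*_{Bj} equals *F*_{A} + Σ_{i}Σ_{j} *D*_{ij}^{2}/*p*_{Bj}, and that the squared ALD reduces to *r*^{2} for a 2 × 2 table. The random tables and the function names `check_identity` and `random_table` are hypothetical.

```python
import random


def check_identity(f, tol=1e-12):
    """Compare the two expressions for F_A/B on a haplotype-frequency table."""
    p_a = [sum(r) for r in f]
    p_b = [sum(c) for c in zip(*f)]
    cells = [(i, j) for i in range(len(p_a)) for j in range(len(p_b))]
    hom_a = sum(p * p for p in p_a)                              # F_A
    direct = sum(f[i][j] ** 2 / p_b[j] for i, j in cells)        # from f_ij
    via_d = hom_a + sum((f[i][j] - p_a[i] * p_b[j]) ** 2 / p_b[j]
                        for i, j in cells)                       # from D_ij
    return abs(direct - via_d) < tol, direct, hom_a


def random_table(k_a, k_b, rng):
    """Hypothetical frequencies: positive random weights scaled to sum to 1."""
    w = [[rng.random() + 0.01 for _ in range(k_b)] for _ in range(k_a)]
    total = sum(map(sum, w))
    return [[x / total for x in row] for row in w]


rng = random.Random(0)
ok_multi, f_ab, hom_a = check_identity(random_table(3, 5, rng))

# Biallelic case: W_A/B^2 should equal r^2.
f2 = random_table(2, 2, rng)
ok_bi, f_ab2, hom_a2 = check_identity(f2)
p_a1 = sum(f2[0])
p_b1 = f2[0][0] + f2[1][0]
d = f2[0][0] - p_a1 * p_b1
r2 = d * d / (p_a1 * (1 - p_a1) * p_b1 * (1 - p_b1))
w2 = (f_ab2 - hom_a2) / (1 - hom_a2)
print(ok_multi, ok_bi, f_ab >= hom_a, abs(w2 - r2) < 1e-9)
```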

## Footnotes

Supporting information is available online at http://www.genetics.org/lookup/suppl/doi:10.1534/genetics.114.165266/-/DC1.

*Communicating editor: J. Wall*

- Received April 15, 2014.
- Accepted July 8, 2014.

- Copyright © 2014 by the Genetics Society of America