Genetics, Vol. 161, 473-475, May 2002, Copyright © 2002


Letter to the Editor

"Optimal" Randomization Strategies When Testing the Existence of a Phylogeographic Structure: A Reply to Petit and Grivet

Alan R. Templeton
Department of Biology, Washington University, St. Louis, Missouri 63130-4899

Corresponding author: Alan R. Templeton

A phylogeographic nested clade analysis (NCA) tests the null hypothesis that haplotypes or clades of haplotypes are randomly distributed geographically relative to those haplotypes or clades with which they are nested together in a higher-order clade called the nesting clade (TEMPLETON et al. 1995). NCA is not a single test, but is rather a series of nested, hierarchical tests. The biological strength of this null hypothesis is that it assumes no a priori phylogeographic signal within the nesting clades. The major purpose of NCA is to identify the roles of both recurrent gene flow and historical events in influencing phylogeographic structure. Therefore, NCA also makes no a priori assumptions as to the biological causes of any detected associations between clades and geography. This feature makes it essential that the statistics be sensitive to genetic variation both within and between the sampled populations because many types of plausible biological causation require some assessment of within-population relative to between-population variation (TEMPLETON 2002). For example, almost all indicators of gene flow in population genetics depend upon statistics that are sensitive to both forms of variation, such as FST.
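To make the dependence of FST on both forms of variation concrete, one common textbook formulation (the heterozygosity form, which is an illustrative assumption here and not a definition given in this letter) can be written as follows, where H_S is the mean expected heterozygosity within the sampled populations and H_T is the expected heterozygosity of the pooled sample:

```latex
F_{ST} = \frac{H_T - H_S}{H_T}
```

In this form the statistic changes whenever either the within-population term H_S or the total term H_T changes, which is the sense in which it is sensitive to both within- and between-population variation.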

When the null hypothesis of no association between clades and geography within a nesting clade is rejected, the next step of NCA is biological inference through the use of an inference key. PETIT and GRIVET (2002) incorrectly state that "the inference keys used by Templeton imply some knowledge of the (relative) spatial distribution of haplotypes." The inference key is not based upon any such knowledge. Rather, there are a priori expectations of patterns under the assumption of a given biological causation. The inference key is designed to guide the user to see if the observed patterns emerging from many individual nested tests correspond to causal-specific a priori expectations. Hence, Petit and Grivet confuse a priori expectations of a model with a priori knowledge of the populations being studied. NCA and its inference key make no assumptions about a priori knowledge of the populations being studied.

In contrast to TEMPLETON et al. (1995), PETIT and GRIVET (2002) argue that the randomization procedure should permute the entire population sampled at a site over space rather than clades over space. This randomization procedure makes the null hypothesis insensitive to within-population variation. This alternative randomization procedure "was motivated by surprising results" of an analysis of a single data set. However, their motivating results were not surprising at all but are as expected in light of the published warnings about the limits and fallibility of NCA (TEMPLETON 1998). TEMPLETON (1998) explored the strengths and limitations of NCA by examining many cases with known a priori expectations. The results revealed that NCA was generally reliable in identifying biological factors known to have been important and was not prone to false positives. But as stated on page 393 of TEMPLETON (1998), "the inference key is not infallible." The one "failure" of NCA concerned a case in which there was long-distance colonization followed by a lack of genetic variation within the colonized region. The single example provided by Petit and Grivet involved "long-distance seed dispersal events" that led to colonies with little or no internal genetic variation. The inference key is already known to be fallible for such a case, so their single example adds no new insights into possible limitations of NCA.
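The contrast between the two randomization schemes discussed above can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the GEODIS implementation: the data, the function names (permute_clades, permute_populations, site_compositions), and the representation of a nesting clade as a list of (clade, site) pairs are all hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy data: one (clade, site) pair per sampled individual,
# all belonging to a single nesting clade.
SAMPLE = [
    ("c1", "north"), ("c1", "north"), ("c1", "north"),
    ("c1", "south"), ("c2", "south"), ("c2", "south"),
    ("c2", "east"), ("c2", "east"),
]


def permute_clades(individuals, rng):
    """Clade-based scheme: shuffle clade labels over individuals.

    Each individual keeps its site, so clade totals and site sample
    sizes are preserved while within-site composition is randomized.
    """
    clades = [clade for clade, _ in individuals]
    rng.shuffle(clades)
    return [(clade, site) for clade, (_, site) in zip(clades, individuals)]


def permute_populations(individuals, rng):
    """Site-based scheme: move each whole population sample to a random site.

    The clade composition inside every sample is left untouched; only
    the geographic location attached to the sample changes.
    """
    sites = sorted({site for _, site in individuals})
    shuffled = sites[:]
    rng.shuffle(shuffled)
    relocation = dict(zip(sites, shuffled))
    return [(clade, relocation[site]) for clade, site in individuals]


def site_compositions(individuals):
    """Return the sorted collection of per-sample clade counts, ignoring location."""
    by_site = defaultdict(Counter)
    for clade, site in individuals:
        by_site[site][clade] += 1
    return sorted(tuple(sorted(counts.items())) for counts in by_site.values())
```

Under permute_populations the value returned by site_compositions never changes, which illustrates why a null hypothesis built on that scheme cannot respond to within-population variation; under permute_clades the within-site compositions are free to vary from one permutation to the next.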

Petit and Grivet advocate a general need for reappraisal of NCA on the basis of their single example. They fail to note that the multiple examples given in TEMPLETON (1998) have already demonstrated that their single example has no general validity for NCA. Generalities are best inferred from multiple examples rather than a single example. That is the essential difference between TEMPLETON (1998) and PETIT and GRIVET (2002) in exploring the limitations of NCA.

NCA has other limitations; for example, the original inference key did not explicitly deal with secondary contact after fragmentation. TEMPLETON (2001) therefore designed additional statistics that allow inferences about secondary contact. These new statistics are explicitly given as an addendum to the original NCA and are useful only when there is prior evidence for fragmentation, such as from the original NCA. These new statistics are based upon a sample site approach rather than a haplotype approach. Consequently, sample site-based statistics have already been published and implemented in NCA [these new statistics are not yet in the public version of the program GEODIS (POSADA et al. 2000) but will be in future versions]. The original NCA when coupled with the new addendum (TEMPLETON 2001) makes it clear that haplotype/clade-based statistics and sample site-based statistics are both important in making phylogeographic inference. Thus, there is no single "optimal" randomization procedure; the different randomization procedures are statistically noncomparable and are directed at different but complementary types of biological inference. Therefore, my disagreement with Petit and Grivet is not over randomization procedures per se, but in the details of how the tests are implemented.

Petit and Grivet suggest that population subdivision first be tested through FST instead of the original NCA. However, TEMPLETON (1998) has already shown that NCA is more powerful and yields greater biological discrimination than FST in testing the null hypothesis of no association between genetic variation and geography. Moreover, FST does not discriminate between recurrent forces (e.g., restricted gene flow) vs. historical events (e.g., fragmentation) whereas NCA does. Finally, FST does not incorporate evolutionary history, so it is an inappropriate substitute for NCA for testing phylogeographic associations. Alternatives to FST that do incorporate aspects of evolutionary history, such as AMOVA, require an a priori defined hierarchy among the sampled populations (TURNER et al. 2000), thereby negating one of the principal motivating factors for using NCA in the first place. Petit and Grivet give no rationale for why FST is better than the original NCA for testing the initial null hypothesis involving genetic variation both within and between sample sites, nor do they address the published rationales (TEMPLETON 1998) for why FST is much less appropriate than NCA for this purpose.

After testing for subdivision with FST, Petit and Grivet propose testing the sample site-based null hypothesis using the statistics of the original NCA. In contrast, TEMPLETON (2001) proposed new statistics for site-based inference rather than simply using the original NCA statistics. The reason is that the original statistics were specifically designed to identify the roles of recurrent gene flow vs. historical events in influencing phylogeographic associations. It was therefore critical to design statistics that were sensitive to both within- and among-sample site variation. Therefore, NCA requires multiple individuals per site (at least at most sites) as well as multiple sample sites. Otherwise the needed information on within- vs. between-population variation is absent, precluding many types of biological inference (TEMPLETON 2002). Moreover, TEMPLETON (1998) has already demonstrated that FST cannot discriminate between restricted gene flow and historical fragmentation, whereas NCA can. Hence, neither of the two stages of testing advocated by Petit and Grivet can make the full panoply of biological discriminations currently possible through NCA. This was not a problem for their single example, because their example was chosen to have little or no within-population variation. However, for any data set containing within-population variation, the procedures recommended by Petit and Grivet are seriously flawed.

Moreover, Petit and Grivet misrepresent the inference structure of the original NCA. All the results they present are based upon just one of the many statistics used in NCA, namely the clade distance statistic Dc(X). They incorrectly define clade distance as measuring "the geographic spread of the individuals that bear haplotype X." The clade distance tests the geographic spread of the individuals that bear a clade (which may or may not be a haplotype) relative to the other clades within the same nesting clade. The final phrase of this definition, "relative to the other clades within the same nesting clade," is critical because it emphasizes the role of nesting in NCA. Moreover, biological inference in NCA is not based upon any one statistic; rather, it is based upon a pattern of several statistics. The inference key makes it explicit that the same biological inference can be achieved in many different ways. In contrast, Petit and Grivet present only results on the clade distance Dc in a nonnested, nonhierarchical fashion. No biological inference of any sort is possible just from Dc alone in the NCA inference key. This observation leads to another serious difficulty with the recommendation of Petit and Grivet. The NCA statistics and inference key were designed in the context of a nested clade analysis. I have no idea how to interpret the nested results of NCA in terms of their permutation procedure, which uses populations as units rather than clades. The original inference key is inapplicable to their new use of the original NCA statistics. Yet, they do not present any interpretative framework for their new null hypothesis. Until Petit and Grivet address the issue of biological interpretation, they have not presented an alternative to NCA.
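As a rough illustration of what the raw clade distance measures, the sketch below follows the usual description of Dc(X) in the NCA literature: the average distance of the individuals bearing clade X from that clade's geographic center. The function name clade_distance, the toy coordinates, and the use of planar Euclidean distance in place of great-circle distance are simplifying assumptions made here. The sketch computes only the raw value; in NCA its significance is judged against the other clades within the same nesting clade, which this block does not attempt.

```python
import math


def clade_distance(counts, coords):
    """Raw clade distance for one clade.

    counts: {site: number of sampled individuals bearing the clade}
    coords: {site: (x, y)} planar coordinates (a simplifying assumption)
    """
    n = sum(counts.values())
    # Geographic center of the clade: individual-weighted mean position.
    cx = sum(coords[site][0] * k for site, k in counts.items()) / n
    cy = sum(coords[site][1] * k for site, k in counts.items()) / n
    # Average distance of the clade's individuals from that center.
    total = sum(
        k * math.hypot(coords[site][0] - cx, coords[site][1] - cy)
        for site, k in counts.items()
    )
    return total / n


# Hypothetical site coordinates in arbitrary planar units.
COORDS = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (0.0, 5.0)}
```

A clade confined to one site has a raw distance of zero regardless of sample size, whereas a clade split evenly between sites A and B in the toy coordinates has a raw distance of one unit.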

The recommendation of Petit and Grivet of first performing an FST analysis followed by an NCA using populations as permutational units rather than clades is flawed. It makes impossible many of the types of biological inference that motivated NCA in the first place. There is no obvious way of interpreting statistics designed to measure spatial attributes of clades by randomly permuting populations across space; new statistics are required for this purpose and have already been developed (TEMPLETON 2001). The original NCA remains appropriate for most data sets (those containing within-population variation), but when fragmented populations are inferred, the NCA should be supplemented with the sample site-based statistics recommended by TEMPLETON (2001). The randomization procedure given in Petit and Grivet is appropriate for the supplemental NCA statistics in TEMPLETON (2001), but it is inappropriate for the original NCA statistics in TEMPLETON et al. (1995).

LITERATURE CITED

PETIT, R. J. and D. GRIVET, 2002  Optimal randomization strategies when testing the existence of a phylogeographic structure. Genetics 161:469-471.

POSADA, D., K. A. CRANDALL, and A. R. TEMPLETON, 2000  A program for the cladistic nested analysis of the geographical distribution of genetic haplotypes. Mol. Ecol. 9:487.

TEMPLETON, A. R., 1998  Nested clade analysis of phylogeographical data: testing hypotheses about gene flow and population history. Mol. Ecol. 7:381-397.

TEMPLETON, A. R., 2001  Using phylogeographic analyses of gene trees to test species status and processes. Mol. Ecol. 10:779-791.

TEMPLETON, A. R., 2002  Out of Africa again and again. Nature 416:45-51.

TEMPLETON, A. R., E. ROUTMAN, and C. A. PHILLIPS, 1995  Separating population structure from population history: a cladistic analysis of the geographical distribution of mitochondrial DNA haplotypes in the tiger salamander, Ambystoma tigrinum. Genetics 140:767-782.

TURNER, T. F., J. C. TREXLER, J. L. HARRIS, and J. L. HAYNES, 2000  Nested cladistic analysis indicates population fragmentation shapes genetic diversity in a freshwater mussel. Genetics 154:777-785.
