- THIS ARTICLE
-
Abstract
- Full Text (PDF)
-
All Versions of this Article:
genetics.105.044974v1
172/1/639 most recent - Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Luo, Z. W.
- Articles by Finkeldey, R.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Luo, Z. W.
- Articles by Finkeldey, R.
Originally published as Genetics Published Articles Ahead of Print on September 19, 2005.
Genetics, Vol. 172, 639-646, January 2006, Copyright © 2006
doi:10.1534/genetics.105.044974
Modeling Population Genetic Data in Autotetraploid Species
Z. W. Luo*,
,1,
Ze Zhang*,
R. M. Zhang
,
Madhav Pandey
,
Oliver Gailing
,
Hans H. Hattemer
and
Reiner Finkeldey
* School of Biosciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom,
Laboratory of Population and Quantitative Genetics, State Key Laboratory of Genetic Engineering, Fudan University, Shanghai 200433, China and
Institute of Forest Genetics and Forest Tree Breeding, Faculty of Forest Science and Forest Ecology, University of Göttingen, Büsgenweg 2, D-37077 Göttingen, Germany
1 Corresponding author: School of Biosciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom.
E-mail: z.w.luo{at}bham.ac.uk
Allozyme and PCR-based molecular markers have been widely used to investigate genetic diversity and population genetic structure in autotetraploid species. However, an empirical but inaccurate approach was often used to infer marker genotype from the pattern and intensity of gel bands. Obviously, this introduces serious errors in prediction of the marker genotypes and severely biases the data analysis. This article developed a theoretical model to characterize genetic segregation of alleles at genetic marker loci in autotetraploid populations and a novel likelihood-based method to estimate the model parameters. The model properly accounts for segregation complexities due to multiple alleles and double reduction at autotetrasomic loci in natural populations, and the method takes appropriate account of incomplete marker phenotype information with respect to genotype due to multiple-dosage allele segregation at marker loci in tetraploids. The theoretical analyses were validated by making use of a computer simulation study and their utility is demonstrated by analyzing microsatellite marker data collected from two populations of sycamore maple (Acer pseudoplatanus L.), an economically important autotetraploid tree species. Numerical analyses based on simulation data indicate that the model parameters can be adequately estimated and double reduction is detected with good power using reasonable sample size.
POLYPLOIDY has played an important role in the evolutionary diversification of up to 80% of angiosperm species (GRANT 1971; LEWIS 1980; OTTO and WHITTON 2000; SOLTIS and SOLTIS 2000). Two types of polyploids can be distinguished according to their genome origin. Allopolyploids are the product of an interspecific hybridization event and subsequent chromosome doubling, while autopolyploids originate from the whole-genome doubling, likely by fusion of unreduced conspecific gametes. Because bivalents are always formed between pairs of chromosomes with the same origin at meiosis, allopolyploids display disomic inheritance. In contrast, autopolyploids have more than two sets of homologous chromosomes, show multivalent chromosome pairing at meiosis, and display polysomic inheritance. The complexities in modeling polysomic inheritance lie in two major aspects: segregation of multiple dosage alleles at individual loci and the occurrence of double reduction, the phenomenon by which sister chromatids enter into the same gamete (MATHER 1936). Another distinct feature in the population genetics of polyploids is the formation of partial heterozygotes. For example, when two alleles (A1 and A2) segregate at a locus in an autotetraploid population, there are three types of partial heterozygotes: A1A1A1A2, A1A1A2A2, and A1A2A2A2.
For their significance in evolutionary biology and agriculture, autopolyploids have attracted increasing research efforts at both theoretical and experimental scales (RONFORT et al. 1998; LUO et al. 2000, 2001, 2004; MAHY et al. 2000; LOPEZ-PUJOL et al. 2004). Furthermore, rapid advances in the techniques of molecular biology and computer technology have made the genetical analysis of autopolyploids more tractable than ever before (DE WINTON and HALDANE 1931; FISHER 1947). Allozyme and DNA-based microsatellite markers have been used to investigate the divergent heterozygosity between autotetraploids and their parental diploids (MAHY et al. 2000; HARDY and VEKEMANS 2001), to infer the genetic mode of autopolyploidy (LOPEZ-PUJOL et al. 2004), and to assess population structure and gene flow in autotetraploid species (RONFORT et al. 1998). THRALL and YOUNG (2000) developed a computer program, AUTOTET, for calculating allele frequencies of genetic markers in autotetraploid populations. However, it must be pointed out that to use the program, one has to empirically infer genotypes of the markers from inspecting the pattern and intensity of electrophoretic gel bands (for example, LOPEZ-PUJOL et al. 2004). Obviously, diagnosing the intensity of gel bands could be unreliable or impossible for the marker data generated, for example, from DNA sequencers.
In this article, we develop a likelihood-based method for calculating allele frequencies of genetic markers in a random-mating autotetraploid population. The method accounts properly for the problem of missing information of marker phenotype in regard to the corresponding genotype and for the presence of double reduction. It can be used to analyze the genetic structure of autotetraploid populations by making use of allozyme and PCR-based molecular markers. The method is demonstrated by analyzing a data set consisting of five microsatellite markers scored on two populations of Acer pseudoplatanus L., an autotetraploid tree species.
Model and notation:
We consider segregation of alleles at a locus in a random-mating autotetraploid population. Let L be the number of alleles, A1, A2, ... , AL, segregating in the population and the frequencies of the alleles be
, respectively. The coefficient of double reduction, which is the probability of sister chromatids ending up in the same gamete in meiosis (MATHER 1936), at the marker locus is denoted by
. GEIRINGER (1949) found that the equilibrium distribution of zygotic genotypes is given by the expansion of
![]() | (1) |
= 0, this is
![]() | (2) |
![]() | (3) |
It can be seen that there are as many as
distinct genotypes at the marker locus (LUO and MA 2004). To ease the following analysis, we classed the individual genotypes into three groups according to the number of double-reduction gametes they carried and illustrated the equilibrium genotypic distribution in Table 1.
|
Analysis and statistical inference:
The genotypic distribution involves L independent unknown parameters:
since
. When the individual genotypes are distinguishable directly from the phenotype data, estimation of the double-reduction and allelic frequency parameters becomes trivial. However, there is no one-to-one relationship between genotype and phenotype at any genetic markers in the autotetraploids (LUO et al. 2000). To estimate the model parameters, one has to work with phenotype data. For simplicity but without loss of generality, we denote the phenotype of an autotetraploid individual at a marker locus by the number of distinct gel bands scored at the marker locus. For a given dominance mode of the markers, it is easy to convert the marker genotype distribution into the corresponding phenotype distribution. For example, the genotype distribution tabulated in Table 1 can be transformed into the phenotype distribution that is summarized in Table 2 when the marker alleles are assumed to display codominant inheritance. It can be seen from Table 2 that a phenotype provides full information of the corresponding genotype only if the phenotype is represented as four distinct bands; i.e., the presence of four different alleles and, in most cases, an individual genotype cannot be inferred directly from the corresponding phenotype. A general form for converting the genotypic distribution (Table 1) into the phenotypic distribution (Table 2) is
, where the summation is over all genotypes that correspond to the same phenotype, x = i (one band), ij (two bands), ijk (three bands), or ijkl (four bands). gh is the probability of genotype h that is compatible with phenotype x. For example, there are three genotypes (AiAiAiAj, AiAiAjAj, and AiAjAjAj) that are compatible with phenotype ij; thus fij, the probability of this phenotype, equals
as given in Table 2. Let n be the number of individuals randomly sampled from the population under question and m be the number of different phenotypes observed in the sample. The sample can be divided into phenotype groups for which there are ni, nij,
, and
of individuals with one, two, three, and four bands, accordingly. The logarithm of the likelihood function of the observed phenotype data given the model parameters is given by
![]() | (4) |
gives
![]() | (5) |
![]() | (6) |
, can be solved from setting the derivative equation to zero as
![]() | (7) |
. Differentiating Equation 4 with respect to
gives
![]() | (8) |
![]() | (9) |
![]() | (10) |
![]() | (11) |
![]() | (12) |
is a square matrix with
and
,
, and
. Solving Equation 12 yields the MLEs of allelic frequencies
. In fact, Equations 6 and 7 and Equations 11 and 12 represent an EM algorithm for calculating the maximum-likelihood estimates of the model parameters. The algorithm involves iterating the two steps: the expectation step that calculates the posterior probabilities according to Equations 6 and 11 and then the maximization step that calculates the maximum-likelihood estimates of the model parameters from Equations 7 and 12 with
's and
's being calculated from the previous expectation step. These two steps are iterated until the sequence of the likelihood function (4) converges. In contrast, the MLEs of marker allele frequencies can be simplified as
in diploid populations.
|
It is feasible to create a likelihood-based statistical test for the presence of double reduction with the MLEs,
. In fact, the test statistic given by
![]() | (13) |
Test for HardyWeinberg hypothesis:
The above analysis is based on the assumption that segregation of the marker alleles follows the HardyWeinberg equilibrium; i.e., frequencies of marker genotypes are determined by the coefficient of double reduction and the relevant allele frequencies. With the MLEs of the model parameters, the expected genotype (or phenotype) frequencies can be calculated and thus a chi-square statistic with a form of
![]() | (14) |
Estimation of genetic heterozygosity:
With the MLEs of the allelic frequencies, one can estimate genetic heterozygosity at marker loci. The genetic heterozygosity, which accounts for the distinct allelic constitutions of tetraploid genotypes, was defined as the probability of any two alleles being not identical in a given genotype (BEVER and FELBER 1992; THRALL and YOUNG 2000). In the present context and notations, the observed genetic heterozygosity can be calculated from
![]() | (15) |
![]() | (16) |
When marker data are collected from k subpopulations of a tetraploid species, one can calculate marker heterozygosity for each of these populations according to the above analysis. Let
be the heterozygosity in the subpopulation I, and
. We define forms of Wright's F-statistics for autotetraploids by assimilating that in diploids (HARTL and CLARK 1997) as
![]() | (17) |
![]() | (18) |
and
is the expected heterozygosity of an equivalent total population of all the subpopulations under the HardyWeinberg equilibrium. Simulation study:
To validate the method presented above and to investigate its properties, we set up a simulation study to mimic segregation of alleles at a single marker locus in autotetraploid populations at equilibrium. The simulation model allows us to vary the number and frequencies of marker alleles and to simulate different values of the coefficient of double reduction. For any given simulated parameters, individual genotypes were randomly sampled from the simulated population for which the genotypic distribution is defined in Table 1. The sampled genotypes were converted into phenotypes according to LUO et al (2000). The simulation study considered the segregation of five marker alleles with either an equal frequency or different frequencies in populations of different sample sizes and two values of the coefficient of double reduction,
= 0.05 and 0.1, at the marker locus.
Tabulated in Table 3 are the means and standard deviations of the MLEs of the coefficient of double reduction and allelic frequencies over 100 repeated simulations, together with the simulation parameters. Also, empirical power,
, is listed for detecting significance of double reduction, which was calculated as the proportion of significant tests of double reduction in the repeated simulations. It can be seen from Table 3 that the parameters are estimated adequately in all cases even with a sample size as small as n = 50. Double reduction can be detected with fairly good power when the sample size is not <100 and when
= 0.1. For a given sample size, double reduction is tested with larger power when marker alleles are evenly distributed as compared to noneven distribution of marker alleles. As expected, the standard deviation of the estimates decreases and the power of the double reduction test increases with increasing sample size.
|
Analysis of microsatellite data from two sycamore maple populations:
Sycamore maple (A. pseudoplatanus L.) is a forest tree species native to central Europe and western Asia, growing especially in mountainous regions. The species is an autotetraploid with a chromosome number of 2n = 4x = 52 (DARLINGTON and WYLIE 1955). Because of its economic and ecological importance, numerous plantations were established throughout Europe during the last century. However, the genetic structure and its mating system in natural populations or in plantations remain unclear.To explore the genetic variation pattern, gene flow, and mating system of the species, PANDEY et al. (2004) have started developing microsatellite markers in this species. For a proof-of-principle demonstration purpose, we detailed here analysis of data of a marker, MAP-9, which was scored for 133 and 80 individuals, respectively, from two populations, Södderich and Weisswassertal, near Göttingen in Germany. The marker data, which were collected as peak value reads from the ABI PRISM 3100 genetic analyzer (Applied Biosystems, Foster City, CA/Hitachi, San Jose, CA), were converted into gel-band-like phenotypes.
Table 4 summarizes the phenotypic distribution and analysis of the marker data in the two sycamore maple populations. It shows that the 133 and 80 individuals sampled from the two populations can be grouped into 12 and 6 different phenotypic groups, respectively. From the phenotype, seven distinct marker alleles are observed in the Södderich population sample but only five of these are present in the Weisswassertal population sample. From the data sets, the maximum-likelihood estimates are calculated for frequencies of the marker alleles and the coefficient of double reduction by making use of the method developed in this study. It shows that marker alleles 1 and 3 are detected in the two populations as the most frequently occurring alleles. There is no evidence of double reduction at this marker locus. The goodness-of-fit test demonstrates that segregation of the marker alleles in both the populations agrees well with the HardyWeinberg equilibrium (
). To make the chi-square-based fitness tests more stable (KENDALL and STUART 1961, p. 440), the analysis has excluded those phenotype classes with expected sample observations <1.0. In addition, we calculated genetic heterozygosity (HO) at the marker locus using Equation 15 and observed that HO = 0.6901 and 0.6833 for the Södderich and the Weisswassertal populations, respectively. On the basis of the observed genetic heterozygosity estimates and their corresponding expected values that can be calculated from Equation 16, the FST is estimated to be 0.001, suggesting trivial genetic differentiation between the two populations.
|
This article developed a theoretical model for characterizing genetic segregation of alleles at genetic marker loci, an autotetraploid population locus and a novel maximum-likelihood-based method to estimate the parameters defining the population genetic model. The model properly accounts for segregation complexities due to multiple alleles and double reduction at tetrasomic loci in natural populations, and the statistical method takes appropriate account of incomplete marker phenotype information with respect to the genotype due to multiple dosage allele segregation at marker loci in tetraploids. The theoretical analysis and methods developed in this study are validated by making use of a computer simulation study and their utility is demonstrated by analyzing microsatellite marker data collected from two populations of sycamore maple (A. pseudoplatanus L.), an economically important autotetraploid tree species. Numerical analyses based on simulation data indicate that the model parameters can be adequately estimated and double reduction is detected with good power with a sample size of
100. Analysis of the sycamore maple data illustrates the data format required by the statistical method and demonstrates how the method can be used to predict various population genetic parameters by using DNA molecular data in autotetraploid species. A practical problem in real data analysis might be that the number of segregating alleles is too large in relation to reasonable sample sizes. Thus, the sample size required for accurate parameter estimation and efficient testing of double reduction may depend largely on the number of segregating alleles; the sample size considered here may be suitable only when there are several marker alleles segregating in the population.
Although the theory and methods developed in this study were demonstrated for codominant markers, it is feasible for their extension to dominant markers such as RAPDs, AFLPs, etc. In fact, there are usually two segregating alleles at these marker loci, i.e., a dominant and a recessive allele; this greatly reduces the number of possible genotypes and the number of possible phenotypes at the locus and thus simplifies the model. In addition, the recessive allele will be phenotypically present only when it is present in four copies.
Double reduction is one of the most distinguished features of tetrasomic inheritance and also one of the major difficulties in modeling tetrasomic inheritance when compared to disomic inheritance. BUTRUILLE and BOITEUX (2000) exploited the impact of double reduction on the evolution of deleterious mutants in autotetraploid genomes and found that low frequencies of double reduction are enough to reduce equilibrium frequencies, which are maintained at a selection and mutation balance, by severalfold. This suggests the important effect of double reduction on the evolution of the species. However, there are no proposals, either in their study or in the literature, to address how to test for the significance of double reduction in natural populations of the species. This study provides such a test and thus fills a gap in the evolutionary study of autotetraploid species.
The algorithm developed in this study was programmed in Fortran-90 computer language. The program is available upon request from the corresponding author.
BEVER, J. D., and F. FELBER, 1992 The theoretical population genetics of autopolyploidy, pp. 185217 in Oxford Surveys in Evolutionary Biology, edited by J. ANTONOVICS and D. FUTUYMA. Oxford University Press, Oxford.
BROWN, A. H. D., and A. G. YOUNG, 2000 Genetic diversity in tetraploid populations of the endangered daisy Rutidosis leptorrhynchoides and implications for its conservation. Heredity 85: 122129.
BUTRUILLE, D. V., and L. S. BOITEUX, 2000 Selection-mutation balance in polysomic tetraploids: impact of double reduction and gametophytic selection on the frequency and subchromosomal localization of deleterious mutations. Proc. Natl. Acad. Sci. USA 97: 66086613.
DARLINGTON, C. D., and A. P. WYLIE, 1955 Chromosome Atlas of Flowering Plants. George Allen & Unwin, London.
DE WINTON, D., and J. B. S. HALDANE, 1931 Linkage in the tetraploid Primula sinensis. J. Genet. 24: 121144.
FISHER, R. A., 1947 The theory of linkage in polysomic inheritance. Philos. Trans. R. Soc. Lond. B 233: 5587.
GEIRINGER, H., 1949 Chromatid segregation in tetraploids and hexaploids. Genetics 34: 665684.
GRANT, V., 1971 Plant Speciation. Columbia University Press, New York/London.
HARDY, O. J., and X. VEKEMANS, 2001 Patterns of allozyme variation in diploid and tetraploid Centaurea jacea at different spatial scales. Evolution 55: 943954.[Medline]
HARTL, D. H., and A. G. CLARK, 1997 Principles of Population Genetics, Ed. 3. Sinauer Associates, Sunderland, MA.
KENDALL, M. G., and A. STUART, 1961 The Advanced Theory of Statistics: Inference and Relationship, Vol. 2. Charles Griffin, London.
LEWIS, W. H., 1980 Polyploidy: Biological Relevance. Plenum Press, New York.
LOPEZ-PUJOL, J., M. BOSCH, J. SIMON and C. BLANCHE, 2004 Allozyme diversity in the tetraploid endemic Thymus loscossi (Lamiaceae). Ann. Bot. 93: 323332.
LUO, Z. W., and L. MA, 2004 An improved formulation of marker heterozygosity in recurrent selection and backcross schemes. Genet. Res. 83: 4953.[CrossRef][Medline]
LUO, Z. W., C. A. HACKETT, J. E. BRADSHAW, J. W. MCNICOL and D. MILBOURNE, 2000 Predicting parental genotypes and gene segregation for tetrasomic inheritance. Theor. Appl. Genet. 100: 10671073.[CrossRef]
LUO, Z. W., C. A. HACKETT, J. E. BRADSHAW, J. W. MCNICOL and D. MILBOURNE, 2001 Construction of a genetic linkage map in tetraploid species using molecular markers. Genetics 157: 13691385.
LUO, Z. W., R. M. ZHANG and M. J. KEARSEY, 2004 Theoretical basis for genetic linkage analysis in autotetraploid species. Proc. Natl. Acad. Sci. USA 101: 70407045.
MAHY, G., L. P. BRUEDERLE, B. CONNORS, M. VAN HOFWEGEN and N. VORSA, 2000 Allozyme evidence for genetic autopolyploidy and high genetic diversity in tetraploid cranberry, Vaccinium oxycoccos (Ericaceae). Am. J. Bot. 87: 18821889.
MATHER, K., 1936 Segregation and linkage analysis in autotetraploids. J. Genet. 32: 287314.
OTTO, S. P., and J. WHITTON, 2000 Polyploid incidence and evolution. Annu. Rev. Genet. 34: 401437.[CrossRef][Medline]
PANDEY, M., O. GAILING, D. FISCHER, H. H. HATTEMER and R. FINKELDEY, 2004 Characterization of microsatellite markers in sycamore (Acer pseudoplatanus L.). Mol. Ecol. Notes 4: 253255.[CrossRef]
RONFORT, J., E. JENCZEWSKI, T. BATAILLON and F. ROUSSET, 1998 Analysis of population structure in autotetraploid species. Genetics 150: 921930.
SOLTIS, D. E., and P. S. SOLTIS, 1989 Genetic consequences of autopolyploidy in Tolmiea (Saxifragaceae). Evolution 43: 586594.[CrossRef]
SOLTIS, P. S., and D. E. SOLTIS, 2000 The role of genetic and genomic attributes in the success of polyploids. Proc. Natl. Acad. Sci. USA 97: 70517057.
THRALL, P. H., and A. YOUNG, 2000 AUTOTET: a program for the analysis of autotetraploid genotypic data. J. Hered. 91: 348349.
WENDEL, F., and N. F. WEEDEN, 1989 Visualization and interpretation of plant isozymes, pp. 545 in Isozymes in Plant Biology, edited by D. E. SOLTIS and P. S. SOLTIS. Dioscorides Press, Portland, OR.
Communicating editor: Y.-X. FU
This article has been cited by other articles:
![]() |
M. Stift, C. Berenos, P. Kuperus, and P. H. van Tienderen Segregation Models for Disomic, Tetrasomic and Intermediate Inheritance in Tetraploids: A General Procedure Applied to Rorippa (Yellow Cress) Microsatellite Data Genetics, August 1, 2008; 179(4): 2113 - 2123. [Abstract] [Full Text] [PDF] |
||||
![]() |
Z. W. Luo, Z. Zhang, L. Leach, R. M. Zhang, J. E. Bradshaw, and M. J. Kearsey Constructing Genetic Linkage Maps Under a Tetrasomic Model Genetics, April 1, 2006; 172(4): 2635 - 2645. [Abstract] [Full Text] [PDF] |
||||
- THIS ARTICLE
-
Abstract
- Full Text (PDF)
-
All Versions of this Article:
genetics.105.044974v1
172/1/639 most recent - Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Luo, Z. W.
- Articles by Finkeldey, R.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Luo, Z. W.
- Articles by Finkeldey, R.


















(i = 1, 2, ... , 5) over 100 repeated simulations of varying sample sizes, n