Genetics, Vol. 155, 1439-1447, July 2000, Copyright © 2000

Multipoint Mapping of Viability and Segregation Distorting Loci Using Molecular Markers

Claus Vogl(a,b) and Shizhong Xu(b)
a Department of Biology, University of Oulu, FIN-90401 Oulu, Finland
b Department of Botany and Plant Sciences, University of California, Riverside, California 92521

Corresponding author: Claus Vogl, Department of Botany and Plant Sciences, University of California, Riverside, CA 92521. E-mail: claus{at}genetics.ucr.edu

Communicating editor: Z-B. ZENG


*  ABSTRACT

In line-crossing experiments, deviations from Mendelian segregation ratios are usually observed for some markers. We hypothesize that these deviations are caused by one or more segregation-distorting loci (SDL) linked to the markers. We develop both a maximum-likelihood (ML) method and a Bayesian method to map SDL using molecular markers. The ML mapping is implemented via an EM algorithm and the Bayesian method is performed via the Markov chain Monte Carlo (MCMC). The Bayesian mapping is computationally more intensive than the ML mapping but can handle more complicated models such as multiple SDL and variable number of SDL. Both methods are applied to a set of simulated data and real data from a cross of two Scots pine trees.


CHROMOSOMAL regions that cause distorted segregation ratios in early life stages may be referred to as segregation-distorting loci (SDL). These distortions are caused either by differential representation of SDL genotypes in gametes before fertilization or by viability differences of SDL genotypes after fertilization but before genotype scoring. In both cases, the observable phenotype is a distortion of marker locus genotypes in chromosomal regions close to the SDL. Hence, regardless of the timing of action of the SDL, mapping of locations and estimation of effects of SDL follow the same statistical treatment.

Let us first discuss mechanisms that distort segregation ratios by altering gametic proportions. With meiotic drive, gametic proportions become distorted during meiosis because one chromosome type preferentially ends up in the egg nucleus. Meiotic drive is known, e.g., for maize chromosome 10, where a variant carrying a heterochromatic knob is preferentially transmitted (reviewed in GRANT 1975). In other systems, gametes carrying a certain allele render gametes carrying the homologous chromosome dysfunctional, e.g., the segregation distorter (SD) and sex ratio (SR) loci of Drosophila and the t-alleles of mice (e.g., HARTL and CLARK 1997, p. 244ff). Meiotic drive can be a powerful selective force. The t-alleles are maintained in the population, even though they are homozygous lethals, because heterozygotes pass them to the next generation with probability 0.95. In many species hybridizations, outbreeding depression and segregation distortion have been observed in the F2 generation. These are often caused by structural differences between chromosomes (WHITKUS 1998), i.e., by events before fertilization.

Haploid life stages can be exposed to selection, especially in plants. In the life cycle of mosses, the haploid life stage (the gametophyte) is dominant over the diploid life stage (the sporophyte). In vascular plants, maize gametophytic mutations indicate that pollen tube growth rates are determined in part by the genotypes of the microgametophytes (reviewed in GRANT 1975).

Viability selection after fertilization may be more important than gametic selection. Viability selection is common in consanguineous matings, where inbreeding depression reduces the survival of homozygotes compared to heterozygotes (CHARLESWORTH and CHARLESWORTH 1987). Viability selection gives rise to segregation ratios distorted from 1:2:1 at linked loci. Inbreeding depression is often expressed in very early life stages (HUSBAND and SCHEMSKE 1996). In Scots pine, only ~15% of self-fertilized embryos develop into mature seeds, whereas ~75% do so in wind-pollinated seeds (KARKKAINEN et al. 1996). Some aspects of the genetic basis of inbreeding depression require further investigation, e.g., number and effects of loci and degree of dominance. Yet these factors have major consequences for mating system evolution (CHARLESWORTH and CHARLESWORTH 1998), conservation genetics (HEDRICK 1994), and plant breeding (e.g., WILLIAMS and SAVOLAINEN 1996). A biased segregation ratio due to viability differences of genotypes also occurs in the F2 generation of wide crosses. This is generally thought to be caused by epistatic interactions.

Often events before fertilization cannot be distinguished from events after fertilization. MCGOLDRICK and HEDGECOCK (1997) reported that crosses of Crassostrea gigas, the Pacific oyster, produced biased segregation ratios when tested as adults. Later, LAUNEY and HEDGECOCK (1999) showed that, for many loci in the same crosses, the ratios were Mendelian when 6-hr-old larvae were assessed but deviated from the Mendelian ratios when the animals were 2 to 3 mo old. Hence, in these crosses the differences are due to postfertilization viability selection.

Quantitative trait loci (QTL) are usually mapped in agronomically important plants and animals. To increase differences of parental types, and thus to increase the power of mapping, crosses are often conducted between inbred lines or between distantly related cultivars or even between species. As discussed above, these conditions promote segregation distortion.

For molecular characterization of the genetic causes of distorted segregation ratios, mapping of the location and effects of SDL would be desirable. As the phenotype in SDL mapping is different from that of QTL mapping (data in SDL mapping usually consist of frequencies of genotypes among survivors), QTL methods cannot be used for SDL mapping. Development of advanced methods for estimation of locations and effects of SDL has been lagging behind that for QTL mapping. In the past, often a single marker was considered at a time, where only the linkage between one fully informative marker and a single SDL was tested (SORENSEN 1967; SERVITOVA and CETL 1984; HEDRICK and MUONA 1990; FU and RITLAND 1994A; KARKKAINEN et al. 1999). In a single-marker test, the number of distinguishable genotypic configurations of the marker is at best equal to the number of genotypic configurations of a linked SDL, but the genotypic frequencies of the marker are affected by the recombination fraction in addition to the frequencies of the SDL's genotypic configurations. Hence, for a single-marker test, estimations of the position and effect are confounded.

Errors in marker genotyping may also cause systematic deviations from the expected segregation ratio. Randomly amplified polymorphic DNA (RAPD) markers are often misscored because a faint band may be interpreted as absent. Such an error affects the segregation ratio of only a single marker. In contrast, if segregation distortion is caused by SDL, all markers in the vicinity of the SDL will be affected.

FU and RITLAND (1994B), MITCHELL-OLDS (1995), and CHENG et al. (1996) have developed maximum-likelihood methods for mapping one SDL using flanking markers, i.e., an interval mapping strategy (LANDER and BOTSTEIN 1989). Given a map of fully informative markers, no missing data, no interference between recombinations, and no more than one SDL per chromosome, this theory can be used to scan the genome for SDL. Under these assumptions, loci outside the interval flanking the SDL contribute no information to the segregation of the SDL. But more than one SDL per chromosome may be present and markers may be only partially informative. Furthermore, due to the effects of SDL, estimation of map distances of markers might become biased (LORIEUX et al. 1995A,B; LIU 1998). This might cause the interval mapping method to become inefficient and biased.

The SDL analysis is based on binomial (or multinomial) distributions instead of normal distributions, and hence multiple regression is not readily available and cannot be combined with conventional interval mapping as in the composite interval mapping (CIM; ZENG 1994) or the multiple QTL mapping (MQM) scheme (JANSEN and STAM 1994). Therefore, multiple SDL on a single chromosome pose an unsolved theoretical problem. On the other hand, if maps are inferred correctly and if SDL on different chromosomes do not interact epistatically, i.e., SDL effects combine multiplicatively, linkage to an SDL is solely responsible for the phenotype. SDL analysis of one chromosome is therefore usually independent of other chromosomes.

We present a multipoint method for mapping multiple SDL using a backcross design. The multipoint method is developed under both the maximum-likelihood and the Bayesian frameworks.


*  THEORY

Model:
We develop and present the model under a backcross design only, although the method can be applied to other controlled mating designs as well. We assume that the parents that initiate the cross are pure inbred lines. The F1 of the cross is backcrossed to one of the parents and a total of N individuals are generated in the backcross (BC) family for mapping. We are interested in mapping loci responsible for segregation distortion using multiple markers that are already mapped on the genome. The data here are the observed marker genotypes (configurations). The parameters, however, are the number of SDL, the locations, and effects of these loci. We assume that all markers are neutral in the sense that their segregations would be Mendelian if there were no linked SDL on the same chromosome. The observed segregation distortions on these neutral markers, however, are caused by one or more SDL near the markers.

Note that the flow of causality is from the SDL to the genotypic configurations of the SDL, then from the genotypic configurations of the SDL to the genotypic configurations of the marker loci, and finally from the genotypic configurations of the marker loci to the observed marker information. We first consider a single SDL. The genotype of the F1 is heterozygous and that of a BC individual (generated from F1 backcrossed to the first inbred parent) is either heterozygous or homozygous for the allele of the first parent with an unequal probability. The degree of asymmetry in the probability is determined by the effect (size) of the SDL. Define

for i = 1, ... , N. This indicator variable, {phi}i, is also called the "inheritance digit" because it indicates which of the two alleles carried by the F1 has been passed to the ith progeny. Parameters of interest are the effect, denoted by {pi}, and the location, denoted by {lambda}, of the segregation-distorting locus. The distribution of {phi}i is Bernoulli with

(1)

for i = 1, ... , N, with

(2)

Note that in the SDL case the distribution of the inheritance digit of the SDL given {pi} is independent of the location. Another parameter of interest is the location of the SDL on the chromosome, denoted by {lambda}, which will be dealt with later. In the absence of segregation distortion, we have {pi} = 1/2. Therefore, the deviation of {pi} from 1/2 is the effect or size of the SDL. If {phi}i were observable, we could directly estimate and test {pi}. The maximum-likelihood estimate would be

(3)

if we could maximize the following log-likelihood function:

(4)
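For orientation, the complete-data model described above can be written out as the standard Bernoulli model. The block below is a reconstruction from the surrounding text rather than a copy of the published Equations 1, 3, and 4, and it assumes the coding in which {phi}i = 1 occurs with probability {pi}.

```latex
% Assumed coding: \Pr(\phi_i = 1 \mid \pi) = \pi
\Pr(\phi_i \mid \pi) = \pi^{\phi_i} (1 - \pi)^{1 - \phi_i},
  \qquad \phi_i \in \{0, 1\}, \quad i = 1, \ldots, N

% Complete-data log likelihood and its maximizer
\ell(\pi) = \sum_{i=1}^{N} \left[ \phi_i \log \pi + (1 - \phi_i) \log (1 - \pi) \right],
  \qquad
\hat{\pi} = \frac{1}{N} \sum_{i=1}^{N} \phi_i
```

Under Mendelian segregation the estimate would be close to 1/2, so the distance of the estimate from 1/2 measures the size of the SDL.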

But {phi}i is not observable; only the inheritance digits of marker alleles can be observed. Therefore, an entirely different approach is required to estimate {pi}. Consider M markers with known map positions on the chromosome of interest. Define the inheritance digits of the ith individual at the jth marker locus as

for i = 1, ... , N. Without genotyping errors, there are just three possibilities for the marker information Iij of the ith individual at the jth marker locus. The first two cases are mutually exclusive events: either one or the other marker inheritance digit is observed. In the third case, a missing observation, we define the marker information as the union of the former two cases. Thus, in the first two cases, Pr(Iij|{phi}ij) = 1 if the marker information is compatible with {phi}ij and Pr(Iij|{phi}ij) = 0 otherwise. In the third case, Pr(Iij|{phi}ij) is equal to 1 independent of the inheritance digit. If there are genotyping errors, Pr(Iij|{phi}ij) will assume values intermediate between 0 and 1. Note that Pr(Ii1, ... , IiM|{phi}i1, ... , {phi}iM) = {Pi}_{j=1}^{M} Pr(Iij|{phi}ij) because, conditional on the jth inheritance digit, the jth marker information is independent of all other variables.

Given the position ({lambda}) of the SDL on the chromosome, the joint distribution for {phi}i and {phi}i1, ... , {phi}iM is

(5)

where Pr({phi}i1, ... , {phi}iM|{phi}i, {lambda}) can be found using the property of a two-state Markov chain (LANDER and GREEN 1987; JIANG and ZENG 1997). We assume that there is no interference between two consecutive crossovers so that Haldane's mapping function applies. Under this assumption, the sequence

forms a Markov chain with two discrete states, where the markers are ordered according to their positions on the chromosome and the SDL is located between markers k and k + 1. We, thus, have

(6)

where

is the transition probability between two consecutive loci and rj(j + 1) is the recombination fraction between loci j and j + 1. The transition probability between the SDL and the nearby marker k is

where rkl is the recombination fraction between the kth marker and the SDL identified as locus l. The transition probability between the SDL and the (k + 1)th locus is obtained similarly.
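The transition probabilities can be sketched in a few lines. The example assumes map positions in Morgans and Haldane's mapping function, as stated above; the function names `haldane_r` and `transition_matrix` are illustrative and not taken from the article.

```python
import math


def haldane_r(distance_morgans):
    """Recombination fraction for a map distance (in Morgans) under
    Haldane's mapping function, i.e., no crossover interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(distance_morgans)))


def transition_matrix(r):
    """Two-state transition matrix between the inheritance digits of two
    adjacent loci: the digit is kept with probability 1 - r and switched
    with probability r."""
    return [[1.0 - r, r],
            [r, 1.0 - r]]


# Example: two loci 0.2 M apart.
r = haldane_r(0.2)
T = transition_matrix(r)
```

Because there is no interference, recombination fractions over adjacent intervals combine as r13 = r12(1 - r23) + (1 - r12)r23, which is what allows the SDL to be inserted at any position between two markers.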

Let Ii = [Ii1, ... , IiM]. Combining Equation 6 with the marker information and "summing out" the marker inheritance digits, we get

where we have made use of the independence from other markers of the jth marker information conditional on the jth marker inheritance digit. Combining the previous formula with Equation 5 results in the following equation:

(7)
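The summation over marker inheritance digits can be sketched as a peeling recursion along the two-state Markov chain. This is a minimal illustration under stated assumptions, not the authors' implementation: marker observations are coded 0, 1, or `None` (missing), there are no genotyping errors, positions are in Morgans, and Pr({phi}i = 1) = {pi}. All function names are hypothetical.

```python
import math


def haldane_r(d):
    """Recombination fraction for map distance d (Morgans), Haldane's function."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(d)))


def emission(obs):
    """Pr(I_ij | marker digit) for digit 0 and 1; None codes a missing observation."""
    if obs is None:
        return (1.0, 1.0)
    return (1.0, 0.0) if obs == 0 else (0.0, 1.0)


def step(vec, r):
    """Propagate a probability vector across one interval with recombination r."""
    return ((1.0 - r) * vec[0] + r * vec[1], r * vec[0] + (1.0 - r) * vec[1])


def arm_likelihood(markers, lam):
    """Pr(marker data on one chromosome arm | SDL digit = 0 or 1).
    `markers` is a list of (obs, position) ordered outward from the SDL."""
    if not markers:
        return (1.0, 1.0)
    vec, prev_pos = (1.0, 1.0), None
    # Peel from the outermost marker inward toward the SDL.
    for obs, pos in reversed(markers):
        if prev_pos is not None:
            vec = step(vec, haldane_r(pos - prev_pos))
        e = emission(obs)
        vec = (e[0] * vec[0], e[1] * vec[1])
        prev_pos = pos
    return step(vec, haldane_r(lam - prev_pos))


def marker_lik_given_sdl(obs, positions, lam):
    """Pr(I_i | SDL digit, lam) for SDL digit 0 and 1, summed over marker digits."""
    pairs = list(zip(obs, positions))
    left = sorted([m for m in pairs if m[1] <= lam], key=lambda m: -m[1])
    right = sorted([m for m in pairs if m[1] > lam], key=lambda m: m[1])
    lo, hi = arm_likelihood(left, lam), arm_likelihood(right, lam)
    return (lo[0] * hi[0], lo[1] * hi[1])


def joint_prob(obs, positions, lam, pi, phi):
    """Pr(I_i, phi_i | pi, lam), assuming Pr(phi_i = 1) = pi."""
    w = marker_lik_given_sdl(obs, positions, lam)
    return (pi if phi == 1 else 1.0 - pi) * w[phi]
```

Conditional on the SDL digit, the markers to its left and to its right are independent, so the two arms are computed separately and multiplied. Missing or partially informative markers simply contribute a factor of 1 for both digits.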

Maximum likelihood:
Having formulated the probability model, we now introduce a maximum-likelihood method to estimate and test the SDL. There are several ways to find the maximum-likelihood estimate of {pi}; we adopt an expectation maximization (EM) algorithm and treat {phi}i as missing data. We treat {lambda} as a known constant for the moment. Let I = [I1, ... , IN] and {phi} = [{phi}1, ... , {phi}N]. For the EM algorithm we need to determine the logarithm of Pr(I, {phi}|{pi}, {lambda}), i.e.,

(8)

The constant does not depend on the parameter of interest, {pi}.

Conditional on the data, the position, and the initial value of the parameter, {pi}(0), the posterior probabilities of {phi}i = 0 and {phi}i = 1 are, respectively,

(9a)

and

(9b)

Because Pr({phi}i | Ii, {pi}(0), {lambda}) follows a Bernoulli distribution, the posterior probability of {phi}i = 1 is equivalent to the expectation E[{phi}i | Ii, {pi}(0), {lambda}], the posterior mean of {phi}i. Taking the expectation of (8) with respect to {phi} and substituting this posterior mean for {phi}i in the resulting formula completes the expectation step of the EM algorithm. The M-step consists of maximizing the resulting equation with respect to {pi} to obtain

(10)

Equations 9a, 9b, and 10 are iterated until convergence.
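The iteration can be sketched as follows for a fixed location. The sketch assumes that the per-individual marker likelihoods Pr(Ii | {phi}i = 0, {lambda}) and Pr(Ii | {phi}i = 1, {lambda}) have already been computed (passed in as `w0` and `w1`) and that Pr({phi}i = 1) = {pi}; the function name and arguments are illustrative.

```python
def em_sdl_effect(w0, w1, pi_start=0.5, tol=1e-10, max_iter=1000):
    """EM estimate of the SDL effect pi at a fixed location.

    w0[i], w1[i]: Pr(I_i | SDL digit = 0) and Pr(I_i | SDL digit = 1).
    Returns the estimate of pi and the posterior means of the SDL digits.
    """
    pi = pi_start
    post = []
    for _ in range(max_iter):
        # E-step: posterior probability that the SDL digit equals 1.
        post = [pi * b / (pi * b + (1.0 - pi) * a) for a, b in zip(w0, w1)]
        # M-step: the new estimate is the average posterior mean.
        new_pi = sum(post) / len(post)
        if abs(new_pi - pi) < tol:
            return new_pi, post
        pi = new_pi
    return pi, post


# Example: three individuals whose SDL digits are known to be 0, 1, 1.
pi_hat, posterior_means = em_sdl_effect([1.0, 0.0, 0.0], [0.0, 1.0, 1.0])
```

With fully informative data at the SDL position the posterior means are 0 or 1 and the estimate reduces to the observed proportion. With uninformative data (`w0` equal to `w1`) the estimate stays at its starting value.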

We can now test the null hypothesis that there is no segregation distortion for the particular location {lambda}. The null hypothesis is formulated as H0: {pi} = 1/2, which can be tested using the likelihood-ratio test statistic {Lambda} = -2(l(1/2, {lambda}) - l({pi}-hat, {lambda})), where {pi}-hat denotes the maximum-likelihood estimate of {pi} and l({pi}-hat, {lambda}) is the log likelihood

(11)

evaluated at {pi}-hat, and l(1/2, {lambda}) = N log(1/2) is the log-likelihood value under Mendelian segregation. Under the null model, {Lambda} is approximately distributed as a chi-square variable with 1 d.f.
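As a worked illustration of the test, the sketch below treats the simplest case, in which the SDL inheritance digits are observed directly (for example, a fully informative marker at the SDL position). It is not a reconstruction of Equation 11. The chi-square tail probability with 1 d.f. is computed from the complementary error function; the function names are illustrative.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(max(0.0, x) / 2.0))


def lrt_observed_digits(n_one, n_total):
    """Likelihood-ratio test of H0: pi = 1/2 when n_one of n_total
    individuals carry SDL digit 1."""
    pi_hat = n_one / n_total
    loglik_alt = 0.0
    if n_one > 0:
        loglik_alt += n_one * math.log(pi_hat)
    if n_total > n_one:
        loglik_alt += (n_total - n_one) * math.log(1.0 - pi_hat)
    loglik_null = n_total * math.log(0.5)
    stat = max(0.0, -2.0 * (loglik_null - loglik_alt))
    return pi_hat, stat, chi2_sf_1df(stat)


# Example: 333 of 500 backcross individuals carry digit 1.
pi_hat, stat, p_value = lrt_observed_digits(333, 500)
```

Scanning the statistic over a grid of positions gives the likelihood-ratio profile along the chromosome.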

The maximum-likelihood estimate of the position of the SDL, {lambda}, can be obtained by examining the likelihood-ratio profile along the chromosome, as is commonly done in interval mapping of QTL.

Bayesian analysis:
We now introduce the Bayesian analysis of SDL implemented via Markov chain Monte Carlo (MCMC). We first classify variables into observables and unobservables. The observables are the data, denoted by I. The unobservables include parameters and missing information. The parameters here include {pi} and {lambda}, and the missing information consists of the inheritance digits of the SDL and of the markers. We always sum over all the missing information, so inheritance digits appear only in intermediate steps. The joint posterior distribution of the parameters is

(12)

where Pr({pi}) and Pr({lambda}) are the prior distributions for the parameters of interest; beta with Beta(1, 1) for the former and uniform for the latter. Samples are simulated from the joint posterior distribution via the MCMC. In the MCMC analysis, instead of sampling all the unobservables simultaneously, we sample one unobservable at a time with others taking values simulated in the previous cycle. When all the unobservables are updated, we have completed one cycle of the Markov chain. When the chain reaches a stationary stage, subsequent samples are considered to be drawn from the joint posterior distribution.

Starting with an initial value for each parameter, {{pi}(0), {lambda}(0)}, we sample {pi} using the Metropolis-Hastings algorithm (e.g., GELMAN et al. 1995). A new proposal, {pi}*, is sampled from a beta proposal distribution J({pi}*|{pi}(0)) = Beta({pi}(0)N + 2, (1 - {pi}(0))N + 2). The proposal {pi}* is accepted with probability min{1, a({pi}*, {pi}(0))}, where

(13)

Note that the first term is the ratio of posterior probabilities of the parameters and the second term is the ratio of proposal probabilities. If {pi}* is accepted, we take {pi}(1) = {pi}*; otherwise we do not update the effect of the SDL and simply take {pi}(1) = {pi}(0). The beta proposal distribution assures that 0 <= {pi} <= 1. The simulated value of {pi}, denoted by {pi}(1), is then used to generate {lambda}. We use the Metropolis algorithm (e.g., GELMAN et al. 1995). First, a new value of {lambda} is proposed by a small perturbation from {lambda}(0), i.e.,

where x is a uniform variable sampled from U(0, d) and d is a small positive number, e.g., 0.1 times the length of the linkage group. We accept this proposal with probability min{1, a({lambda}*, {lambda}(0))}, where

(14)

If {lambda}* is accepted, we take {lambda}(1) = {lambda}*; otherwise {lambda}(1) = {lambda}(0).
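One cycle of the single-SDL sampler can be sketched as below. The sketch makes several assumptions that are not taken from the article: the likelihood is supplied by the caller as a function `loglik(pi, lam)` returning log Pr(I | {pi}, {lambda}), the perturbation of {lambda} is drawn symmetrically from U(-d, d) so that the plain Metropolis ratio applies, and proposals outside the linkage group are rejected because the uniform prior gives them zero density. All names are illustrative.

```python
import math
import random


def log_beta_pdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def accept(log_ratio, rng):
    """Accept with probability min(1, exp(log_ratio))."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def update_pi(pi_cur, lam, loglik, n, rng):
    """Metropolis-Hastings update of pi with the beta proposal
    Beta(pi*n + 2, (1 - pi)*n + 2); the Beta(1, 1) prior is flat and cancels."""
    a, b = pi_cur * n + 2.0, (1.0 - pi_cur) * n + 2.0
    pi_new = rng.betavariate(a, b)
    if not 0.0 < pi_new < 1.0:
        return pi_cur
    a_rev, b_rev = pi_new * n + 2.0, (1.0 - pi_new) * n + 2.0
    log_ratio = (loglik(pi_new, lam) - loglik(pi_cur, lam)      # posterior ratio
                 + log_beta_pdf(pi_cur, a_rev, b_rev)           # reverse proposal
                 - log_beta_pdf(pi_new, a, b))                  # forward proposal
    return pi_new if accept(log_ratio, rng) else pi_cur


def update_lam(pi, lam_cur, loglik, length, d, rng):
    """Metropolis update of the location with a symmetric uniform perturbation."""
    lam_new = lam_cur + rng.uniform(-d, d)
    if not 0.0 <= lam_new <= length:
        return lam_cur
    log_ratio = loglik(pi, lam_new) - loglik(pi, lam_cur)
    return lam_new if accept(log_ratio, rng) else lam_cur


def run_chain(loglik, n, length, cycles, rng, pi0=0.5, lam0=0.5, d=0.1):
    """Alternate the two updates; one pass through both is one cycle."""
    pi, lam, samples = pi0, lam0, []
    for _ in range(cycles):
        pi = update_pi(pi, lam, loglik, n, rng)
        lam = update_lam(pi, lam, loglik, length, d, rng)
        samples.append((pi, lam))
    return samples
```

Working on the log scale avoids underflow when the likelihood is a product over many individuals.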

Multiple-SDL model:
Consider the joint action of L SDL located on the chromosome of interest. Define the locations of these SDL by {lambda} = {{lambda}l} for l = 1, ... , L, in contrast to the single-SDL model where {lambda} is a scalar. Also define the marginal effects of the SDL by {pi} = {{pi}l} for l = 1, ... , L. If these SDL act multiplicatively, the joint effect of all the SDL can be formulated as a product of these marginal effects. Define {phi}i = [{phi}i1, ... , {phi}iL] and {phi}i = [{phi}i1, ... , {phi}iM] as vectors of inheritance digits of all SDL and marker loci, respectively, for the ith individual. Using Bayes' theorem, the joint posterior distribution of {phi}i can be formulated as

(15)

The joint posterior distribution of the parameters is

(16)

where Pr({pi}) = {Pi}_{l=1}^{L} Pr({pi}l), Pr({lambda}) = {Pi}_{l=1}^{L} Pr({lambda}l), and

(17)

Under the multiple-SDL model, formulation of an EM algorithm seems impossible. On the other hand, the Bayesian method requires little modification: instead of updating the effect and location of a single locus at a time, {lambda} and {pi} are updated iteratively for all loci.

With the Bayesian approach, the number of SDL (L) can be treated as an unknown variable. This involves a change in the dimension of the model. Reversible jump MCMC (GREEN 1995; SATAGOPAN and YANDELL 1996; HEATH 1997; RICHARDSON and GREEN 1997; SILLANPAA and ARJAS 1998; STEPHENS and FISCH 1998) is an extension to the Metropolis-Hastings sampler, permitting moves to be made between models with different dimensions. The joint posterior distribution of the parameters is

(18)

where Pr(L) is the prior probability of the number of SDL. We chose a Poisson prior (with mean µ = 1) for Pr(L) truncated at Lmax. After each existing SDL has been updated, we propose two types of move to update L, adding a locus if L < Lmax (with probability pa) and deleting a locus if L > 0 (with probability pd).
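The truncated Poisson prior is easy to tabulate. The sketch below renormalizes the Poisson probabilities over 0, ... , Lmax; the function name is illustrative.

```python
import math


def truncated_poisson_prior(mu, l_max):
    """Pr(L) for L = 0, ..., l_max: Poisson(mu) renormalized after truncation."""
    weights = [math.exp(-mu) * mu ** k / math.factorial(k)
               for k in range(l_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Example: mean 1, at most three SDL.
prior = truncated_poisson_prior(1.0, 3)
```

With a mean of 1 and a maximum of three SDL, the prior probabilities of zero, one, two, and three SDL are 0.375, 0.375, 0.1875, and 0.0625.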

For adding an SDL, a new location {lambda}L+1 and effect {pi}L+1 are sampled from their uniform priors for the new SDL. The new sets of parameters are {pi}* = ({pi}(0), {pi}L+1) and {lambda}* = ({lambda}(0), {lambda}L+1). We then accept this new SDL with probability min{1, a(L + 1, L)}, where

(19)

If the new SDL is accepted, its location and effect are accepted simultaneously; otherwise, the number of SDL remains the same. In the deleting step, a random SDL is proposed to be deleted. Then the SDL are renumbered such that the candidate SDL is the last SDL, i.e., the Lth SDL. The new parameter sets will be {pi}* = ({pi}1(0), ... , {pi}L-1(0)) and {lambda}* = ({lambda}1(0), ... , {lambda}L-1(0)). The proposal is accepted with probability min{1, a(L - 1, L)}, where

(20)

Note that we handle SDL within the same marker interval in exactly the same way as SDL in different intervals and that (20) is just the inverse of (19). Our interpretation of the terms (L + 1)^-1 and L in (19) and (20), respectively, differs from the usual one. Usually, these terms are included to account for a perceived imbalance in the number of loci selected for a delete step vs. that selected for an addition step if the order of loci is not fixed. We believe that the balance is one to one in both the addition and deletion steps and no balancing is necessary; we include these terms because of the Poisson prior. The difference from the usual algorithm, however, is just a minor modification of the prior distribution and thus irrelevant in most biological applications.
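The structure of the add and delete acceptance ratios can be sketched as follows. This is one reading of the description above, not a reconstruction of Equations 19 and 20: it assumes that the new location and effect are drawn from their priors (so prior and proposal densities of the new parameters cancel), that the remaining terms are the likelihood ratio, the Poisson prior ratio, and the ratio of move probabilities, and that boundary cases (L = 0 or L = Lmax) are handled by the caller. All names are illustrative.

```python
import math


def log_accept_add(loglik_new, loglik_old, n_sdl, mu=1.0, p_add=0.5, p_del=0.5):
    """Log acceptance ratio for a move from n_sdl to n_sdl + 1 SDL.

    loglik_new, loglik_old: log Pr(I | parameters) with and without the new SDL.
    The Poisson prior contributes Pr(L + 1) / Pr(L) = mu / (L + 1).
    p_add is the probability of proposing an addition from the smaller model,
    p_del the probability of proposing a deletion from the larger model.
    """
    return ((loglik_new - loglik_old)
            + math.log(mu / (n_sdl + 1))
            + math.log(p_del / p_add))


def log_accept_delete(loglik_new, loglik_old, n_sdl, mu=1.0, p_add=0.5, p_del=0.5):
    """Log acceptance ratio for a move from n_sdl to n_sdl - 1 SDL (n_sdl >= 1).
    The Poisson prior contributes Pr(L - 1) / Pr(L) = L / mu."""
    return ((loglik_new - loglik_old)
            + math.log(n_sdl / mu)
            + math.log(p_add / p_del))


def accept_probability(log_ratio):
    """min(1, a) on the probability scale."""
    return math.exp(min(0.0, log_ratio))
```

In this form the delete ratio from L + 1 to L is exactly the inverse of the add ratio from L to L + 1, which is the property stated in the text.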


*  APPLICATIONS

To illustrate the method, a simulation study and an analysis of a data set from one cross of two Scots pine (Pinus sylvestris) trees are presented. The simulation study conforms to an inbred line BC situation. In the pine data analysis, we concentrate on the maternal part of the progeny of a single tree, i.e., a pseudobackcross design. In a backcross it is not possible to distinguish between gametic selection and viability selection after fertilization.

Simulations:
In the simulation study, first, a single viability locus that eliminates 50% of the progeny of the heterozygous genotype, i.e., {pi} = 2/3, was placed in the middle of a chromosome of length 1 M; six markers were spaced at regular intervals of 0.2 M along the chromosome; no missing data were considered. In the second simulation, two SDL with the same effects as in the single-SDL situation were placed at locations 0.33 M and 0.67 M, respectively. In both cases, simulations with sample sizes of 500 were repeated five times and results were compared; additionally, simulations with sample sizes of 100 and less were also performed. Compared to empirical reports of distortions of marker loci from Mendelian ratios, the simulated effect is high but not unrealistic. The marker map is rather dense and fully informative.
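A data set of this kind can be generated with a short simulation. The sketch below is an illustration under stated assumptions rather than the authors' simulation code: digit 1 codes the homozygote and digit 0 the heterozygote, crossovers follow Haldane's mapping function, and heterozygotes at the viability locus survive with probability `het_survival`. All names are hypothetical.

```python
import math
import random


def haldane_r(d):
    """Recombination fraction for map distance d (Morgans), Haldane's function."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(d)))


def simulate_bc(n, marker_pos, sdl_pos, het_survival, rng):
    """Simulate n surviving backcross individuals.

    Returns a list of (sdl_digit, marker_digits); digit 1 = homozygote,
    digit 0 = heterozygote (assumed coding).
    """
    loci = sorted([(p, "m") for p in marker_pos] + [(sdl_pos, "s")])
    survivors = []
    while len(survivors) < n:
        state = rng.randint(0, 1)          # Mendelian start at the first locus
        prev_pos, sdl_digit, digits = None, None, []
        for pos, kind in loci:
            # Switch the inherited allele if a recombination occurs.
            if prev_pos is not None and rng.random() < haldane_r(pos - prev_pos):
                state = 1 - state
            prev_pos = pos
            if kind == "s":
                sdl_digit = state
            else:
                digits.append(state)
        # Viability selection against heterozygotes at the SDL.
        if sdl_digit == 0 and rng.random() >= het_survival:
            continue
        survivors.append((sdl_digit, digits))
    return survivors


# Example matching the first simulation: six markers at 0.2 M spacing,
# one viability locus at 0.5 M, 50% of heterozygotes eliminated.
data = simulate_bc(500, [0.0, 0.2, 0.4, 0.6, 0.8, 1.0], 0.5, 0.5,
                   random.Random(0))
```

With half of the heterozygotes eliminated, the expected proportion of homozygotes among survivors is 0.5 / (0.5 + 0.25) = 2/3, which is the value of {pi} quoted above.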

The outcomes of the analyses of the five simulated data sets were almost identical, so we present only one of them. In the maximum-likelihood (ML) analysis, the number of SDL was fixed to one. The inferred effect and the likelihood-ratio statistic {Lambda} are reported at each location. We also performed an MCMC analysis of the same data. From Fig 1A, we see that the position and effect of the SDL are estimated quite accurately. For the other four simulations, the inferred positions were also mostly between the two middle markers and the estimated effects were close to the true value. Reducing sample sizes did not appreciably change the estimate of location or effect. The likelihood-ratio statistic, however, dropped considerably (results not shown). We do not present the ML results with two SDL, because the single-SDL model is not appropriate for those data.



Figure 1. Simulated data. (A and B) A simulation with one SDL; (C) a simulation with two SDL. The scale on the x-axis is 1 M, the positions of the markers are indicated with an "x," while the positions of the SDL are indicated with a circle. "Likelihood" refers to the broken line and to twice the log-likelihood ratio; "frequency" to the posterior probability of an SDL in an interval of 0.04 the length of the linkage group; and "effect" to the solid line and to the probability of finding the homozygote genotype in the BC.

With the Bayesian MCMC analysis, the Poisson prior mean was set to µ = 1 and the maximum number of SDL was set to three. The chain length was 10^5 cycles. The chain was thinned by storing only every 10th cycle. No burn-in period was discarded because the chain reached approximate stationarity very quickly. The posterior probability of the simulated number of SDL (i.e., one or two, respectively) was always between 0.6 and 0.9. In the one-SDL case, frequencies are higher at the center, i.e., close to the simulated position (Fig 1B). Effects are very similar to those estimated with the ML method. In the two-SDL case, posterior distributions of both the locations and the effects are approximately correct (Fig 1C). It can be easily discerned from the posterior distribution of frequencies that there are actually two SDL present. When the number of individuals was reduced, the posterior probability of the different numbers of SDL approached that of the prior distribution rapidly (data not shown). This corresponds to the decrease in the likelihood-ratio statistic with decreasing sample size.

Pine data:
In the second application, data consisted of the megagametophytes of open-pollinated offspring of a single Scots pine (P. sylvestris) tree, P304 (HURME and SAVOLAINEN 1999). Megagametophytes are haploid tissues consisting of the maternal part of the seedling's genome and can be scored at the seedling stage without damaging the seedling. We treated the progeny of this tree as a pseudobackcross family. Map distances and linkage phases were determined with Mapmaker as described in HURME and SAVOLAINEN (1999). Five RAPD markers from linkage group 2 were used in this family: C02-680, G13-750, K09-750, E09-250, and AC15-270 at positions 0.038 M, 0.115 M, 0.287 M, 0.461 M, and 0.478 M, respectively. As determined from other crosses, the map length of the whole linkage group was ~0.85 M. The sample size was 73 individuals, and in many individuals some markers were scored as missing.

With the ML analysis, the log-likelihood ratio statistic was appreciable only close to the marker G13-750 (Fig 2A). At this location the inferred effect was an excess of the heterozygous genotype of ~0.2 over the Mendelian value of 0.5. For the Bayesian MCMC analysis, the prior distribution was the same as for the simulation study. The posterior probabilities of zero, one, two, and three SDL were 0.01, 0.15, 0.61, and 0.23, respectively. This result is, however, quite sensitive to the prior distribution of SDL number. We report the posterior distributions of both one and two inferred SDL. If a single SDL was inferred, it was most often placed close to marker C02-680 (the beginning of the marker region), and the inferred effect was a considerable increase in the second genotype, as in the ML analysis (Fig 2B). If two SDL were inferred, most often location and effect of one of the SDL was similar to the single-SDL case, while the other counteracted its effect at the other end of the linkage group (Fig 2C).



Figure 2. Pine data. The notation is the same as in Fig 1. The ML result is presented in A, and the posterior distribution of the single-SDL case is in B and of the two-SDL cases in C. The marker loci are (from left to right) C02-680, G13-750, K09-750, E09-250, and AC15-270.


*  DISCUSSION

Herein, a method for mapping SDL in a backcross is presented. The method makes efficient use of a map of partially or fully informative marker loci by using the multipoint method (LANDER and GREEN 1987; JIANG and ZENG 1997). A maximum-likelihood analysis via an EM algorithm as well as a Markov chain Monte Carlo Bayesian analysis using a reversible jump algorithm for varying the number of loci is presented in detail. Given a dense marker map, the method can be used for precision analysis of positions and effects of the SDL. The best previously available methods (FU and RITLAND 1994B; MITCHELL-OLDS 1995; CHENG et al. 1996) rely on fully informative markers flanking the putative SDL and assume just one SDL per chromosome.

With our approach, it is possible to efficiently analyze the number, positions, and effects of SDL in organisms for which a high-resolution marker map has been developed and in which inbred line crosses can be performed easily. Analysis can be extended easily to a general full-sib family or to the selfing of an outcrossing individual: the dimension changes from two to four, binomial distributions change to multinomial distributions, and the transition probabilities between adjacent loci change. Marker information now contributes to the full or partial identification of four combinations of genotypic configurations. As with the BC case, partial marker information can be defined as the union of compatible cases. All the above changes are rather trivial consequences of the change in dimension but complicate presentation substantially. Additionally, the missing phase information needs to be considered. Furthermore, the multipoint algorithm becomes more important for the full-sib design.

Presently, our method for the backcross can only be used to analyze the SDL currently segregating in the two lines, not those that have been segregating in the ancestral population from which the inbred lines derived. Segregation distortion might have already affected the inbreeding process for creation of the lines. Extrapolation from the current to the ancestral situation is therefore problematic. This problem is even more pressing for recombinant inbred lines, where overrepresentation of chromosomal fragments of one or the other parent is commonly observed (e.g., LISTER and DEAN 1993) and requires a more elaborate approach.

A distinction needs to be made between segregation distortion before and after fertilization. An SDL acting before fertilization can only alter gametic proportions. Thus genotypic proportions will only be altered indirectly through the combination of gametic proportions, which restricts the achievable combinations of genotypic proportions. On the other hand, SDL acting after fertilization may alter genotypic proportions directly. Thus, many more combinations of genotypic proportions are possible for SDL acting after fertilization. In experimental crosses more complex than the backcross design, the inferred genotypic proportions of an SDL may thus make prefertilization mechanisms of segregation distortion unlikely. Two or more SDL acting before fertilization may, however, mimic the effect of SDL acting after fertilization because of the increase in combinatorial possibilities.

In hybrids of species or subspecies, segregation distortion commonly occurs (see, e.g., WHITKUS 1998 and references therein). This may be caused by structural rearrangements, e.g., inversions, which constitute a prefertilization mechanism. Alternatively, the segregation distortion may be caused by postfertilization differences in viability between genotypic configurations, most probably caused by epistatic interactions. Our method can be used to detect chromosomal areas that are causing these distortions. But because of the presumed epistasis, relaxation of the assumption of a multiplicative effect of different SDL may be necessary.

Our method may also be used to map loci influencing early viability. This would enhance our understanding of the nature of early inbreeding depression. The method provides another approach for estimating the number and effects of loci causing inbreeding depression. Traditionally, such information has been derived mainly from biometric analysis of crosses (e.g., DUDASH and CARR 1998). But as inbreeding depression can be expressed in embryonic life stages not amenable to biometric analysis, application of the biometric approach is limited. To gain insight into these early life stages, sparse maps and single-marker methods have been used to infer the effect of a viability locus influencing inbreeding depression (SORENSEN 1967; SERVITOVÁ and CETL 1984; HEDRICK and MUONA 1990; FU and RITLAND 1994a; KÄRKKÄINEN et al. 1999). With single-marker analysis, however, estimation of the position and effect of the SDL is confounded, and multiple SDL on a single linkage group cannot be handled at all. Interval methods (FU and RITLAND 1994b; MITCHELL-OLDS 1995; CHENG et al. 1996) rely on fully informative markers flanking the putative SDL and assume just one SDL per chromosome. Dense linkage maps of fully informative markers may be hard to obtain in closely related individuals that need to be considered in the analysis of inbreeding depression. Like the interval methods, our method requires a dense linkage map of polymorphic markers but is not restricted to fully informative markers; instead it can make efficient use of, e.g., dominant markers.
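The confounding of position and effect in single-marker analysis can be seen in a short sketch for the backcross. It assumes a single SDL at which one genotype has proportion p among the survivors and a fully informative marker at recombination fraction r from it, so that the expected marker proportion is m = p(1 - r) + (1 - p)r, or equivalently m - 1/2 = (p - 1/2)(1 - 2r). The function names and the numerical values are illustrative assumptions.

```python
def marker_proportion(p_sdl, r):
    """Expected proportion of one marker genotype in a backcross, given the
    proportion p_sdl of the linked SDL genotype and recombination fraction r."""
    return p_sdl * (1.0 - r) + (1.0 - p_sdl) * r


def sdl_proportion_for(marker_prop, r):
    """SDL proportion that reproduces marker_prop at recombination
    fraction r (requires r < 0.5)."""
    return 0.5 + (marker_prop - 0.5) / (1.0 - 2.0 * r)


if __name__ == "__main__":
    observed = 0.6  # hypothetical distorted marker proportion
    for r in (0.0, 0.1, 0.25):
        p = sdl_proportion_for(observed, r)
        print(f"r = {r:.2f}  p = {p:.3f}  m = {marker_proportion(p, r):.3f}")
```

Every pair (p, r) on this curve yields the same expected marker proportion, so a single marker cannot separate a weak SDL close to the marker from a strong SDL farther away; several linked markers are needed to resolve the two.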

Only rarely have data sets been gathered for mapping segregation distortion or viability selection (see, however, HARUSHIMA et al. 1996 and KUANG et al. 1998). But often in QTL experiments, wide crosses are used to increase differences between parents and thus the power of mapping. Probably for this reason, markers with segregation ratio distortions are commonly observed in data sets used for QTL mapping resulting from wide crosses (e.g., VAN OOIJEN et al. 1994). Segregation ratio distortion is also commonly observed in doubled haploid lines (e.g., FULTON et al. 1997).

Usually generation of a linkage map of marker loci precedes QTL analysis. If a dense map of informative markers is inferred correctly, the bias introduced by segregation distortion into QTL analysis will be negligible. But if recombination fractions or, worse, order of marker loci are inferred incorrectly, basic assumptions of QTL analysis do not hold and results will be imprecise at best. Hence, aside from being interesting in themselves, SDL cause practical problems in QTL projects as observed, e.g., by SANDBRINK et al. (1995). Thus, segregation distortion should be accounted for in mapping projects.

Segregation distortion is known to bias estimation of recombination fractions in two-point inference of recombination distances between markers (LORIEUX et al. 1995a, 1995b; LIU 1998). If markers are fully informative, estimation of the recombination fraction of only the markers flanking the SDL will be affected. Only in the unlikely case of coincidence of SDL and marker location will no bias be observed. If less than fully informative markers are used, the effects of the distortion are spread out to the smallest interval of fully informative markers flanking the distorted region. As a remedy, markers that show obvious segregation distortion are often excluded from the map. But that reduces coverage of the genome and qualitative or quantitative trait loci might be missed.

Our method can be extended to allow for detection of SDL concurrently with estimation of a linkage map. CHENG et al. (1996) have already developed an EM algorithm to infer positions of two fully informative markers in the presence of a single SDL (an interval method) in a backcross or doubled haploid lines. This could be extended to a multipoint inference of a marker map in the presence of SDL by augmenting the EM or MCMC schemes presented herein so that the markers are allowed to change their positions relative to each other.

The source code for a C++ program and executables for a Sun workstation, with which the above calculations can be performed, are available from Claus Vogl (claus{at}genetics.ucr.edu).


*  ACKNOWLEDGMENTS

We thank Päivi Hurme and Outi Savolainen for the data set and Elja Arjas, Anita de Haan, Mikko Sillanpää, and Nengjun Yi for discussion of this and related issues. Outi Savolainen, Elja Arjas, and Lori Weingartner have commented on earlier versions of this manuscript. We thank Zhao-Bang Zeng and two anonymous reviewers for their patient work, which helped to improve this article a lot. This work was supported by grants from the Environment and Natural Resources Research Council and the Medical Research Council to Outi Savolainen and by the National Institutes of Health Grant GM-55321 and the U.S. Department of Agriculture National Research Initiative Competitive Grants Program 97-35205-5075 to S.X.

Manuscript received July 13, 1998; Accepted for publication April 3, 2000.


*  LITERATURE CITED

CHARLESWORTH, B. and D. CHARLESWORTH, 1987  Inbreeding depression and its evolutionary consequences. Annu. Rev. Ecol. Syst. 18:237-268.

CHARLESWORTH, B. and D. CHARLESWORTH, 1998  Some evolutionary consequences of deleterious mutations. Genetica 102/103:3-19.

CHENG, R., A. SAITO, and Y. UKAI, 1996  Estimation of the position and effect of a lethal factor locus on a molecular marker linkage map. Theor. Appl. Genet. 93:494-502.

DUDASH, M. W. and D. E. CARR, 1998  Genetics underlying inbreeding depression in Mimulus with contrasting mating systems. Nature 393:682-684.

FU, Y.-B. and K. RITLAND, 1994a  Evidence for the partial dominance of viability genes contributing to inbreeding depression in Mimulus guttatus. Genetics 136:323-331.

FU, Y.-B. and K. RITLAND, 1994b  On estimating the linkage of marker genes to viability genes controlling inbreeding depression. Theor. Appl. Genet. 88:925-932.

FULTON, T.-M., J. C. NELSON, and S. D. TANKSLEY, 1997  Introgression and DNA marker analysis of Lycopersicon peruvianum, a wild relative of the cultivated tomato, into Lycopersicon esculentum, followed through three successive backcross generations. Theor. Appl. Genet. 95:895-902.

GELMAN, A., J. B. CARLIN, H. S. STERN and D. B. RUBIN, 1995 Bayesian Data Analysis. Chapman and Hall, London.

GRANT, V., 1975 Genetics of Flowering Plants. Columbia University Press, New York.

GREEN, P. J., 1995  Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82:711-732.

HARTL, D. L., and A. G. CLARK, 1997 Principles of Population Genetics, Ed. 3. Sinauer, Sunderland, MA.

HARUSHIMA, Y., N. KURATA, M. YANO, Y. NAGAMURA, and T. SASAKI et al., 1996  Detection of segregation distortions in an indica-japonica rice cross using a high-resolution molecular map. Theor. Appl. Genet. 92:145-150.

HEATH, S. C., 1997  Markov-chain Monte Carlo segregation and linkage analysis for oligogenic models. Am. J. Hum. Genet. 61:748-760.

HEDRICK, P. W., 1994  Purging inbreeding depression and the probability of extinction: full-sib families. Heredity 73:363-372.

HEDRICK, P. W. and O. MUONA, 1990  Linkage of viability genes to marker loci in selfing organisms. Heredity 64:67-72.

HURME, P. and O. SAVOLAINEN, 1999  Comparison of homology and linkage of RAPD markers between individual trees of Scots pine (Pinus sylvestris L.). Mol. Ecol. 8:15-22.

HUSBAND, B. C. and D. W. SCHEMSKE, 1996  Evolution of magnitude and timing of inbreeding depression in plants. Evolution 50:554-570.

JANSEN, R. C. and P. STAM, 1994  High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136:1447-1455.

JIANG, J. and Z.-B. ZENG, 1997  Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101:47-58.

KÄRKKÄINEN, K., V. KOSKI, and O. SAVOLAINEN, 1996  Geographical variation in inbreeding depression in Scots pine. Evolution 50:111-119.

KÄRKKÄINEN, K., H. KUITTINEN, R. VAN TREUREN, C. VOGL, and O. SAVOLAINEN, 1999  Genetic basis of inbreeding depression in Arabis petraea. Evolution 53:1354-1365.

KUANG, H., T. E. RICHARDSON, S. D. CARSON, and B. C. BONGARTEN, 1998  An allele responsible for seedling death in Pinus radiata D. Don. Theor. Appl. Genet. 96:640-644.

LANDER, E. S. and D. BOTSTEIN, 1989  Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185-199.

LANDER, E. S. and P. GREEN, 1987  Construction of multilocus genetic maps in humans. Proc. Natl. Acad. Sci. USA 84:2363-2367.

LAUNEY, S., and D. HEDGECOCK, 1999 Genetic load causes segregation ratio distortion in oysters: mapping at 6 hours. Plant and Animal Genome VII, abstracts W14, p. 33.

LISTER, C. and C. DEAN, 1993  Recombinant inbred lines for mapping RFLP and phenotypic markers in Arabidopsis thaliana. Plant J. 4:745-750.

LIU, B. H., 1998 Statistical Genomics: Linkage, Mapping, and QTL Analysis. CRC Press, Boca Raton, FL.

LORIEUX, M., B. GOFFINET, X. PERRIER, D. GONZÁLEZ DE LEÓN, and C. LANAUD, 1995a  Maximum likelihood models for mapping genetic markers showing segregation distortion. 1. Backcross populations. Theor. Appl. Genet. 90:73-80.

LORIEUX, M., X. PERRIER, B. GOFFINET, C. LANAUD, and D. GONZÁLEZ DE LEÓN, 1995b  Maximum likelihood models for mapping genetic markers showing segregation distortion. 2. F2-populations. Theor. Appl. Genet. 90:81-89.

MCGOLDRICK, D. J. and D. HEDGECOCK, 1997  Fixation, segregation and linkage of allozyme loci in inbred families of the Pacific oyster Crassostrea gigas (Thunberg): implications for the causes of inbreeding depression. Genetics 146:321-334.

MITCHELL-OLDS, T., 1995  Interval mapping of viability loci causing heterosis in Arabidopsis. Genetics 140:1105-1109.

RICHARDSON, S. and P. J. GREEN, 1997  On Bayesian analysis of mixtures with an unknown number of components. J. R. Stat. Soc. B 59:731-792.

SANDBRINK, J. M., J. W. VAN OOIJEN, C. C. PURIMAHUA, M. VRIELINK, and R. VERKERK et al., 1995  Localization of genes for bacterial resistance in Lycopersicon peruvianum using RFLPs. Theor. Appl. Genet. 90:444-450.

SATAGOPAN, R. J., and B. S. YANDELL, 1996 Estimating the number of quantitative trait loci via Bayesian model determination. Special Contributed Paper Session on Genetic Analysis of Quantitative Traits and Complex Diseases. Biometric Section, Statistical Meeting. Chicago, IL.

SERVITOVÁ, J. and I. CETL, 1984  The use of recessive lethal chlorophyll mutants for linkage mapping of Arabidopsis thaliana (L.) Heynh. Arabidopsis Inf. Serv. 21:59-64.

SILLANPÄÄ, M. and E. ARJAS, 1998  Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148:1373-1388.

SORENSEN, F. C., 1967  Linkage between marker genes and embryonic lethal factors may cause disturbed segregation ratios. Silvae Genet. 16:132-134.

STEPHENS, D. A. and R. D. FISCH, 1998  Bayesian analysis of quantitative trait locus data using reversible jump Markov chain Monte Carlo. Biometrics 54:1334-1347.

VAN OOIJEN, J. W., J. M. SANDBRINK, M. VRIELINK, R. VERKERK, and P. ZABEL et al., 1994  An RFLP linkage map of Lycopersicon peruvianum. Theor. Appl. Genet. 89:1007-1013.

WHITKUS, R., 1998  Genetics of adaptive radiation in Hawaiian and Cook Island species of Tetramolopium (Asteraceae). II. Genetic linkage map and its implications for interspecific breeding barriers. Genetics 150:1209-1216.

WILLIAMS, C. G. and O. SAVOLAINEN, 1996  Inbreeding depression in conifers: implications for breeding strategy. For. Sci. 42:102-117.

ZENG, Z.-B., 1994  Precision mapping of quantitative trait loci. Genetics 136:1457-1468.



