- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Kitakado, T.
- Articles by Kishino, H.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Kitakado, T.
- Articles by Kishino, H.
Genetics, Vol. 173, 2063-2072, August 2006, Copyright © 2006
doi:10.1534/genetics.106.056424
Simultaneous Estimation of Mixing Rates and Genetic Drift Under Successive Sampling of Genetic Markers With Application to the Mud Crab (Scylla paramamosain) in Japan
Toshihide Kitakado*,1,
Shuichi Kitada*,
Yasuhiro Obata
and
Hirohisa Kishino
* Faculty of Marine Science, Tokyo University of Marine Science and Technology, Minato, Tokyo 108-8477, Japan,
Tamano Station, National Center for Stock Enhancement, Fisheries Research Agency, Chikko, Tamano, Okayama 706-0002, Japan and
Graduate School of Agriculture and Life Sciences, University of Tokyo, Bunkyo, Tokyo 113-8657, Japan
1 Corresponding author: Tokyo University of Marine Science and Technology, 5-7, Konan 4, Minato-ku, Tokyo, 108-8477, Japan.
E-mail: kitakado{at}s.kaiyodai.ac.jp
>ABSTRACT
MODELS AND METHODS
SIMULATION STUDIES
CASE STUDY: ANALYSIS FOR...
DISCUSSION
APPENDIX A: MCMC SAMPLING
APPENDIX B: MCEM ALGORITHM
ACKNOWLEDGEMENTS
LITERATURE CITED
In stock enhancement programs, it is important to assess mixing rates of released individuals in stocks. For this purpose, genetic stock identification has been applied. The allele frequencies in a composite population are expressed as a mixture of the allele frequencies in the natural and released populations. The estimation of mixing rates is possible, under successive sampling from the composite population, on the basis of temporal changes in allele frequencies. The allele frequencies in the natural population may be estimated from those of the composite population in the preceding year. However, it should be noted that these frequencies can vary between generations due to genetic drift. In this article, we develop a new method for simultaneous estimation of mixing rates and genetic drift in a stock enhancement program. Numerical simulation shows that our procedure estimates the mixing rate with little bias. Although the genetic drift is underestimated when the amount of information is small, reduction of the bias is possible by analyzing multiple unlinked loci. The method was applied to real data on mud crab stocking, and the result showed a yearly variation in the mixing rate.
IT is well known that many fisheries stocks have been reduced because of overfishing, habitat loss, and degradation. (PAULY et al. 2002). Some management procedures, such as setting a total allowable catch (TAC), are the usual practices for sustainable use of such fisheries stocks (e.g., HILBORN and WALTERS 1992). In addition to such management procedures, stock enhancement programs through release of hatchery-reared juveniles are used to supplement reduced natural stocks (BLAXTER 2000; SVÅSAND et al. 2000; LEBER et al. 2004; KITADA and KISHINO 2006).
In stock enhancement programs, the assessment of the contribution of the released population to the commercial catch is one of the important tasks (KITADA and KISHINO 2006). Also, it is necessary to monitor the genetic impact of release on a natural population. Evaluation of the mixing conditions of a released population into a natural population must then be carried out. However, if juveniles are too small to tag and/or the tag-shedding rate is significant, it is difficult to determine their mixing compositions through experiments with artificial tags. For such cases, there is the possibility of stock identification using genetic tags such as mtDNA haplotypes and microsatellite alleles (e.g., LEBER et al. 2004).
We now consider a fisheries stock of a random-mating population in a closed fishing ground. Here, we call this population a natural population. Suppose that in each year hatchery-reared juveniles are released before the fishing season. Then, a population after the release of the juveniles composes a composite population, which consists of two source populations, that is, the natural and supplemented populations. The composite population could in turn be naturally reproducing in the next year. We call it a natural population again.
The haplotype or allele frequencies in a population in the fishing ground change temporally for two reasons: the release of juveniles and the genetic drift. The amount of genetic drift depends on the effective population size (Ne) (NEI and TAJIMA 1981; WAPLES 1989). The methods for estimating Ne on the basis of temporal changes in haplotype or allele frequencies have been developed. WAPLES (1989) introduced a moment-based estimation method using the standardized variance in allele frequency change. More efficient estimation by the maximum-likelihood method was introduced by WILLIAMSON and SLATKIN (1999) and ANDERSON et al. (2000). Furthermore, ANDERSON (2005) presented a coalescent-based likelihood method for temporally sampled individuals.
Such temporal change in sampled haplotype or allele frequencies can also be used for the estimation of mixing rates. Typically, the estimation of mixing rates in a mixed population needs samples from different baseline populations as well as the composite population. For such situations, the maximum-likelihood (ML) and Bayesian methods were developed (e.g., MILLAR 1987; PELLA and MASUDA 2001). Another situation occurs in stock enhancement programs. In this case, one baseline population is a natural population before release, and the other is a released population. A mixed population is a composite population after release. Usually, sampling from the composite population is conducted in the fishing season, and the composite population in the previous year acts as a baseline. In this sense, a composite population in one year plays two roles: as a mixed population for the sampling year and as a baseline natural population for the following year. This kind of method was applied to data for the mud crab, Scylla paramamosain, in Japan. IMAI et al. (2002) used the mitochondrial DNA (mtDNA) D-loop region as a genetic tag and gave a point estimate for the mixing rate of juveniles released for a year. OBATA et al. (2006) obtained point estimates of mixing rates and their confidence intervals using the maximum-likelihood method.
As mentioned above, haplotype or allele frequencies can vary due to genetic drift. Hatchery releases could be effective for depleted populations with limited recruitment. Supportive breeding (RYMAN and LAIKRE 1991) programs also target small populations of endangered species. In such cases, the effective population size could be small and the amount of genetic drift cannot be ignored. If the effect of genetic drift is large, actual haplotype or allele frequencies in the natural population, regarded as a baseline population, are different from frequencies in the natural population at release. The natural population at release should actually act as a baseline. OBATA et al. (2006) conducted a simulation for sensitivity tests to quantify the magnitude of changes in estimates and standard errors of mixing rates for various values of genetic drift assumed to be known. However, the effect of the genetic drift on the estimate of the mixing rate has not been fully evaluated and the reliability of the classical method for such situations remains unknown.
In this article, we develop a new method to simultaneously estimate mixing rates and the magnitude of the genetic drift, on the basis of successive allelic frequencies. Our statistical model includes further parameters such as frequencies in the initial population and latent vectors that represent random haplotype or allele frequencies due to genetic drift. While the complete likelihood function for such latent variables and observed data can be explicitly expressed, the likelihood function for observed data does not have a closed form. To obtain the maximum-likelihood estimate of the genetic drift, we use a Monte Carlo approximation of likelihood with a Markov chain Monte Carlo (MCMC) method (e.g., GELMAN et al. 2004; ROBERT and CASELLA 2004). Performance of the proposed method was evaluated via numerical simulations and real data of mud crab stocking in Japan were analyzed.
ABSTRACT
>MODELS AND METHODS
SIMULATION STUDIES
CASE STUDY: ANALYSIS FOR...
DISCUSSION
APPENDIX A: MCMC SAMPLING
APPENDIX B: MCEM ALGORITHM
ACKNOWLEDGEMENTS
LITERATURE CITED
Statistical model:
Let
be a vector of the true allele frequencies for a population in a fishing ground in year
, where J is the number of alleles and
. Let
denote the observed allele frequencies of samples randomly collected from this natural population in year y. Then, given p1y, the observed frequencies n1y are assumed to have a multinomial distribution,
![]() |
is twice the sample size for the diploid (the sample size itself for the haploid marker).
Suppose that juveniles are reared each year using some sampled individuals with minor alleles and then released into the natural population in the following year. In this case, the alleles used in the released population act as genetic tags in the natural population. Let
be a vector of allele frequencies for released juveniles in year
(
). Let
denote sampled counts of allele frequencies from the released population in year y with a multinomial distribution,
![]() |
.
The allele frequencies in the population in the fishing ground change due to two factors: genetic drift and the release of juveniles into the population. To represent the variation due to the genetic drift, we assume a Dirichlet distribution with a probability density function
![]() | (1) |
is a vector of allele frequencies before release of juveniles in year y, and
is a parameter that controls the amount of genetic drift; the smaller
is, the larger the amount of genetic drift, and vice versa. In fact, under the assumption of nonoverlapping generations, the drift variance of allele frequencies between generations is given by p(1 p)/2Ne, where p is an allele frequency in the previous generation and Ne is the effective population size (e.g., NEI and TAJIMA 1981; WAPLES 1989). Note that the variance for mtDNA haplotype frequencies is p(1 p)/Nf, where Nf is the effective female population size. In this model, the variance of
based on the Dirichlet distribution above is given by
![]() |
is closely related to the effective population size as 1 +
= 2Ne or 1 +
= Nf. In this article, we assume a constant
over all years.
Now, the vector of frequencies
cannot be observed because the sample is taken after the release of juveniles, and therefore it is regarded as a latent random vector. After the genetic drift among years, the vector of allele frequencies further changes because of mixing with the released juveniles, as
![]() |
y is the mixing rate of the released population into the natural population in year y. As a summary of our statistical model, a schematic diagram is shown in Figure 1.
|
Likelihood:
In this model, unknown parameters are mixing rates,
, the degree of the genetic drift,
, the vector of the true allele frequencies in a natural population in the initial year, p11, and those of the released population,
. For notational convenience, we summarize random variables as
,
,
, and
. If the latent vectors
can be observed, the likelihood function for the unknown parameters,
,
, p11, p2, based on the observations n1, n2 and the latent variables
, is explicitly expressed as follows:
![]() |
Now,
are unobservable latent variables, and hence we must consider a likelihood function based on the marginal distribution of n1 and n2 as follows:
![]() |
Estimation algorithm:
For simplicity of illustration, we assume that the allele frequencies p2 are known. In this case, the observation n2 is not used in the estimation. The method with unknown p2 is straightforward.
Let
denote the set of parameters (
,
, p11). To approximate LM(
| n1) via a Monte Carlo simulation, we use the formula
![]() |
* is a reference vector, and
is the conditional distribution of the latent variables
at
* given the observed data n1. This integration can be approximated by Monte Carlo as
![]() |
are the realization of
drawn from the conditional distribution
.
Unfortunately, a closed form of the conditional distribution
does not exist because it includes the marginal distribution LM(
* | n1). To generate the random numbers from
, we use a Metropolis-within-Gibbs sampling method (e.g., ROBERT and CASELLA 2004). Details of this algorithm are described in APPENDIX A.
In general, the precision of the approximation depends on the choice of
* as well as the size of m. One option for the determination of
* and p*11 in
* is to use the maximum-likelihood estimates by ignoring the genetic drift. In this case, no latent variables are included in the likelihood. Another option is to use a Monte Carlo version of the EM algorithm (e.g., MCLACHLAN and KRISHNAN 1997). A detailed description of this algorithm is given in APPENDIX B. It is known that convergence of the EM algorithm tends to be slow, but values of parameters that are innovated up to several steps can be used for
*, including
*.
The case of multiple unlinked loci:
Our model can be extended for the case of multiple unlinked loci by considering the composite genotype frequencies. Different loci receive independent genetic drifts, and therefore the distribution of the latent allele frequencies, which is expressed in Equation 1, can be defined locus-by-locus. On the other hand, although the composite genotype frequencies are expressed as products of the genotype frequencies at the loci, the probability distribution of observed composite genotype is not given by the product of multinomial distributions for the loci. Here, we denote by x1yil a composite genotype at the lth locus for the ith individual sampled from the composite population in year y (2
y
Y) and by
and p2yl vectors of allele frequencies at the lth locus in the natural population before release and in the released population in year y, respectively. Then the distribution of x1yi given
and p2yl is expressed as
![]() |
ABSTRACT
MODELS AND METHODS
>SIMULATION STUDIES
CASE STUDY: ANALYSIS FOR...
DISCUSSION
APPENDIX A: MCMC SAMPLING
APPENDIX B: MCEM ALGORITHM
ACKNOWLEDGEMENTS
LITERATURE CITED
Simulation scenarios:
To evaluate the estimation performance in our statistical model and the difference of estimates between considering and ignoring the genetic drift, we used simulation studies. As scenario I, the length of sampling years was fixed at 7, and the number of loci was assumed to be 1 or 5. Furthermore, as scenario II, we considered an extreme case where samples were taken for only 2 years. In this scenario, 5 or 10 loci were analyzed to cover the limited information from only 2 years. The allele frequencies in the natural population in the initial year and those in released populations were assumed to be common over loci, and the frequencies in the released populations were known. Note that, for the case of multiple loci, composite genotypes were generated from the true distribution, and then parameters were estimated using the approximated likelihood. To make the computational time shorter, we assumed a case of only three alleles. The mixing rates, of 0.1, were assumed to be the same over the years. Three levels of genetic drift,
= 10, 100, and 1000, were used. The same sample size, Ny = 500, was assumed each year. Scenarios we used in this simulation are summarized in Table 1
. The frequencies in the released populations are assumed to be known. Throughout this simulation study, we set the true values of parameters for
*. Considering the computational burden, the sample size of the MCMC was fixed at 5000. Among 5000 samples, the first 1000 samples were discarded as constituting the burn-in period, and every fifth simulation draw was kept. Therefore, the effective sample size was 800. The number of simulation replicas was 100 for each scenario. All the computation was carried out using our own program written by statistical software R (http://www.r-project.org/).
|
Simulation results:
The results are shown in Table 2 . In scenario I (Y = 7), although the estimate of
tended to be overestimated, especially for the case of L = 1 and larger values of
, estimates of the mixing rates,
's, by the model that considered the genetic drift were almost unbiased throughout the simulation. The bias of
was reduced when multiple loci (L = 5) were analyzed. On the contrary, those where the genetic drift was not considered were subject to large bias when
was small; that is, the degree of genetic drift was large. Note that the estimation performance for the mixing rates tended to be similar in both the models for
= 1000, the case of small genetic drift.
|
In scenario II (Y = 2), because of the small size of replication with respect to years, the estimation was more difficult to make than in scenario I. In fact, especially for the case of
= 1000, the estimates of
tend to be infinity. Hence, we set an upper bound, 5000, for the estimate of
. However, the simulation results suggest that the estimation of
with many loci performs well when genetic drift is large. The estimation on mixing rates while considering the genetic drift was also stable in this scenario. On the other hand, biases were observed in the estimates without considering the drift. A noteworthy fact is that the biases are smaller than those in scenario I, where samples were taken for
7 years than the 2 in scenario II. This phenomenon indicates that the large bias in scenario I was due to the impact of the genetic drift through the year. Therefore, the longer the sampling period is the more effective it is to take the genetic drift into account. To reduce the computation time for the simulation, we assumed only three alleles in our simulation studies. The allele or haplotype frequencies in the released population have a large impact on the estimation of mixing rates rather than the number of alleles. The results for our model with considering genetic drift showed that the estimation performance of mixing rates in years 4 and 7 for scenario I tended to be worse than that in other years. This is because the most common allele in the released population is also most common in the natural population before release. On the other hand, smaller standard deviations were observed in years 2 and 5, when the common allele in the released population is a rare allele in the natural population. In this way, use of rare alleles or haplotypes for the released population improves the estimation performance.
ABSTRACT
MODELS AND METHODS
SIMULATION STUDIES
>CASE STUDY: ANALYSIS FOR...
DISCUSSION
APPENDIX A: MCMC SAMPLING
APPENDIX B: MCEM ALGORITHM
ACKNOWLEDGEMENTS
LITERATURE CITED
Data:
In Japan, three mud crab species, S. paramamosain, S. serrata, and S. olivacea, are commercially important resources in local scale fisheries. Urado Bay in Kochi Prefecture is the main area for S. paramamosain fisheries. The life span of this crab is from 1.5 to 2.5 years (OBATA et al. 2006). May and June are the spawning season, and in September hatched crabs grow to commercial size and recruit into the fishing ground. The fishery is operated mainly from August to December with the highest catch in September. The commercial catch of this species has fallen since the mid-1970s from
12 tons and reached a record low of 1.6 tons in 1989. In the early 1980s, a stock enhancement program for this species started in Urado Bay, and the number of released juveniles has increased. To examine the efficacy of the release program for this species, an experiment using genetic tagging was conducted (OBATA et al. 2006). Samples were collected in Urado Bay around December in 19962002. In each year from 1996 to 2000, individuals with different haplotypes were chosen from samples to produce supplemental juveniles, and these minor haplotypes would then be used as a genetic tag. During mid-May/early June in each of the next years, juveniles were released into Urado Bay. Supplemented individuals, if not removed by fishing from the population in the year they are introduced, can contribute to the reproducing population in the second year.
We used observed haplotype frequencies in the mtDNA D-loop region of the mud crabs (Table 3 ). Here, we grouped very minor haplotypes into one category called "others." The haplotype frequencies for the released population are also shown in Table 3. Only one haplotype was released every year, except 1998. Using these data, we estimated mixing rates and the genetic drift simultaneously. Because the number of released juveniles was known, we can regard the parameters p2y as fixed. We set the burn-in period at 5000 and kept every fifth simulation draw of 250,000 samples as an effective output for Monte Carlo sampling.
|
Results:
Table 4 shows the summary of parameter estimations with and without taking the genetic drift into account. The estimates of mixing rates,
, were small except for 1997. Little differences between the estimates of the two methods were observed. This may be due to the small genetic drift. In fact, the estimated
was
341.1. Although
may be overestimated considering the simulation results, severe genetic drift was not observed.
|
Although not so much difference in the number of released juveniles was recorded except for in 2001, a large yearly variation in the estimates of mixing rates was observed. In the next section, we discuss the effects of adaptation and selection on the estimation of mixing rates.
ABSTRACT
MODELS AND METHODS
SIMULATION STUDIES
CASE STUDY: ANALYSIS FOR...
>DISCUSSION
APPENDIX A: MCMC SAMPLING
APPENDIX B: MCEM ALGORITHM
ACKNOWLEDGEMENTS
LITERATURE CITED
, based on the proposed model, was much better than that based on the model ignoring the genetic drift if the degree of genetic drift was large. Also, even if the degree of genetic drift was small, the mixing rates were well estimated.
Meanwhile, the simulation results also showed that the estimation of genetic drift is difficult compared to that of mixing rates. The genetic drift
was overestimated especially when the genetic drift was small. In fact, the standard errors for
in the case studies for mud crabs were large. This is partly due to the small number of sampling years and partly due to the use of only the mtDNA haplotype frequencies. Longer-term research would improve the estimation performance for the genetic drift and the mixing rates. In such studies, however, we must also consider the change in
or Ne through time, because the parameter
is closely related to the effective population size. Such variation may be likely in situations where juveniles are artificially released. Note that our model does not take into account effects of removal of individuals. We supposed that individuals are randomly caught by fisheries from the composite populations, and therefore change in haplotype frequencies due to fisheries catch occurs in the same manner as the genetic drift. In this respect, we think that any changes in statistical model are not essentially necessary even if considering the effect. However, we regarded such random changes as only a result of genetic drift and related it to the effective population size. If the amount of sample is large, effective population size after the fishing season may be underestimated.
The estimates of
for mud crabs in Urado Bay were 341.1. Because mtDNA haplotypes were used in the analysis here, the degree of genetic drift
is equal to the effective female population size minus one, Nf 1. If the sex ratio is 1:1, then the estimate of the effective population size is Ne = 2Nf = 684.2. Although the accuracy of this value is not high because of the large standard error for
, it seems to be small compared to the actual population size. The catch, in weight, in the late 1990s was
5 tons, and the average weight was estimated as 411 g through a market survey (OBATA et al. 2006). Hence, the catch in number can be calculated at 12,165. The mortality rate is unknown for this stock, but if the exploitation rate can be assumed to be 33% and the natural mortality is ignored, the actual population size after exploitation, N, is (1 0.33) · 5000/(0.411 · 0.33)
24,700. This value is considerably larger than the estimated Ne. However, as described in TURNER et al. (2002), the estimate of the ratio Ne/N for the red drum, Sciaenops ocellatus, was <0.001. The ratio Ne/N can take small values in nature, for example, if heterogeneity in fecundity or reproductivity among individuals or habitat is large. In this case, genetic impacts of hatchery releases on wild populations cannot be ignored. In this regard, further studies are necessary for the investigation of Ne and mixing for mud crabs in Urado Bay.
Our statistical model does not take into account selection on the juvenile mortality and fertility. For our situation, there are two possible selection forces: domestication and natural selection. Domestication is defined as genetic changes that result from different selective pressures in the hatchery and the wild environment (WAPLES and DRAKE 2004). If a hatchery population was established from the natural population and then kept in the hatchery and multiplied, the hatchery-reared stock generally becomes adapted to hatchery conditions and maladapted to natural conditions. Therefore, the mixing rate in the fishery season can be different from the one at the time of release. Our model estimates the former. Domestication may also cause a difference in fertility between the released individuals and the others in the composite population. In addition, natural selection is another factor that has an impact on changes in haplotype frequencies. In this study, we consider only the genetic drift and ignore the difference in the fertility and fitness.
If such selection in the reproduction process is mild and the effective population size is small, the effect on the estimation of mixing rates may be small. Meanwhile, if the effect of selection is relatively large, the directional changes in the frequencies may cause a bias in our estimation of mixing rate. For example, assume that minor haplotype I is used for release in one year and another haplotype II is used in the next year. Suppose that fitness of haplotype I is low. In this case, the frequency of haplotype I before release in the next year decreases due to selection. Therefore, the mixing rate of the released population with haplotype II is overestimated. If the same haplotype I is used in both years, and fitness of haplotype I stays low, then mixing rate is in turn underestimated. In this way, our estimates of mixing rates might be biased if the effect of selection is large. On the other hand, selection increases the level of genetic drift because it increases the rate of inbreeding. The result in the mud crab analysis showed a small level of genetic drift. We used wild female crabs caught in the fishing ground for juvenile production every year and duration for juveniles kept in the hatchery environment was short,
1 month or 1.5 months. Therefore, the domestication selection might be neglected for this case. Minor haplotypes are considered the consequence of long-term natural selection and evolution. Although further investigation is necessary, it may have suggested a small impact of selection on the estimation.
In this article, we described mostly a statistical model only for cases where haplotype or allele counts at a single locus are observed. Now, many microsatellites are often used in the study of population genetics, including genetic stock identification (e.g., WIRGIN and WALDMAN 2005). The extension for multiple unlinked markers is direct. Even with loosely linked markers, our statistical models may perform well. However, the effect of linkage disequilibrium can be stronger than that in usual population genetics studies.
Linkage disequilibrium exponentially decays with successive generations. Hence, in a population reaching equilibrium, the assumption of independence among loci will be appropriate. Our article, however, considers admixture of yearly released populations, and it is impossible to expect such equilibrium. Therefore, the assumption of independent loci tends to no longer hold. Statistical models under linkage disequilibrium have been developed (e.g., FALUSH et al. 2003; KITADA and KISHINO 2004). However, we think that such modeling and developing of estimation methods themselves are pretty complicated and beyond the scope of our research. Therefore, statistical models considering linkage disequilibrium as well as the mixing rates and genetic drift will be developed in future research.
Our statistical model included latent vectors representing haplotype frequencies in mtDNA due to genetic drift. The likelihood function is not expressed in a closed form, and hence direct maximization is not possible. In this article, we consider the maximum-likelihood estimation with a Monte Carlo approximation of likelihood (and Monte Carlo EM, MCEM, algorithm). Meanwhile, the maximum-likelihood estimate is not always the best because the profile likelihood has a drawback when the sample size is small compared to the number of parameters (e.g., NEYMAN and SCOTT 1948). In fact, for more precise estimation of
, eliminating the other parameters including
through integration may be efficient (KITAKADO et al. 2006, accompanying article in this issue). In this case, some noninformative prior distributions are to be assumed, and the parameters, except for
, are integrated out from the likelihood. The parameter
is estimated by maximizing the integrated likelihood, and the parameters, except for
, are inferred using their posterior distribution given data and the estimate of
. This kind of hierarchical structure is also effective in a more practical situation where there is a random or systematic change in
over years. Although in this case further hierarchical modeling is necessary, extension of our model and algorithm is not difficult for such purposes. Furthermore, if we consider the presence of locus-specific
, the assumption of random variation of
over loci could contribute to the reduction in the number of parameters as well as an improvement of estimation performance. In this case, further complicated algorithms would be necessary. These are matters to be investigated in future research.
In this article, we have developed a statistical method for estimating mixing rates of hatchery-released individuals in enhanced populations. Such mixing could also occur when the outside species or population comes from a natural population. In this sense, we note that our method under successive sampling can also be applied in conservation genetics to monitor the effect of invasion by outside populations.
ABSTRACT
MODELS AND METHODS
SIMULATION STUDIES
CASE STUDY: ANALYSIS FOR...
DISCUSSION
>APPENDIX A: MCMC SAMPLING
APPENDIX B: MCEM ALGORITHM
ACKNOWLEDGEMENTS
LITERATURE CITED
using a Metropolis-within-Gibbs sampling method. Let
be the initial values for this algorithm. Then a sequence of random vectors is generated in turn,
![]() | (A1) |
- Sample a candidate vector
from a proposal distribution
- Compute acceptance ratios defined by

- Set
as 
For example, the form of the acceptance ratio is justified by the following formula:
![]() | (A2) |
, we use a Dirichlet distribution as
![]() |
. In fact, the variance of
is given by
, and therefore the mixing condition over parameter space becomes slow if c increases, while the acceptance ratio decreases if c decreases. We chose a suitable value by trial and error. The samples in the initial stage were discarded as a burn-in period, which is usually used in MCMC to eliminate the effect of the initial values (GELMAN et al. 2004). In addition, the MCMC sequence was thinned to reduce autocorrelation (GELMAN et al. 2004). ABSTRACT
MODELS AND METHODS
SIMULATION STUDIES
CASE STUDY: ANALYSIS FOR...
DISCUSSION
APPENDIX A: MCMC SAMPLING
>APPENDIX B: MCEM ALGORITHM
ACKNOWLEDGEMENTS
LITERATURE CITED
be the logarithm of
, and let
[t] denote a value of
at the tth EM iteration. Define Q(
|
[t]) by the conditional expectation of the lc(
) with respect to
given n1,
![]() |
- E step: Compute Q(
|
[t]).
- M step: Define
[t+1] by a value of
maximizing Q(
|
[t]).
- M step: Define
In our model, the complete log-likelihood,
, can be obtained as
![]() |
. The original EM algorithm expects an explicit formula for Q(
|
[t]). This is the case at least for statistical models with simple exponential families. Unfortunately, another step is necessary for computation of Q(
|
[t]) in our model. In fact, the function Q(
|
[t]) is expressed as
![]() |
given n1 have no explicit formulas. To compute the conditional expectation of the complete log-likelihood, Q(
|
[t]), we use a Monte Carlo integration by generating simulation outcomes for
given the observable data n1. The random variables
are generated from the conditional distribution,
with an MCMC as described in APPENDIX A. The size mt at the tth step should be increased with t. Then the function Q(
|
[t]) is approximately calculated as
![]() |
ABSTRACT
MODELS AND METHODS
SIMULATION STUDIES
CASE STUDY: ANALYSIS FOR...
DISCUSSION
APPENDIX A: MCMC SAMPLING
APPENDIX B: MCEM ALGORITHM
>ACKNOWLEDGEMENTS
LITERATURE CITED
ABSTRACT
MODELS AND METHODS
SIMULATION STUDIES
CASE STUDY: ANALYSIS FOR...
DISCUSSION
APPENDIX A: MCMC SAMPLING
APPENDIX B: MCEM ALGORITHM
ACKNOWLEDGEMENTS
>LITERATURE CITED
ANDERSON, E. C., 2005 An efficient Monte Carlo method for estimating Ne from temporally spaced samples using a coalescent-based likelihood. Genetics 170: 955967.
ANDERSON, E. C., E. G. WILLIAMSON and E. A. THOMPSON, 2000 Monte Carlo evaluation of the likelihood for Ne from temporally spaced samples. Genetics 156: 21092118.
BLAXTER, J. H. S., 2000 The enhancement of marine fish stocks. Adv. Mar. Biol. 38: 154.
FALUSH, D., M. STEPHENS and J. K. PRITCHARD, 2003 Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164: 15671587.
GELMAN, A., J. B. CARLIN, H. S. STERN and D. B. RUBIN, 2004 Bayesian Data Analysis, Ed. 2. Chapman & Hall/CRC Press, Boca Raton, FL.
HILBORN, R., and C. J. WALTERS, 1992 Quantitative Fisheries Stock Assessment: Choice, Dynamics and Uncertainty, Chapman & Hall, New York.
IMAI, H., Y. OBATA, S. SEKIYA, T. SHIMIZU and K. NUMACHI, 2002 Mitochondrial DNA markers confirm successful stocking of mud crab juveniles, Scylla paramamosain in a natural population. Suisan Zoshoku 50: 149156.
KITADA, S., and H. KISHINO, 2004 Simultaneous detection of linkage disequilibrium and genetic differentiation of subdivided populations. Genetics 167: 20032013.
KITADA, S., and H. KISHINO, 2006 Lessons learned from Japanese marine finfish stock enhancement programmes. Fish. Res. 80: 101112.[CrossRef]
KITAKADO, T., S. KITADA, H. KISHINO and H. J. SKAUG, 2006 An integrated-likelihood method for estimating genetic differentiation between populations. Genetics 173: 20732082.
LEBER, K. M., S. KITADA and H. L. BLANKENSHIP, 2004 Stock Enhancement and Sea Ranching, Ed. 2. Blackwell Publishing, Oxford.
MCLACHLAN, G. J., and T. KRISHNAN, 1997 The EM Algorithm and Extensions. John Wiley & Sons, New York.
MILLAR, R. B., 1987 Maximum likelihood estimation of mixed stock fishery composition. Can. J. Fish. Aquat. Sci. 44: 583590.
NEI, M., and F. TAJIMA, 1981 Genetic drift and estimation of effective population size. Genetics 98: 625640.
NEYMAN, J., and E. L. SCOTT, 1948 Consistent estimates based on partially consistent observations. Econometrica 16: 132.
OBATA, Y., H. IMAI, T. KITAKADO, K. HAMASAKI and S. KITADA, 2006 The contribution of stocked mud crabs Scylla paramamosain to commercial catches, estimated using a genetic stock identification technique in Japan. Fish. Res. 80: 113121.[CrossRef]
PAULY, D., V. CHRISTENSEN, S. GUÉNETTE, T. J. PITCHER, U. R. SUMAILA et al., 2002 Towards sustainability in world fisheries. Nature 418: 689695.[CrossRef][Medline]
PELLA, J., and M. MASUDA, 2001 Bayesian methods for analysis of stock mixtures from genetic characters. Fish. Bull. 99: 151167.
ROBERT, C. P., and G. CASELLA, 2004 Monte Carlo Statistical Methods, Ed. 2. Springer-Verlag, New York.
RYMAN, N., and L. LAIKRE, 1991 Effects of supportive breeding on the genetically effective population size. Conserv. Biol. 5: 325329.[CrossRef]
SVÅSAND, T., T. S. KRISTIANSEN, T. PEDERSEN, A. G. V. SALVANES, R. ENGELSEN et al., 2000 The enhancement of cod stocks. Fish Fish. 1: 173205.
TURNER, T. F., J. P. WARES and R. G. JOHN, 2002 Genetic effective size is three orders of magnitude smaller than adult census size in an abundant, estuarine-dependent marine fish (Sciaenops ocellatus). Genetics 162: 13291339.
WAPLES, R., 1989 A generalized approach for estimating effective size from temporal changes in allele frequency. Genetics 121: 379391.
WAPLES, R., and J. DRAKE, 2004 Risk/benefit consideration for marine stock enhancement: a Pacific salmon perspective, pp. 260306 in Stock Enhancement and Sea Ranching, Ed. 2, edited by K. M. LEBER, S. KITADA and H. L. BLANKENSHIP. Blackwell Publishing, Oxford.
WILLIAMSON, E. G., and M. SLATKIN, 1999 Using maximum likelihood to estimate population size from temporal changes in allele frequencies. Genetics 152: 755761.
WIRGIN, I., and J. R. WALDMAN, 2005 Use of nuclear DNA in stock indentification: single-copy and repetitive sequence markers, pp. 331370 in Stock Identification Methods, edited by S. X. CADRIN, K. D. FRIEDLAND and J. R. WALDMAN. Elsevier Academic Press, Burlington, MA.
Communicating editor: R. W. DOERGE
- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Kitakado, T.
- Articles by Kishino, H.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Kitakado, T.
- Articles by Kishino, H.

















