Genetics, Vol. 156, 2063-2079, December 2000, Copyright © 2000

Empirical Bayes Procedure for Estimating Genetic Distance Between Populations and Effective Population Size

Shuichi Kitadaa, Takeshi Hayashib, and Hirohisa Kishinob
a Department of Aquatic Biosciences, Tokyo University of Fisheries, Minato, Tokyo 108-8477, Japan
b Graduate School of Agricultural and Life Sciences, University of Tokyo, Bunkyo, Tokyo 113-8657, Japan

Corresponding author: Shuichi Kitada, Tokyo University of Fisheries, 4-5-7 Konan, Minato, Tokyo 108-8477, Japan., kitada{at}tokyo-u-fish.ac.jp (E-mail)

Communicating editor: Z-B. ZENG


*  ABSTRACT
*TOP
*ABSTRACT
*METHODS
*CASE STUDIES
*DISCUSSION
*APPENDIX
*LITERATURE CITED

We developed an empirical Bayes procedure to estimate genetic distances between populations using allele frequencies. This procedure makes it possible to describe the skewness of the genetic distance while taking full account of the uncertainty of the sample allele frequencies. Dirichlet priors of the allele frequencies are specified, and the posterior distributions of the various composite parameters are obtained by Monte Carlo simulation. To avoid overdependence on subjective priors, we adopt a hierarchical model and estimate hyperparameters by maximizing the joint marginal-likelihood function. Taking advantage of the empirical Bayesian procedure, we extend the method to estimate the effective population size using temporal changes in allele frequencies. The method is applied to data sets on red sea bream, herring, northern pike, and ayu broodstock. It is shown that overdispersion overestimates the genetic distance and underestimates the effective population size, if it is not taken into account during the analysis. The joint marginal-likelihood function also estimates the rate of gene flow into island populations.


AS a stock management tool to counteract decreased or depleted fishery resources, stock enhancement programs have been undertaken in many countries for salmonid (HILBORN and WINTON 1993 Down; RITTER 1997 Down; KAERIYAMA 1999 Down; KNAPP 1999 Down) and for other marine species (BARTLEY 1999 Down; KITADA 1999 Down). Concerns about the genetic effects of hatchery releases on wild populations have increased and aroused discussion (WALTERS 1988 Down; WAPLES 1991 Down; HILBORN 1992 Down; UTTER 1998 Down; WAPLES 1999 Down). CAMPTON 1995 Down reviewed the genetic effects of hatchery releases on natural stocks of salmon and brown trout and concluded that the empirical data supporting those arguments are absent or largely circumstantial. This is a complex topic that needs further research (WAPLES 1999 Down). A 10-point approach for a responsible stock enhancement program has been proposed, which includes the need to use genetic resource management to avoid deleterious genetic effects (BLANKENSHIP and LEBER 1995 Down). Using wild individuals as broodstock may possibly reduce genetic risks (BARTLEY et al. 1995 Down; HARADA et al. 1998 Down).

The genetic identity between produced progenies and the wild stock will be required before one can release the progenies. To examine the genetic identity, statistically significant differences are required. The homogeneity {chi}2 test of allele frequencies is commonly used for testing genetic differences and the Roff test (ROFF and BENTZEN 1989 Down) is used when minor alleles exist. If the null hypothesis is not rejected, the statistical power is required to be reported from a conservation viewpoint (PETERMAN 1990 Down; DIZON et al. 1995 Down). However, when the genetic difference is small, the corresponding statistical power may also be small with small sample sizes, making it difficult to conclude that there is no genetic difference. The statistical power is the probability of detecting the alternative hypothesis when it is correct. A considerably large sample size is required if one wants to obtain satisfactory large statistical power to reject the null hypothesis and detect small genetic differences. The hypothesis testing framework implies rejecting the null hypothesis, so it does not work well for detecting the genetic identity, and calculating the power is meaningless. Problems of the null hypothesis testing framework are discussed in COHEN 1994 Down and HAGEN 1997 Down.

An effective method of determining genetic identity is to examine the genetic distances between populations. If an estimated confidence interval of the genetic distance between two populations includes 0, we could conclude that the populations are genetically identical or not statistically significantly different. There are several measures for the genetic distance (NEI 1987 Down). However, the sample distributions of these genetic distances are unknown. It is then inappropriate to estimate the confidence intervals of the genetic distance using asymptotic variances of the estimator.

In this article, we develop a Bayesian estimating procedure to measure genetic distances between populations from allele frequencies. We can directly evaluate the probability distribution of the genetic distance from its posterior distribution. The general method developed here is extended to estimate the effective population size termed Ne in a population based on temporal changes in allele frequencies. The joint marginal-likelihood function derived here coincides with the likelihood function to estimate the rate of gene flow into island populations using the sample allele frequency from a number of islands (RANNALA and HARTIGAN 1996 Down).


*  METHODS
*TOP
*ABSTRACT
*METHODS
*CASE STUDIES
*DISCUSSION
*APPENDIX
*LITERATURE CITED

Let the frequencies of k alleles of two populations to be compared be p11, · · · , p1k and p21, · · · , p2k. SANGHVI 1953 Down proposed that the genetic distance between two populations be determined by

(1)

We use this distance as a natural measure of the genetic distance between populations, which takes values between 0 and 4. It is known that 2n1.n2./(n1. + n2.) follows a {chi}2 distribution with degree of freedom k - 1 when p1i = p2i = pi for i = 1, · · ·, k (NEI 1987 Down), where n1. and n2. are sample sizes (individuals) of the two populations, and is the estimator obtained by substituting sample frequencies in Equation 1. However, the distribution of is unknown when p1i != p2i for i = 1, · · · , k. It is then inappropriate to evaluate the confidence interval of D using an asymptotic variance of , although it can be derived. Here we directly evaluate the posterior probability density of the genetic distance measure using a Bayesian framework.

Prior and posterior distribution of D:
It is not easy to describe a reasonable prior distribution of D, especially when we compare more than two populations. Alternatively, we set a prior for allele frequencies. Let the allele frequency of a population be p = (p1, · · · , pk)' and the sample count be n = (n1, · · · , nk)', where {Sigma}pi = 1 and {Sigma}ni = 2n (n individuals). When the sample is collected by a simple random sampling procedure with replacement, n follows a multinomial distribution. A ß distribution is known as a conjugate prior of the binomial parameter p. A Dirichlet distribution is a conjugate prior of multinomial proportions, which is an extension of a ß distribution (JOHNSON and KOTZ 1969 Down; LEE 1989 Down):

(2)

Here, {alpha} = ({alpha}1, · · · , {alpha}k)' are regarded as hyperparameters specifying the prior distribution. We use this distribution as a prior for allele frequencies.

The posterior distribution is obtained by multiplying the likelihood function, which is multinomial distribution in this case, by the prior. The posterior distribution of p is then given by

which is again a Dirichlet distribution with parameters modified by the data ni + {alpha}i (LANGE 1995 Down; WEIR 1996 Down). Given {alpha} = ({alpha}1, · · · , {alpha}k)', we can obtain a posterior distribution of p by generating Dirichlet random numbers with parameter {alpha} + n using Monte Carlo simulations. Using independent Dirichlet random numbers for posterior distributions of population allelic frequencies, we can obtain a posterior distribution of D using Equation 1. The number of each Monte Carlo simulation is set to 10,000, so 10,000 D are calculated from the 10,000 sets of p between two populations. The posterior probability density function is estimated on the basis of the histogram of D with the number of classes of 100 by using the function "density" of S language version 4 (CHAMBERS and HASTIE 1992 Down). For multilocus data, the mean of the genetic distances at J loci is calculated as D = .

Empirical Bayes procedure:
The primary disadvantage of using a Bayesian analysis for allele frequency estimation is that there is no obvious way of selecting a reasonable prior (LANGE 1995 Down). The Dirichlet distribution with {alpha}1 = · · · = {alpha}k = is a noninformative prior (BOX and TIAO 1992 Down). Here we adopt an empirical Bayes procedure to avoid dependence to priors. This procedure estimates the hyperparameters {alpha} by maximizing the marginal-likelihood function (MARITZ and LWIN 1989 Down),

which is also given in LANGE 1995 Down and WEIR 1996 Down. The distribution is known as a Dirichlet-multinomial distribution (LANGE 1995 Down; WEIR 1996 Down), which is a generalization of the ß-binomial distribution.

LANGE 1995 Down estimated the hyperparameters from single-locus data using Newton's method. Here we estimate them from multilocus data by maximizing Equation 3 using a simplex minimization for the negative logarithm of Equation 3. Assuming that {alpha} is the same for H populations (samples) to be compared and {Sigma}{alpha}i is also the same for J loci, the joint marginal likelihood is then given by

(3)

where Cjh = is a constant term for the combination of the multinomial likelihood that can be excluded from the estimation procedure.

Parameter {sigma}2 is the dispersion parameter that defines the magnitude of overdispersion; i.e., the variance of the response Y exceeds the nominal variance (MCCULLAGH and NELDER 1983 Down). For example, the expectation of the binomial random variables of a sample size m is E[Y] = mp and the variance is V[Y] = mp(1 - p). If there is overdispersion, the variance is V[Y] = {sigma}2mp(1 - p) though the expectation remains the same, where Y has a density of a ß-binomial distribution. For a multinomial event with overdispersion, the variance-covariance matrix of Y is {sigma}2 times larger than that of the multinomial distribution, where Y has a density of a Dirichlet-multinomial distribution.

JOHNSON and KOTZ 1969 Down showed that the variance-covariance matrix of a Dirichlet-multinomial distribution is times larger than that of the multinomial distribution. Hence the relationship between the dispersion parameter and the hyperparameters for a population is given by

(4)

We assume equal overdispersion effects for all loci, so the total of the hyperparameters {theta} (hereafter we use {theta} for {Sigma}{alpha}i) is the same for all loci, which gives the expression for {sigma}2 as

(5)

Here 2 is the mean number of genes of H populations given by 2 = . Given the estimate of {sigma}2, we have the estimator for {theta} as

(6)

We estimate {Sigma}Jj=1(kj - 1) + 1 free parameters numerically, including {sigma}2, which is assumed to be the same among loci, and {alpha}1j, · · · , {alpha}kj-1,j for locus j, and {alpha}kj is estimated by - {Sigma}kj-1i=1ij.

The binomial and multinomial counts are assumed to be taken by a simple random sampling, so the dispersion parameter {sigma}2 is considered to indicate the magnitude of overshooting from a simple random sample. MCCULLAGH and NELDER 1983 Down stated that "The simplest and perhaps the most common mechanism of overdispersion is clustering in the populations." KITADA et al. 1994 Down estimated the dispersion parameter for fish tag recovery data and showed that the variances of the estimated mortality rates were ~2(= 14.73) times larger than those assuming the multinomial model, which was considered to be caused by the aggregation of the tagged fish in the fishing ground. In genetic data analysis, overdispersion corresponds to the variance of an allele frequency exceeding the nominal variance of a simple random sample from a gamete pool. If there are subpopulations divided spatially in a survey area, a sample allele frequency from the area might be overdispersed even if a simple random sampling is performed. If a cluster of a genotype is taken, a sample allele frequency from a population might be also overdispersed. One can then estimate overdispersion based on several sets of allele frequencies obtained from the survey area.

Standardized genetic distance:
When allele frequencies at J loci are obtained from genetically identical populations, 2n1.n2. follows a {chi}2 distribution with a degree of freedom of {Sigma}(kj - 1) asymptotically (NEI 1987 Down). The shape of the distribution varies with the sample sizes and degree of freedom. When larger numbers of individuals are sampled, the distribution is farther from 0 even if the genetic difference is small. Suppose the case for n1. = n2. = n., the above statistics become n.{Sigma}Jj=1j and take a value proportional to the sample size. It is then not convenient to make D an index of the genetic distance.

Here we standardize D and propose a general index for the genetic distance. Performing a square root transformation to make the variance independent of the mean (SNEDECOR and COCHRAN 1967 Down) and subtracting the expected value of I (the derivation is given in the Appendix), we obtain a standardized genetic distance as

(7)

which follows a normal distribution with mean 0 and variance 0.5 under the condition of p1 = p2.

The {chi}2 distribution of assumes that 2n1 and 2n2 genes are taken by a binomial sampling from a population. For this case, {sigma}2 in the first term of Equation 7 equals 1. However, if there is overdispersion, {sigma}2 becomes active and takes a value larger than 1. If the overdispersion is neglected, the genetic distance is then overestimated and the scale of the distribution of becomes {sigma}2 times larger than that under the previously stated assumption. The dispersion parameter {sigma}2 corrects this effect.

Effective population size:
The effective population size is estimated from the temporal variation of allele frequencies in a population. Since the observed variance of the allele frequencies includes the sampling variance in addition to the genetic drift, we subtract the sampling variance when estimating the effective population size. Let us assume that we have two samples with sizes n0 and nt from the population at generations 0 and t, respectively. The empirical Bayes procedure developed here can be extended to obtain the posterior distributions of the effective population size Ne by using the posterior distribution of F-statistics calculated from the posterior distribution of allele frequencies.

The standardized variance of allele frequency change measured by F-statistics has been used to estimate Ne (KRIMBAS and TSAKAS 1971 Down; NEI and TAJIMA 1981 Down; POLLAK 1983 Down; WAPLES 1989 Down). Among F-statistics, Fk proposed by POLLAK 1983 Down is similar in form to D and is given by

(8)

For the case of multiple loci, Fk is calculated by Fk = from NEI and TAJIMA 1981 Down, where Fkj is Fk at the jth locus. Without overdispersion, Ne is estimated by

(9)

for plan I, where the sample is taken after reproduction. For plan II, where the sample is taken before reproduction, the term 1/N is eliminated, where k is the estimator obtained by substituting sample frequencies in Equation 8 is the census size for a population (WAPLES 1989 Down, Equation 11 and Equation 12).

Equation 9 assumes that 2n0 and 2nt genes are taken by a binomial sampling from the population. If there is overdispersion, Fk is overestimated, which leads to underestimation of Ne. Since the effective sample size is obtained by discounting the apparent sample size by dispersion parameters, Equation 9 is modified as

(10)


*  CASE STUDIES
*TOP
*ABSTRACT
*METHODS
*CASE STUDIES
*DISCUSSION
*APPENDIX
*LITERATURE CITED

Red sea bream:
To evaluate genetic distances, we first analyzed the data of four populations of red sea bream (Pagrus major) from TABATA and MIZUTA 1997 Down(Table 1). From the fragment pattern of mtDNA D-loop regions with six restriction enzymes, 48 haplotypes were obtained for four wild populations. We decomposed the haplotype frequency to six allelic frequencies for each restriction enzyme and eliminated HaeIII and Msp, which showed little or no polymorphism, from the analysis.


 
View this table:
In this window
In a new window

 
Table 1. Allele frequencies of the mtDNA D-loop region from TABATA and MIZUTA 1997 Down and estimated hyperparameters for four populations of red sea bream from eastern Japan

The estimate of the total hyperparameters was 106.553 for each locus (Table 1), and the dispersion parameter was estimated at 1.80 by maximizing Equation 3. Here 2. = = 87.5 because mtDNA is a haploid. With a prior distribution specified by these parameters, we obtained the posterior distribution of D by dividing {Sigma}Dj over four loci by the number of loci (Table 2). As an example, the histogram and estimated density function of the posterior distribution of D at HinfI between Tanabe Bay and Tomogashima Channel is shown in Fig 1. D12 in Fig 2 was obtained as the mean of such four posterior genetic distances at the four loci. It should be noted that the posterior distances were overestimated including the overdispersion and the posterior distributions in Fig 2 were then overestimated. The means and SDs of D12, D13, and D14 were about two times larger than D23, D24, and D34; however, they might include the effect of the smaller sample size of population 1 (Tanabe Bay).



View larger version (23K):
In this window
In a new window
Download PPT slide
 
Figure 1. A histogram and the estimated density function of the posterior distribution of the genetic distance between red sea bream populations of Tanabe Bay and Tomogashima Channel for HinfI using data given in Table 1. D12 in Fig 2 was obtained as the mean of the four posterior genetic distances for the four loci.



View larger version (21K):
In this window
In a new window
Download PPT slide
 
Figure 2. Posterior distributions of the genetic distance (D) between four populations of red sea bream using data given in Table 1.


 
View this table:
In this window
In a new window

 
Table 2. Means and SDs of the posterior distribution of the genetic distance for the red sea bream populations

The posterior distribution of the standardized genetic distance took the overdispersion and sample size difference into account. The means of I12, I13, and I14 ranged from 0.2706 to 0.4671, whereas those of I23, I24, and I34 took negative values. The SDs ranged from 0.53 to 0.58 and took almost the same values. The genetic differences with population 1(I12, I13, and I14) looked larger than the others (Table 3). However, the posterior distributions of I overlapped well with the theoretical distribution of no genetic difference (Fig 3).



View larger version (25K):
In this window
In a new window
Download PPT slide
 
Figure 3. Posterior distributions of the standardized genetic distance (I) for the red sea bream populations taking the point estimate of the dispersion parameter 1.80 (left) and the 95% lower limit 1.72 (right) into account.


 
View this table:
In this window
In a new window

 
Table 3. Means and 95% credibility regions of the posterior distribution of the standardized genetic distance for the red sea bream populations

We estimated the 95% confidence interval of the dispersion parameter to be from 1.72 to 1.88 by the likelihood-ratio test. The lower limit of the dispersion parameter corresponds to the upper limit of the genetic distance, from which we evaluate the difference. The means of the posterior distributions for the lower limit of the dispersion parameter were increased from 8 to 27% and SDs remained the same (Table 3), but the posterior distributions of I were almost the same as those for the point estimate of the overdispersion and still overlapped well with the theoretical distribution (Fig 3).

The value of 95% upper limit of the credibility region of the theoretical normal distribution of I with mean of 0 and variance of 0.5 is 1.16. All posterior means were <1.16, and the credibility regions included 0; hence we concluded that there was no genetic difference between the four populations of red sea bream. This finding agreed with the result of the original authors, who reported that the Roff test did not reject the homogeneity of the haplotype frequencies (TABATA and MIZUTA 1997 Down, p = 0.219). Nevertheless, they rejected the hypothesis by the pairwise comparison. From the results of our test, however, we argue that it was inappropriate analysis.

Herring:
Stock enhancement of herring (Clupea pallasii) has been performed in Akkeshi Bay, Hokkaido (Japan). Because the matured herrings migrate to Akkeshi Bay to spawn, they are considered to have originated from Lake Akkeshi and Akkeshi Bay. Although wild adult fish that migrated to the bay are used for artificial spawning to produce juveniles every year, it still may be important to monitor the genetic change and estimate the effective population size to maintain the wild stock.

Temporal changes in allozyme allele frequencies were obtained by combining two studies on the same loci by ANDO and OHKUBO 1997 Down and HOTTA et al. 1999 Down(Table 4). Independent samples were taken in March and April 1993. In 1996, males and females were taken separately from the sample, hence they were not independent. For the purposes of our case study, we treated the data as if they were taken independently.


 
View this table:
In this window
In a new window

 
Table 4. Temporal changes in allozyme allele frequencies and estimated hyperparameters of herring in Akkeshi Bay

The estimate of the total hyperparameters was 130.956 for each locus (Table 4), and the dispersion parameter was estimated at 2.39, with 2. = = 184. The posterior distribution of Fk estimate was calculated from each of two sets of posterior distributions of allele frequencies for 1993 and 1996 by

(11)

Fk ranged from 0.0014 to 0.0814 with the mean and SD of 0.0226 and 0.0105, respectively. The posterior distribution of Fk is shown in the left side of Fig 4.



View larger version (14K):
In this window
In a new window
Download PPT slide
 
Figure 4. Posterior distributions of the standardized variance of allele frequency change Fk and effective population size Ne of herring for the 95% lower limit of the dispersion parameter ({sigma}2 = 1) using data in Table 4.

Most of the matured herring migrating to Akkeshi Bay to spawn are in their second year of life; the remainder are in their third year. The age composition of the spawners was surveyed and estimated at 0.9 and 0.1 for each age class by the Japan Sea-Farming Association. The expected number of generations can be used for t in the estimating equations of Ne because the expectation of the F-statistics was approximated to be linear with t as shown in WAPLES 1989 Down(p. 382). We estimated the expected number of generations at 1.48 (= ) divided by the number of years between samples by the mean age of spawners as MILLER and KAPUSCINSKI 1997 Down did. The samples were taken before reproduction, so Equation 10 was used by eliminating the term of 1/N and substituting = 210 for 2n0 and = 158 for 2nt. The posterior means of Ne estimates and 95% credibility region of Ne are given in Table 5.


 
View this table:
In this window
In a new window

 
Table 5. Means and 95% credibility regions of the posterior distribution of the effective population size of herring in Akkeshi Bay obtained using data in Table 4

The dispersion parameter was estimated at 2.39 with a 95% confidence interval from 1.00 to 7.51. From a conservation viewpoint, it is better to consider the lower limit of Ne. The lower limit of the dispersion parameter evaluates the upper limit of Fk corresponding to the lower limit of Ne. The lower limit of {sigma}2 was 1.00. Corresponding with that, no overdispersion arose and no subpopulation existed. The number of simulations with a negative value of Ne estimate was 1221 in 10,000 trials. When Fk <= [1/(2n0) - 1/(2nt)], the only feasible estimate of Ne is infinity (WAPLES 1989 Down). The mean of the positive Ne estimate was 350, and the 95% credibility region was estimated from 20 to infinity (Table 5). The posterior distribution of the positive Ne estimate is shown in the right side of Fig 4; it suggests the order of the effective population size of herring, even though the upper limit was not estimated.

Northern pike:
We analyzed the data from MILLER and KAPUSCINSKI 1997 Down for comparison. Temporal changes in microsatellite allele frequency at seven loci were typed in the northern pike (Esox lucius) population of Lake Escanaba, Wisconsin. Of the seven loci, five had two alleles and two had three alleles. Following the data processing techniques of WILLIAMSON and SLATKIN 1999 Down, we also combined the two least common allelic classes for the two loci with three alleles and created a diallelic frequency set. To estimate the hyperparameters, we allocated sample sizes of 86 for 1961, 110 for 1977, and 72 for 1993 according to the diallelic frequencies for each locus to obtain the number of individuals corresponding to the frequencies. Because the data had one sample for each year, it was not possible to estimate the hyperparameters for each locus. Therefore, we assumed that the seven loci were independent of each other and had common hyperparameters. The two common hyperparameters and the dispersion parameter were estimated at 16.232, 3.089, and 9.74, with 2. = = 178.7.

Fk between the year of 1977 and 1993 ranged from 0.0087 to 0.1264 with the mean and SD being 0.0479 and 0.0150, respectively. The posterior distribution of Fk is given in the left side of Fig 5. The posterior distribution of the Ne estimate was obtained by substituting the posterior distribution of Fk into Equation 10, eliminating the term 1/N, with t = 4, as given in MILLER and KAPUSCINSKI 1997 Down. The mean of the positive Ne estimate was 73 and a 95% credibility region neglecting the overdispersion was estimated from 29 to 190 (Table 6). The number of negative Ne estimates was 6 in 10,000 trials. MILLER and KAPUSCINSKI's estimates for 1977 and 1993 data were 0.038 for Fk and 72 for Ne with a 95% confidence interval from 17 to 258. Our posterior mean of Fk was 1.26 times larger and that of the Ne estimate coincided with a 67% narrower confidence interval (Table 6).



View larger version (14K):
In this window
In a new window
Download PPT slide
 
Figure 5. Posterior distributions of the standardized variance of allele frequency change Fk and effective population size Ne of northern pike for the 95% lower limit of the dispersion parameter ({sigma}2 = 5.51) and that with no overdispersion ({sigma}2 = 1.00) using 1977–1993 data from MILLER and KAPUSCINSKI 1997 Down.


 
View this table:
In this window
In a new window

 
Table 6. Means and 95% credibility regions of the posterior distribution of the effective population size of northern pike obtained using 1977–1993 data from MILLER and KAPUSCINSKI 1997 Down

The 95% confidence interval of the dispersion parameter was estimated from 5.51 to 18.80. For the 95% lower limit of the dispersion parameter, the number of simulations with a negative value of Ne estimate was 8480 in 10,000 trials. The mean of the positive Ne estimates was 1065 and the 95% credibility region was estimated from 123 to infinity (Table 6). The posterior distribution of the positive Ne estimate, neglecting the overdispersion ({sigma}2 = 1), and for the lower limit of 2(= 5.51) are given in the right side of Fig 5. In this example, we can see the effect of the overdispersion on the posterior distribution of Ne estimate. The estimate of Ne, neglecting the overdispersion, agreed well with the estimate of MILLER and KAPUSCINSKI 1997 Down.

Ayu broodstock:
Ayu (Plecoglossus altivelis) is the most popular target species of recreational anglers in rivers and streams in Japan. A total of 300 million juveniles are released every year, of which hatchery-produced fish comprise ~30%. The life span of ayu is 1 year. They spawn in a river from September to November and die after spawning. Hatched larvae go down to the sea and winter there. The upstream run of wild ayu juveniles begins from the coast in late March to early April and is over by early July. Soon after, they mature, spawn from September to November, and then die after spawning.

Hatcheries have commonly cultured broodstocks over generations. At the Gunma Prefecture Fisheries Experimental Station, adult ayu have been cultured over 27 generations. About 3000–4000 fish have been reared every year as broodstock, some of which are used for artificial fertilization. In 1996, ~850 females and 650 males were used. The temporal changes in allozyme allele frequencies of the ayu broodstock were reported by YOSHIZAWA 1997 Down(Table 7). These samples in Table 7 were taken after artificial fertilization from two rearing tanks in which males and females were kept separately.


 
View this table:
In this window
In a new window

 
Table 7. Temporal changes in allozyme allele frequencies from YOSHIZAWA 1997 Down and estimated hyperparameters of ayu broodstock cultured over 27 generations in Gunma Prefecture

The total of the hyperparameters was estimated at 28.576 for each locus (Table 7), and the dispersion parameter was estimated at 5.86, with 2. = = 143. The 95% confidence interval of the dispersion parameter was estimated to range from 3.04 to 13.49. We calculated four Fk's on the basis of temporal changes in allele frequencies observed in the first (1996–1997; F1) and second (1997–1998) time intervals (F2), over the entire interval (1996–1998; F3), and for the entire interval based on the pooled F for the first two intervals, as MILLER and KAPUSCINSKI 1997 Down did. For the last case, MILLER and KAPUSCINSKI 1997 Down used the sum of F1 and F2 for the entire interval, but we used the mean for the two intervals (Fmean), which gives the same form of Ne estimate by Equation 10, substituting n0 by the harmonic mean of the sample size of the first and second year, and nt by that of the second and third year. When t is equal for the two intervals, the estimate of Ne derived from pooled Fk is the harmonic mean for both sampling periods (NEI and TAJIMA 1981 Down; POLLAK 1983 Down; WAPLES 1989 Down; MILLER and KAPUSCINSKI 1997 Down; WILLIAMSON and SLATKIN 1999 Down).

The values of Fk calculated by Equation 11 were varied using four combinations for two samples in each sampling year, reflecting the variation in allele frequencies for each sampling period. The posterior mean and SD of F3 were the largest, and the SD of Fmean was the smallest but almost the same as that of F1 (Table 8). The posterior distributions of Fk are illustrated in the left side of Fig 6.



View larger version (18K):
In this window
In a new window
Download PPT slide
 
Figure 6. Posterior distributions for the four estimators of the standardized variance of allele frequency change Fk and effective population size Ne of ayu broodstock using 1996–1998 data in Table 7.


 
View this table:
In this window
In a new window

 
Table 8. Means and 95% credibility regions of the posterior distribution of the effective population size of ayu for 95% lower limit of the dispersion parameter (3.04) obtained using data in Table 7

We fixed the generation time at t = 1.0 for the first and second time intervals and had t = 2.0 for the entire interval because the life span of ayu is 1 year. The samples were taken after reproduction, so we used Equation 10, substituting 2n0 = = 200, 2nt = = 100, and N = 1500, which was the total number of individuals used for artificial fertilization.

We made four estimates of Ne on the basis of F1, F2, F3, and Fmean. Estimates for the 95% lower limit of the dispersion parameter (= 3.04) are given in Table 8. The posterior mean obtained using F3 was the largest with the largest SD, and that for Fmean was second with a smaller SD. The Ne estimate that took {infty} in 10,000 simulations was 8677 when using Fmean, and that for F3 was 4927. This is because the sampling correction term in the denominator of Equation 10 took the very similar values of 0.0912 for F3 and 0.0928 for Fmean, despite the posterior mean of Fmean being smaller than that of F3, as shown in the left side of Fig 6. The posterior distributions of the positive Ne estimate obtained by using F1, F2, F3, and Fmean for the lower limit of 2 are given in the right side of Fig 6, showing a larger variance of the estimate based on F3 than those derived from F1, F2, and Fmean.

We failed to estimate the upper limit of the credibility regions because of large sampling variances with overdispersion. However, when the numbers of breeding males Nm and females Nf are given, which is difficult to know in a wild population but possible in hatcheries, the effective population size is obtained by Ne = (WRIGHT 1931 Down). We obtained a value for Ne of 1473 by using the equation (4 x 850 x 650/(850 + 650)), which referred to the effective population size where a random mating was performed by artificial fertilization. In a spawning season of ayu, males and females eligible for spawning were selected every day from the broodstock and used for artificial fertilization. The number of females used in an artificial fertilization ranged from ~10 to 20, and the ratio of males to females was ~0.8. The eggs were squeezed from the females and stocked in a stainless bowl and then fertilized by squeezing sperm from individual males. This method of fertilization might not guarantee a random mating of the males and females used; hence, 1473 should be used as the upper limit of the credibility regions instead of {infty} (Table 8). If we neglect overdispersion, the 90% credibility region could be obtained at [13–589] with the posterior mean of 136, which was underestimated.


*  DISCUSSION
*TOP
*ABSTRACT
*METHODS
*CASE STUDIES
*DISCUSSION
*APPENDIX
*LITERATURE CITED

The empirical Bayes procedure developed here makes it possible to describe the skewness of the genetic distance and evaluate genetic differentiation between populations while taking full account of the uncertainty of the sample allele frequencies. When we compare populations in which the genetic differentiation is small, the hypothesis-testing framework cannot accept the null hypothesis of no genetic differentiation in almost all cases, because of the poor statistical power with relatively small sample sizes. The empirical Bayes procedure is effective even in such cases. So we believe it could play an important role in the field of conservation.

This general method can easily be extended to any parameter that is a function of multinomial frequencies. When the parameter of interest is a function of allele frequencies, the posterior distribution of that parameter can be obtained through the function by using the posterior distribution of the allele frequencies, instead of assuming a prior distribution for the parameter.

Overdispersion and empirical Bayes:
Until now, models based on a simple random sampling from the gamete pool have been assumed when evaluating allele frequencies. However, as shown in the four case studies treated in this article, a simple random sampling is not necessarily guaranteed. If there are subpopulations divided spatially in a survey area, or a cluster of a genotype is taken, a sample allele frequency might be overdispersed. If overdispersion arises, a sampling variance becomes {sigma}2 times larger than that for a simple random sampling. This can seriously affect the precision of the estimate of genetic distance and the effective population size. As a result, the genetic distance and F-statistics can be overestimated, and the effective population size can be underestimated, if overdispersion is not taken into account in the analysis. Therefore, it is quite important to take overdispersion into account when estimating genetic distance and effective population size.

Sample sizes: If we use the noninformative prior of the Dirichlet distribution (BOX and TIAO 1992 Down), the dispersion parameter might be overestimated for most cases. For example, the dispersion parameter of herring was estimated at 92.5 from the noninformative prior, which was estimated at 2.39 from the empirical Bayes procedure, illustrating the importance of estimating the hyperparameters from the data.

We examined the relationship between sample size and the estimates of the hyperparameters using the data of red sea bream given in Table 1. We estimated the hyperparameters with multipliers of 0.5, 2, 3, and 4 to test each population with the same sample allele frequencies. The estimates of the hyperparameters were stable and not dependent upon sample sizes. This confirmed the robustness of the empirical Bayes procedure (Table 9). However, the dispersion parameter became larger as sample size increased. This is to be expected from the relationship between the total of the hyperparameters, the sample sizes, and the dispersion parameter given by Equation 5.


 
View this table:
In this window
In a new window

 
Table 9. Estimated hyperparameters and the dispersion parameter for four populations of red sea bream for sample sizes of 0.5, 2, 3, and 4 times larger than the original one with the same sample allele frequencies

Suppose there are several subpopulations and the population allele frequencies are largely varied spatially. If the sample sizes are small, one might consider that the large variation in the sample allele frequencies is a function of the small sample size. On the other hand, if the same allele frequencies are obtained for larger sample sizes, one could consider that the large variation comes from the subpopulation structure with confidence. The more samples one draws, the more precisely one can estimate the dispersion parameter. In addition, increasing the number of polymorphic loci to be surveyed may also increase the information available for estimating the dispersion parameter, e.g., the precision of the dispersion parameter estimate in the case of red sea bream, in which the narrowest confidence interval was obtained among our four case studies.

It is also quite important to consider sampling strategies to minimize overdispersion caused by sampling procedures. For example, sampling from different sites and times may be useful to avoid sampling clusters of individuals having the same genotype. Such multiple samples contribute simultaneously to a more precise estimation of the dispersion parameter.

Standardized genetic distance: As I follows the normal distribution, it takes values between -{infty} and +{infty}. For simplicity, let X = and define the expected value by E[]. If > E[], I takes a positive value, and 0 if = E[]. If < E[], I takes a negative value. As X follows the {chi}2 distribution asymptotically when there is no genetic difference, E[] is almost equal to the square root of the degrees of freedom of X, which is the number of loci examined. When {Sigma}Jj=1j does not increase compared to the increased number of loci examined, the likelihood for I taking a negative value increases. On the other hand, when {Sigma}Jj=1j increases to the increased number of loci examined, the likelihood for I taking a positive value increases. This point illustrates the effectiveness of increasing the number of loci to obtain increased information on the genetic differentiation from the value of the posterior mean of I. Conversely, a negative posterior mean indicates that little information on genetic differentiation will be obtained even if the number of loci is increased, as a function of the small genetic differentiation. This is considered to be the cause of the negative values of the posterior mean for I23, I24, and I34.

Overdispersion and gene flow: WEIR 1996 Down stated that for populations that have reached an equilibrium under the joint effects of drift and mutation or migration, WRIGHT 1945 Down found that allele frequencies for loci with two alleles had a ß distribution, and for multiallele loci the distribution was Dirichlet (WRIGHT 1951 Down). We assumed that the hyperparameters for the ß or Dirichlet distributions were common for every sample and locus that was in an overdispersed population. Our assumption corresponds with WRIGHT's (1945, 1951) theories. So if the random sampling is performed, the estimated hyperparameters and dispersion parameter both describe a kind of genetic differentiation between populations that have reached an equilibrium. If all populations mate randomly, the total variance of allele frequency p with two alleles of 2n genes is given by WEIR 1996 Down

(12)

where FST is the coancestry coefficient of WRIGHT 1951 Down. The second term of Equation 12 corresponds to the dispersion parameter, yielding the relationship

(13)

from which we can see larger FST gives larger overdispersion. From Equation 13, we also have the relationship

(14)

RANNALA and HARTIGAN 1996 Down proposed the pseudo-maximum-likelihood method (PMLE) for estimating the rate of gene flow into island populations using the distribution of alleles in samples from a number of islands. We confirmed that their likelihood function for multiple loci (p. 149 Equation 10) coincides with Equation 3 by using the relationship of {Gamma} (n) = (n - 1)!. In PMLE, {alpha}i is treated by {theta}pi. Here, pi is a nuisance parameter estimated from the data as i = , and then i is substituted for pi in the log-likelihood function, and the only unknown parameter {theta} is estimated by using the Newton method. By contrast, we directly maximize the negative log-likelihood function and estimate {Sigma}Jj=1(kj - 1) + 1 parameters by using a simplex minimization. We estimated the hyperparameters and the dispersion parameter from the mtDNA haplotype distribution among islands for Channel Island foxes given in Table 2 of RANNALA and HARTIGAN 1996 Down using the full-likelihood function (Equation 3). The estimates were similar to RANNALA and HARTIGAN's PMLE (Table 10). Thus, our empirical Bayes procedure also offers the maximum-likelihood estimators (MLEs) of the rate of gene flow. MLE is more efficient than PMLE (CHUANG and COX 1985 Down) and has the advantage that it can estimate the confidence interval of the parameters by using the likelihood-ratio test.


 
View this table:
In this window
In a new window

 
Table 10. Estimated hyperparameters and the dispersion parameter from the mtDNA haplotype distribution among islands for Channel Island foxes (Table 2 of RANNALA and HARTIGAN 1996 Down), using the full-likelihood function (Equation 3)

WRIGHT 1969 Down proposed the estimator of {theta} for a discrete-generation island model of a population at equilibrium, based on FST as = (RANNALA and HARTIGAN 1996 Down). Substituting this estimator into Equation 4, we have Equation 13, which was obtained from the total variance of WEIR 1996 Down. As is clear from Equation 6, larger {sigma}2 gives smaller {theta}, indicating that larger genetic differentiation corresponds to smaller gene flow. For the case of the red sea bream, {sigma}2 and {theta} were estimated at 1.80 and 106.55, respectively. For the case of the foxes, they were estimated at 17.89 and 0.45, respectively. From this result, it is clear that the six fox populations in the isolated islands had small gene flow and large genetic differentiation. On the other hand, red sea bream had large gene flow and small genetic differentiation. The estimate of FST for red sea bream was = = 0.0093, which was relatively small. But for foxes it was = = 0.6894, suggesting advanced inbreeding in the fox populations.

The essential idea for estimating overdispersion is to compare the variation of sample allele frequencies obtained from the different locations to the multinomial variance. In addition, the effective population size is based on the changes in allele frequencies between generations. Conversely, overdispersion provides insight into the spatial variation of allele frequencies. By evaluating the spatial variation, it might become possible to discriminate the overdispersion resulting from the variation between generations. Hence, the procedure needs to evaluate overdispersion as a function of the spatial variation and then measure the variation between generations taking overdispersion into account.

In the three case studies we looked at for estimating the effective population size, direct information on the spatial variation was scarce. Therefore, the precision of the dispersion parameter was marginal. When subpopulations exist, overdispersion arises and affects the estimation of the effective population size. It is then important to collect data on the spatial variation. At the same time, when many isolated subpopulations exist, the effective population size is considered to be close to the size of a subpopulation. When this occurs, it seems dangerous to dismiss the variation between generations as overdispersion. It needs further consideration.

Practical considerations on estimating Ne:
From the approximate variance formula of Ne estimate (POLLAK 1983 Down, Equations 28 and 29; WAPLES 1989 Down, Equation 17), it is clear that increasing the sample size, the number of loci, and the number of generations t simultaneously ensures greater precision for the estimate of Ne (WAPLES 1989 Down). MILLER and KAPUSCINSKI 1997 Down stated that if Ne is expected to be moderately large, the sample size, the number of loci, and the number of generations should all be as large as possible. To improve the precision of the estimate of Ne, it is essential to reduce the sampling variance and increase information on genetic drift.

Sample size: The idea of the temporal method is to estimate Ne from the genetic change over time described by F-statistics estimated from the sample allele frequencies. F-statistics, then, consist of the genetic drift and the sampling variance. To evaluate the genetic drift, we have to subtract the sampling variances from the F-statistics. The second and third terms in the denominator of Equation 10 are the sampling variances at generations 0 and t. If Ne is large, the genetic drift may be small, so the denominator of Equation 10 would take a negative value, which leads to an infinite Ne for small sample sizes n0 and nt. If overdispersion arises, the effect of subtracting the sampling variances becomes {sigma}2 times larger, which is why we failed to estimate the upper limit of the credibility region of Ne. As pointed out by WAPLES 1989 Down, the temporal method should be useful for cases of small Ne, where larger genetic drift is expected. Even in the case of a small Ne, the problem of an infinite Ne estimate can occur due to large sampling variance, as shown in the ayu studies, because of the small sample sizes. When one uses the temporal method, reducing the sampling variance is indispensable. The sample size should be kept as large as possible. A larger sample size also provides greater information on the genetic drift.

Number of loci: WILLIAMSON and SLATKIN 1999 Down developed a maximum-likelihood temporal method to estimate Ne and compared estimates with those derived with the F-statistic method. The simulation result in their Table 1 showed that increasing the number of loci reduced the variance and bias in both estimators, although when the number of loci was >50, the corresponding reduction of variance and bias was not large, and the total information on allele frequency changes did not increase much. The results of WILLIAMSON and SLATKIN 1999 Down suggest that information on genetic drift was not improved much even if the number of loci was >100, because their simulation was based on diallelic alleles. F-statistics measure a magnitude of changes in allele frequencies per allele, which can be regarded as a sample mean. So, the estimation precision can be improved if the number of loci is increased. This suggests that increasing the number of alleles is essential, which can be attained by increasing the number of loci.

Number of years between samples: The number of years between samples is correlated with the number of generations, and it then affects the precision of the estimate of Ne. A large number of generations between samples can improve the precision of the estimate of Ne (WAPLES 1991 Down), because information on genetic drift increases as the number of generations increases. WILLIAMSON and SLATKIN 1999 Down(Table 1) showed through simulated populations sampled at generations 0–4 and 0–8 that the variance and bias in both estimators were reduced when the number of years between samples was doubled, although the effect of reducing the bias was not clearly observed with the F-statistic method.

For the case study of ayu, the posterior mean of Ne was 350 based on F1 and 796 based on F3, and SDs for the two estimates were 4024 and 18,538, respectively, showing the reduction of precision despite the fact that the number of years between samples was doubled (Table 8). This is because the doubled number of generations increased the variance of allele frequency changes. The numbers of infinite Ne estimates in 10,000 simulations were 8453 on F1 and 4927 on F3, and the smaller F values increased the estimated value of Ne. The result was similar for northern pike. The point estimate of Ne based on F3 (= 125) was larger than those based on F1 (= 35) and F2 (= 72), and the confidence interval for F1 was the largest (MILLER and KAPUSCINSKI 1997 Down).

MILLER and KAPUSCINSKI 1997 Down discussed the question of which estimate obtained from F3 and Fmean to use for the entire time interval. If Ne changes between the two sampling intervals, it should be evaluated by using F1 and F2 for the respective intervals. The numbers of adult ayu used for the artificial fertilization in 1997 were 600 females and 480 males, which are lower numbers than those used in 1996. The posterior mean of F2 was larger than that of F1, which resulted in a smaller Ne estimate based on F2, supporting the fact that Ne actually changed (Table 8 and Fig 6). One can use the estimate of Ne based on Fmean as the harmonic mean of the effective population size in the respective intervals. The precision was improved by using Fmean, in which a larger quantity of information was included. The greater precision that occurred with using Fmean and the lower precision that occurred with using F3 for the case study of ayu are shown clearly in Fig 6.

When Ne does not change in the entire interval, F3 is expected to have more information on genetic drift than F1 and F2. WILLIAMSON and SLATKIN 1999 Down(Table 1) also showed by their simulation that the estimates of Ne based on Fmean had smaller variances and biases than those based on F1 and F3. This suggests that the decision about which estimate obtained from F3 and Fmean to use should be made on the basis of the relative effect of improving precision by using Fmean and a doubled number of years. Which estimate has more information on the genetic drift must be determined on a case-by-case basis.

Overlapping generations: The basic theory of the temporal method assumes generations to be discrete. The expected number of generations used in Equation 10 directly affects the estimate of Ne. We take time to be measured in years. The expected number of generations between samples can be estimated by dividing the number of years between samples by the mean generation time, which corresponds to the mean age of maturity. In the case of ayu, since the life span is 1 year, 1 year coincides with one generation, which makes it possible to estimate E[t] by the above-mentioned method.

In the case of herring and salmon, however, where there are overlapping year classes of spawners, the estimation of E[t] is complex. When generations overlap, the age-specific birth rate may essentially affect the estimate of E[t]. HILL 1979 Down showed that estimates of Ne are robust with overlapping generations if demographic parameters are stable. If demographic parameters change over time, may be biased upward, leading to an estimate of Ne that is too small (POLLAK 1983 Down; WAPLES 1989 Down). JORDE and RYMAN 1995 Down proposed a correction method for the bias and showed by using simulations for short time intervals that the bias was larger for a case where age-specific birth rates were extremely different compared with a case with equal age-specific birth rates. WAPLES 1990 Down developed a statistical method for this situation that can be applied to Pacific salmon populations that have an unusual life history of semelparity with overlapping year classes. TAJIMA 1992 Down developed a formula to estimate the expected number of generations without computer simulations and showed the estimates agreed well with estimates obtained by the method of WAPLES 1990 Down, which requires computer simulations.

In the case of herring, age-specific survival and birth rates were unknown, so it was not possible to apply the method of JORDE and RYMAN 1995 Down, which requires these demographic parameters. If an individual continues to spawn every year after the first spawning, like herring, E[t] may be estimated by dividing the years between samples by the mean age of spawners, leading to the estimate of E[t] at 1.48 (= ). If a distribution of age-specific birth rates is concentrated to a specific age, E[t] may be close to the estimate obtained by the methods of WAPLES 1990 Down or TAJIMA 1992 Down. We estimated the number of steps at 12 and the expected number of generations at 5.71 (= ) for a time interval of 3 yr between samples by using the computer program given in TAJIMA 1992 Down, where we substituted f(2) = 0.9, f(3) = 0.1, and f(4) = 0. The downward bias of e when demographic parameters change over time with overlapping generations should be corrected upward. From a conservation viewpoint, the estimate of Ne without the correction must be conservative for the overlapping generations.

Rate of inbreeding:
As another evaluation of breeding population size, the inbreeding coefficient may be useful, especially for cases where the population size is estimable, as it is in the field of fishery science. CROW 1954 Down used the inbreeding effective size, making a distinction between that and the variance effective size, which was defined by an inverse of the inbreeding coefficient. However, it is known that there is no great difference between the two effective sizes (NEI 1987 Down); hence we calculated the posterior distribution of the rate of inbreeding defined as 1/(2Ne) (FALCONER and MACKAY 1996 Down), as an infinite Ne corresponds to an inbreeding coefficient of 0, obtained from the 95% credibility region. The means, SDs, and 95% credibility regions of the posterior distributions of the rate of inbreeding were 0.0083, 0.0070, and [0, 0.0251] for herring; 0.0004, 0.0012, and [0, 0.0041] for northern pike; and 0.0011, 0.0041, and [0, 0.0140] for ayu.

The posterior mean of herring was 7.5 times larger than that of ayu with a right-tailed credibility region shown in Fig 7. Overdispersion for herring was underestimated, because the samples for males and females taken in 1996 were analyzed as independent samples even though they were the same sample, leading to a smaller value of Ne, which caused the rate of inbreeding to be overestimated. The mean for northern pike was smallest with the narrowest credibility region. However, these values may be underestimated because of an overestimated dispersion parameter of northern pike, which was the largest among our four case studies. There was only one sample for one sampling year, and we assumed that the seven loci had common hyperparameters, so the estimated dispersion parameter may include the change of the allele frequencies.



View larger version (14K):
In this window
In a new window
Download PPT slide
 
Figure 7. Posterior distributions of the rate of inbreeding of herring, ayu broodstock, and northern pike for the 95% lower limit of the dispersion parameter.

Multistage sampling in hatcheries:
All existing methods assume that Ne is drawn from a gamete pool by a simple random sampling. This is an appropriate assumption for the reproduction of a wild population. However, for broodstocks cultured over generations in hatcheries, candidates of the next broodstock are sampled from the progenies produced by the broodstock. Therefore, Ne is drawn from the progenies by a two-stage sampling. If artificial fertilization using a part of the candidates is performed, as in the case study of ayu, Ne is drawn from the progenies by a three-stage sampling and the sample is drawn from the candidates to estimate the allele frequencies, which is therefore a two-stage sampling of the progenies. The multistage sampling must lead to the different form of V(x - y) given in WAPLES 1989 Down. This is a problem that needs further research, but it should be noted that the variances corresponding to the two-stage and three-stage sampling become small when the sample sizes are large. In the case of ayu, a total of 3000–4000 candidates were sampled from the progenies and cultured in rearing tanks, and 1500 adult fish from the candidates were used for artificial fertilization. Hence the sample allele frequencies of ayu were expected to represent those of the progenies produced by the broodstock. However, if the sample sizes are small, V(x - y) is seriously affected.


*  ACKNOWLEDGMENTS

We thank Zhao-Bang Zeng and two anonymous referees for their comments on an earlier version of this article. We also thank Ray Timm for critical review of the manuscript, Fumio Tajima for important suggestions made during our research, Kazutomo Yoshizawa for biological information on ayu broodstocks including unpublished data, and Masashi Yokota for helpful discussions.

Manuscript received February 17, 2000; Accepted for publication July 26, 2000.


*  APPENDIX
*TOP
*ABSTRACT
*METHODS
*CASE STUDIES
*DISCUSSION
*APPENDIX
*LITERATURE CITED

Expectation of when X follows a {chi}2 distribution:
Let X be a random variable that follows a {chi}2 distribution with degree of freedom of k. We derive here the expectation of . Generally, when X is continuous and has a probability density function f(x), the expectation of g(X) is given by

For our case, the expectation of is calculated as

Let = y, and we have

Using the gamma function, which is given by

finally we have


*  LITERATURE CITED
*TOP
*ABSTRACT
*METHODS
*CASE STUDIES
*DISCUSSION
*APPENDIX
*LITERATURE CITED

ANDO, T., and N. OHKUBO, 1997 A study for estimating effective population size from allozyme allele frequencies, Heisei 8 nendo seitaikei hozengata syubyo seisan gijyutu kaihatsu kenkyu seika no gaiyo, pp. 74–79. Fishery Agency of Japan, Tokyo (in Japanese).

BARTLEY, D. M., 1999 Marine ranching, a global perspective, pp. 79–90 in Stock Enhancement and Sea Ranching, edited by B. R. HOWELL, E. MOKSNESS and T. SVÅSAND. Blackwell, Oxford.

BARTLEY, D. M., D. B. KENT, and M. A. DRAWBRIDGE, 1995  Conservation of genetic diversity in a white seabass hatchery enhancement program in southern California. Am. Fish. Soc. Symp. 15:249-258.

BLANKENSHIP, H. L. and K. M. LEBER, 1995  A responsible approach to marine stock enhancement. Am. Fish. Soc. Symp. 15:165-175.

BOX, G. E. P., and G. C. TIAO, 1992 Bayesian Inference in Statistical Analysis. Wiley, New York.

CAMPTON, D. E., 1995  Genetic effects of hatchery fish on wild populations of Pacific salmon and steel head: what do we really know? Am. Fish. Soc. Symp. 15:337-353.

CHAMBERS, J. M., and T. J. HASTIE (Editors), 1992 Statistical Models in S. Wadsworth and Brooks, Belmont, CA.

CHUANG, C. and C. COX, 1985  Pseudo maximum likelihood estimation for the Dirichlet-multinomial distribution. Comm. Stat. Theory Methodol. 14:2293-2311.

COHEN, J., 1994  The earth is round (p < 0.05). Am. Psychol. 49:997-1003.

CROW, J. F., 1954 Breeding structure of populations. II. Effective population number, pp. 543–556 in Statistics and Mathematics in Biology, edited by T. A. BANCROFT, J. W. GOWEN and J. L. LUSH. Iowa State College Press, Ames, IA.

DIZON, A. E., B. L. TAYLOR, and G. M. O'CORRY-CROWE, 1995  Why statistical power is necessary to link analyses of molecular variation to decisions about population structure. Am. Fish. Soc. Symp. 17:288-294.

FALCONER, D. S., and T. F. C. MACKAY, 1996 Introduction to Quantitative Genetics, Ed. 4. Longman, Essex, UK.

HAGEN, R. L., 1997  In praise of the null hypothesis statistical test. Am. Psychol. 52:15-24.

HARADA, Y., M. YOKOTA, and M. IIZUKA, 1998  Genetic risk of domestication in artificial stocking and its possible reduction. Res. Popul. Ecol. 40:311-324.

HILBORN, R., 1992  Hatcheries and the future of salmon in the Northwest. Fisheries 17:5-8.

HILBORN, R. and J. WINTON, 1993  Learning to enhance salmon production: lessons from the salmonid enhancement program. Can. J. Fish. Aquat. Sci. 50:2043-2056.

HILL, W. G., 1979  A note on effective population size with overlapping generations. Genetics 92:317-322[Abstract/Free Full Text].

HOTTA, T., T. MATSUISHI, H. SAKANO, and Y. KANNO, 1999  Population structure of Pacific herring Clupea pallasii in the Eastern Hokkaido waters. Nippon Suisan Gakkaishi. 64:655-660. (in Japanese)..

JOHNSON, N. L., and S. KOTZ, 1969 Discrete Distributions. Wiley, New York.

JORDE, P. E. and N. RYMAN, 1995  Temporal allele frequency change and estimation of effective size in populations with overlapping generations. Genetics 139:1077-1090[Abstract].

KAERIYAMA, M., 1999 Hatchery programmes and stock management of salmonid populations in Japan, pp. 153–167 in Stock Enhancement and Sea Ranching, edited by B. R. HOWELL, E. MOKSNESS and T. SVASAND. Blackwell, Oxford.

KITADA, S., 1999 Effectiveness of Japan's stock enhancement programs: current perspectives, pp. 103–131 in Stock Enhancement and Sea Ranching, edited by B. R. HOWELL, E. MOKSNESS and T. SVÅSAND. Blackwell, Oxford.

KITADA, S., K. HIRAMATSU, and H. KISHINO, 1994  Estimating mortality rates from tag recoveries: incorporating over-dispersion, correlation and change points. ICES J. Mar. Sci. 51:241-251.

KNAPP, G., 1999 Alaska salmon ranching: an economic review of the Alaska salmon hatchery programme, pp. 537–556 in Stock Enhancement and Sea Ranching, edited by B. R. HOWELL, E. MOKSNESS and T. SVÅSAND. Blackwell, Oxford.

KRIMBAS, C. B. and S. TSAKAS, 1971  The genetics of Dacus oleae. V. Changes of esterase polymorphism in a natural population following insecticide control: Selection of drift? Evolution 25:454-460.

LANGE, K., 1995  Application of the Dirichlet distribution to forensic match probabilities. Genetica 96:107-117[Medline].

LEE, P. M., 1989 Bayesian Statistics: An Introduction. Edward Arnold, London.

MARITZ, J. S., and T. L. LWIN, 1989 Empirical Bayes Methods, Ed. 2. Chapman & Hall, London.

MCCULLAGH, P., and J. A. NELDER, 1983 Generalized Linear Model, Ed. 2. Chapman & Hall, London.

MILLER, L. M. and A. R. KAPUSCINSKI, 1997  Historical analysis of genetic variation reveals low effective population size in a Northern Pike (Esox lucius) population. Genetics 147:1249-1258[Abstract].

NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.

NEI, M. and F. TAJIMA, 1981  Genetic drift and estimation of effective population size. Genetics 98:625-640[Abstract/Free Full Text].

PETERMAN, R. M., 1990  Statistical power analysis can improve fisheries research and management. Can. J. Fish. Aquat. Sci. 47:2-15.

POLLAK, E., 1983  A new method for estimating the effective population size from allele frequency changes. Genetics 104:531-548[Abstract/Free Full Text].

RANNALA, B. and J. A. HARTIGAN, 1996  Estimating gene flow in island populations. Genet. Res. 67:147-158[Medline].

RITTER, J. A., 1997  The contribution of Atlantic salmon Salmo salar L. enhancement to a sustainable resource. ICES J. Mar. Sci. 54:1177-1187.

ROFF, D. A. and P. BENTZEN, 1989  The statistical analysis of mitochondrial DNA polymorphisms: {chi}2 and the problem of small sample. Mol. Biol. Evol. 6:539-545[Abstract].

SANGHVI, L. D., 1953  Comparison of genetical and morphological methods for a study of biological differences. Am. J. Phys. Anthropol. 11:385-404[Medline].

SNEDECOR, G. W., and W. G. COCHRAN, 1967 Statistical Methods, Ed. 6. Iowa State University Press, Ames, IA.

TABATA, K. and A. MIZUTA, 1997  RFLP analysis of the mtDNA D-loop region in red sea bream Pagrus major population from four locations of western Japan. Fish. Sci. 63:211-217.

TAJIMA, F., 1992  Statistical method for estimating the effective population size in Pacific salmon. J. Hered. 83:309-311[Free Full Text].

UTTER, F., 1998  Genetic problems of hatchery-reared progeny released into the wild, and how to deal with them. Bull. Mar. Sci. 62(2):623-640.

WALTERS, C. J., 1988 Mixed-stock fisheries and the sustainability of enhancement production for chinook and coho salmon, pp. 109–115 in Salmon Production, Management, and Allocation, edited by W. J. MCNEIL. Oregon State University Press, Corvallis, OR.

WAPLES, R. S., 1989  A general approach for estimating effective population size from temporal changes in allele frequencies. Genetics 121:379-391[Abstract/Free Full Text].

WAPLES, R. S., 1990  Conservation genetics of Pacific salmon: III. Estimating effective population size. J. Hered. 81:277-289[Abstract/Free Full Text].

WAPLES, R. S., 1991  Genetics interactions between hatchery and wild salmonids: lessons from the Pacific Northwest. Can. J. Fish. Aquat. Sci. 48(Suppl. 1):124-133.

WAPLES, R. S., 1999  Dispelling some myths about hatcheries. Fisheries 24(2):12-21.

WEIR, B. S., 1996 Genetic data analysis II. Sinauer Associates, Sunderland, MA.

WILLIAMSON, E. G. and M. SLATKIN, 1999  Using maximum likelihood to estimate population size from temporal changes in allele frequencies. Genetics 152:755-761[Abstract/Free Full Text].

WRIGHT, S., 1931  Evolution in Mendelian populations. Genetics 16:97-159[Free Full Text].

WRIGHT, S., 1945  The differential equation of the distribution of gene frequencies. Proc. Natl. Acad. Sci. USA 31:383-389.

WRIGHT, S., 1951  The general structure of populations. Ann. Eugen. 15:323-354.

WRIGHT, S., 1969 Evolution and Genetics of Populations: The Theory of Gene Frequencies, Vol. 2. University of Chicago Press, Chicago.

YOSHIZAWA, K., 1997 Inbreeding of cultured ayu from allozyme allele frequencies, Heisei 8 nendo seitaikei hozengata syubyo seisan gijyutu kaihatsu kenkyu seika no gaiyo, pp. 30–38. Fishery Agency of Japan, Tokyo (in Japanese).




This article has been cited by other articles:


Home page
GeneticsHome page
S. Kitada, T. Kitakado, and H. Kishino
Empirical Bayes Inference of Pairwise FST and Its Distribution in the Genome
Genetics, October 1, 2007; 177(2): 861 - 873.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
T. Kitakado, S. Kitada, H. Kishino, and H. J. Skaug
An Integrated-Likelihood Method for Estimating Genetic Differentiation Between Populations
Genetics, August 1, 2006; 173(4): 2073 - 2082.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
S. Kitada and H. Kishino
Simultaneous Detection of Linkage Disequilibrium and Genetic Differentiation of Subdivided Populations
Genetics, August 1, 2004; 167(4): 2003 - 2013.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
G. Laval, M. SanCristobal, and C. Chevalet
Maximum-Likelihood and Markov Chain Monte Carlo Approaches to Estimate Inbreeding and Effective Size From Allele Frequency Changes
Genetics, July 1, 2003; 164(3): 1189 - 1204.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
Bayesian Analysis of Genetic Differentiation Between Populations
Genetics, January 1, 2003; 163(1): 367 - 374.