Understanding the population genetic factors that shape genome variability is pivotal to the design and interpretation of studies using large-scale polymorphism data. We analyzed patterns of polymorphism and divergence at Z-linked and autosomal loci in the domestic chicken (Gallus gallus) to study the influence of mutation, effective population size, selection, and demography on levels of genetic diversity. A total of 14 autosomal introns (8316 bp) and 13 Z-linked introns (6856 bp) were sequenced in 50 chicken chromosomes from 10 highly divergent breeds. Genetic variation was significantly lower at Z-linked than at autosomal loci, with one segregating site every 39 bp at autosomal loci (θW = 5.8 ± 0.8 × 10⁻³) and one every 156 bp on the Z chromosome (θW = 1.4 ± 0.4 × 10⁻³). This difference may in part be due to a low male effective population size arising from skewed reproductive success among males, evident both in the wild ancestor (the red jungle fowl) and in poultry breeding. However, this effect cannot entirely explain the observed three- to fourfold reduction in Z chromosome diversity. Selection, in particular selective sweeps, may therefore have had an impact on reducing variation on the Z chromosome, a hypothesis supported by the observation of heterogeneity in diversity levels among loci on the Z chromosome and the lower recombination rate on Z than on autosomes. Selection on sex-linked genes may be particularly important in organisms with female heterogamety since the heritability of sex-linked sexually antagonistic alleles advantageous to males is improved when fathers pass a Z chromosome to their sons.
SINGLE-NUCLEOTIDE polymorphisms (SNPs) constitute a primary source of the variability that underlies differences in the genetic make-up of individuals within species. Currently, the application of SNPs to genomics and population genetics is rapidly expanding and the increasing numbers of species in which an extensive number of SNPs have been identified include humans (International SNP Map Working Group 2001), mouse (Wade et al. 2002), Arabidopsis thaliana (Cho et al. 1999), Caenorhabditis elegans (Koch et al. 2000), maize (Tenaillon et al. 2001), and soybean (Zhu et al. 2003). Population genetics theory stipulates that the degree and character of genetic variability are governed by factors such as mutation, selection, and effective population size. Understanding the role of these factors in shaping patterns of polymorphism within genomes is key to the interpretation of SNP data in, for example, association studies and for the quantification of linkage disequilibrium (LD) through analysis of haplotype diversity. It is also required for tracing the evolutionary history of genomes and species, on the basis of polymorphism data.
Within-genome variation in SNP frequency is likely to occur at several different scales. To name but one reason, hot-spot sites for mutation as well as mutation rate variation at larger scales imply heterogeneity in the distribution of SNPs (Ebersberger et al. 2002; International Human Genome Sequencing Consortium 2002; Smith et al. 2002; Hardison et al. 2003). Empirical data from the human (Mullikin et al. 2000; International SNP Map Working Group 2001; Reich et al. 2002) and mouse (Wade et al. 2002) genomes provide evidence of significant variation in the density of SNPs along chromosomes. There is also the potential of variation in SNP frequency between chromosomes and this applies in particular to differences between autosomes and sex chromosomes (Begun and Whitley 2000; International SNP Map Working Group 2001). These chromosome classes differ with respect to several of the factors thought to affect nucleotide diversity and contrasting the levels of diversity in the different classes may thus be informative for addressing the role of population genetic factors in shaping genomic variability.
Preliminary surveys of the incidence of SNPs in the genome of the domestic chicken have indicated genetic variability to be at relatively high levels. No detailed estimates of nucleotide diversity are available but there are reports of one SNP every 60–500 bp (Schmid et al. 2000, 2001; Parsanejad et al. 2002; Smith et al. 2002; Liu et al. 2003), the broad range of estimated frequencies probably reflecting that only a limited number of genes have been analyzed, that coding as well as untranslated and intron regions have been surveyed, and that a varying number of breeds/lines have been sampled. It thus appears that despite strong artificial selection during domestication, average genome variability is higher in chicken than in, for instance, humans (International SNP Map Working Group 2001). In contrast to mammals, birds have female heterogamety (males ZZ; females ZW) and the avian sex chromosomes have evolved independently from mammalian X and Y (Fridolfsson et al. 1998; Nanda et al. 2002); i.e., Z and X are not syntenic. Moreover, due to female heterogamety the population genetics of avian and mammalian sex chromosomes differ in several respects. In this study we survey genetic variability in noncoding regions of the chicken genome, including autosomes as well as the Z chromosome, using a panel of highly divergent breeds. We find nucleotide diversity to be strikingly higher in autosomes than in the Z chromosome, an observation that suggests low male effective population size and a role of selection in reducing levels of genetic variability on the Z chromosome.
MATERIALS AND METHODS
DNA samples: We used genomic DNA from 25 male chickens (Gallus gallus), giving 50 chromosomes of autosomal as well as Z sequences. The sample included two to three individuals from each of 10 diverse breeds, which was a subset of the 52 breeds used by Hillel et al. (2003). This subset of breeds has been identified as constituting a broad representation of the total gene pool of domestic chicken (Rosenberg et al. 2001). Two individuals were included from broiler sire line B, brown-egg-layer line D, broiler dam line D, Icelandic landrace, and captive red jungle fowl G. g. gallus. Three individuals were analyzed from Fayoumi, Marans, Transylvanian naked neck, Green-legged partridge, and White-egg-layer line A. For all introns analyzed we also determined the orthologous sequence in turkey, to be able to contrast levels of intraspecific diversity and interspecific divergence. Chicken and turkey are estimated to have diverged 28 million years ago (Dimcheff et al. 2002).
PCR: We amplified 27 introns from 13 autosomal and 9 Z-linked genes (Table 1). While most introns were >400 bp, they were analyzed in overlapping ∼250-bp amplicons. PCR was performed in 25-μl reactions containing 20 ng DNA, 1× PCR Gold buffer (Applied Biosystems, Foster City, CA), 0.2 μM of each primer, 2.0–2.5 mM MgCl2, 2.0 μM dNTPs, and 1 unit AmpliTaq Gold (Applied Biosystems). Amplification was performed using an initial denaturation of 95° for 5 min, followed by 33–40 cycles at 94° for 30 sec, a touchdown annealing temperature profile for 30 sec, and extension at 72° for 45 sec. Primer sequences, annealing temperatures, and MgCl2 concentrations for each marker are available as supplementary information at http://www.genetics.org/supplemental/.
Sequencing and SNP detection: The fragments were purified using ExoSAP-IT (Amersham Biosciences, Arlington Heights, IL), sequenced directly using the original PCR primers and the DYEnamic cycle sequencing kit (Amersham Biosciences), and analyzed on a MegaBACE 1000 (Amersham Biosciences) instrument. Sequences from both directions were aligned and edited in the program AutoAssembler (Applied Biosystems), which was also used for defining heterozygous positions. Singletons were confirmed by independent sequencing reactions using new amplification products. In cases where a fragment was found to be heterozygous for an insertion or deletion polymorphism, the fragment was resequenced from both directions using a new amplification product.
Complete diploid intron sequences were assembled from the overlapping fragments using Sequencher (Gene Codes, Ann Arbor, MI). Since phase was unknown, each sequence was then separated into two “pseudo-haplotypes” on the basis of the ambiguity codes produced by Sequencher using a Perl script. All chicken and turkey sequences at each locus were then aligned using a ClustalW algorithm (Thompson et al. 1994) in Sequence Navigator (Applied Biosystems) and improved by manual adjustment. A number of additional Perl programs were used during the analysis to extract polymorphism information from the sequences and to format them for use in data analysis programs. There were two segregating sites with three alleles, which indicates a frequency of multiallelic sites similar to that found in the human genome (Patil et al. 2001).
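The pseudo-haplotype step can be illustrated with a short sketch. The Perl script used in the study is not reproduced in the text, so the Python function below is only an assumed equivalent: the name `split_pseudo_haplotypes` is hypothetical, and only the six two-base IUPAC ambiguity codes are handled. Because phase is unknown, the assignment of each allele to the first or second pseudo-haplotype is arbitrary.

```python
# Two-base IUPAC ambiguity codes and the pair of nucleotides each represents.
IUPAC_HET = {
    "R": ("A", "G"), "Y": ("C", "T"), "S": ("C", "G"),
    "W": ("A", "T"), "K": ("G", "T"), "M": ("A", "C"),
}

def split_pseudo_haplotypes(diploid_seq):
    """Split a diploid consensus sequence into two pseudo-haplotypes.

    Heterozygous positions (two-base ambiguity codes) contribute one allele
    to each pseudo-haplotype; every other character (A, C, G, T, N, gaps)
    is copied unchanged to both. Phase between sites is NOT recovered.
    """
    hap1, hap2 = [], []
    for base in diploid_seq.upper():
        first, second = IUPAC_HET.get(base, (base, base))
        hap1.append(first)
        hap2.append(second)
    return "".join(hap1), "".join(hap2)
```

Pseudo-haplotypes of this kind preserve allele counts at each site, which is sufficient for site-based summaries, but not linkage between sites.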
Statistical analysis: Haplotypes were inferred at each locus, using the expectation-maximization (EM) algorithm implemented in Arlequin (Schneider et al. 2000); the inferred haplotypes are provided as supplementary information at http://www.genetics.org/supplemental/. We then used DnaSP (Rozas and Rozas 1999) to analyze patterns of polymorphism and divergence and perform tests of neutrality on the basis of the allele frequency spectrum. These included Tajima's D (Tajima 1989a) and Fu and Li's D and F (Fu and Li 1993) test statistics. These tests measure whether the observed frequencies of segregating mutations are compatible with the frequencies expected under a standard neutral model. Positive selection or the presence of weakly deleterious mutations (as well as population growth) tends to give an excess of low-frequency variants, resulting in negative test values. Balancing selection (or population contraction) may cause an excess of intermediate-frequency variants and positive test values. Multilocus Hudson-Kreitman-Aguadé (HKA) tests (Hudson et al. 1987) were performed using the HKA program written by Jody Hey and available at http://lifesci.rutgers.edu/~heylab/index.html. This program allows tests assuming sex ratios different from 1:1. The basic principle of the HKA test is to test whether the levels of intraspecific diversity and interspecific divergence are positively correlated in two or more genomic regions, as predicted by the neutral mutation hypothesis. If they are not, selection may be interpreted as having affected polymorphism levels in at least one of the regions analyzed. AMOVA analysis of population subdivision was performed in Arlequin.
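To make the sign convention of the frequency-spectrum tests concrete, the following sketch computes Tajima's D from summary statistics using the standard formula of Tajima (1989a). It is not the DnaSP implementation used in the study; the function name `tajimas_d` and its argument names are assumptions made for illustration.

```python
import math

def tajimas_d(n, seg_sites, pi):
    """Tajima's D for one locus.

    n         -- number of sampled chromosomes
    seg_sites -- number of segregating sites (S)
    pi        -- mean number of pairwise differences for the whole locus
                 (not divided by sequence length)
    """
    if n < 4 or seg_sites < 1:
        # For n < 4 the variance term is zero, so D is undefined.
        raise ValueError("need n >= 4 and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    # Difference between the two estimators of theta, scaled by its SD.
    return (pi - seg_sites / a1) / math.sqrt(variance)
```

With hypothetical inputs, `tajimas_d(10, 16, 3.888889)` is negative (about −1.45): pairwise diversity is low relative to the number of segregating sites, i.e., an excess of rare variants. Raising `pi` above `seg_sites / a1` for the same sample gives a positive value, the pattern expected from an excess of intermediate-frequency variants.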
Confidence intervals for the ratio of average levels of variation on autosomes and Z were determined by nonparametric bootstrapping: To account for potential mutation rate variation between loci, segregating sites from each chromosomal class were resampled with replacement first by site and then by locus, using 10,000 replicates.
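The two-level resampling described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' program: loci are represented as hypothetical `(segregating_sites, length_bp)` pairs, diversity is summarized as the pooled density of segregating sites, and a percentile interval is used. The function name `bootstrap_ratio_ci` and its parameters are invented for the example.

```python
import random

def bootstrap_ratio_ci(auto_loci, z_loci, replicates=1000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the autosome/Z ratio of segregating-site density.

    auto_loci, z_loci -- lists of (segregating_sites, length_bp), one per locus.
    At least one Z locus must be polymorphic, otherwise the ratio is undefined.
    """
    rng = random.Random(seed)

    def resampled_density(loci):
        seg = length = 0
        # Resample loci with replacement ...
        for s, bp in [rng.choice(loci) for _ in loci]:
            # ... and, within each drawn locus, resample sites with replacement:
            # each drawn site is segregating with probability s / bp.
            seg += sum(rng.random() < s / bp for _ in range(bp))
            length += bp
        return seg / length

    ratios = []
    while len(ratios) < replicates:
        auto = resampled_density(auto_loci)
        z = resampled_density(z_loci)
        if z > 0:  # skip replicates without Z variation (ratio undefined)
            ratios.append(auto / z)
    ratios.sort()
    lower = ratios[int(replicates * alpha / 2)]
    upper = ratios[int(replicates * (1 - alpha / 2)) - 1]
    return lower, upper
```

Resampling loci as well as sites lets between-locus differences (for example in mutation rate) widen the interval, which site-level resampling alone would not capture. Pure-Python resampling is slow at 10,000 replicates; a smaller `replicates` value is adequate for trying the sketch.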
RESULTS
We amplified and sequenced 27 different introns, from 22 different genes spread over the genome, in 25 male chickens from 10 diverse breeds to get a picture of the patterns of genetic variability in the chicken genome (Table 1). With this experimental design we sought to account for regional variation in the mutation rate and for local effects of selection. In mammals, regions of local similarity in mutation rate have been observed at scales >1 Mb (Lercher et al. 2001). This implies that to obtain a representative genomic sample, choosing many segments of relatively short length should be preferable to the use of a limited number of longer segments (Ellegren et al. 2003). As the aim of the study was to get a broad idea of levels of polymorphism in the chicken genome and not to characterize the levels in particular breeds, we chose to analyze a few individuals from many different breeds rather than many individuals from one or only a few breeds. The sample also included two males of the red jungle fowl, the wild ancestor to domestic chicken. The total length of the sequence surveyed was ∼15.2 kb, divided into 8316 bp from autosomes (14 introns) and 6856 bp from the Z chromosome (13 introns).
The total numbers of segregating sites were 214 on the autosomes (one every 39 bp) and 44 on the Z chromosome (one every 156 bp). For autosomal introns, average nucleotide diversity (π) calculated from the average number of pairwise differences was 6.5 ± 0.3 × 10⁻³ and Watterson's estimate of θ per site (θW; basically, the proportion of segregating sites in a sample) was 5.8 ± 0.8 × 10⁻³. For Z chromosome introns, π was 2.0 ± 0.1 × 10⁻³ and θW was 1.4 ± 0.4 × 10⁻³. These observations suggest distinct differences in levels of genetic variability on autosomes and the Z chromosome of chicken; the A/Z ratio for π is 3.2 and for θW it is 4.1. Sixteen insertion or deletion polymorphisms (indels) were identified (excluding regions with length polymorphism in tandem repetitive DNA sequences); 13 indels were in autosomal and 3 were in Z-linked introns. For this type of polymorphism too, autosomal diversity thus seemed to exceed that of the Z chromosome. Details on polymorphism data are presented in Table 1.
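As a rough check on how these summary statistics relate to the raw counts, the sketch below computes Watterson's θ per site from the pooled totals given above and π from a set of aligned haplotypes. The function names are hypothetical, and pooling all loci in a class is an assumption of this example, so the pooled values need not match the reported estimates exactly.

```python
from itertools import combinations

def watterson_theta_per_site(seg_sites, n_chromosomes, length_bp):
    """Watterson's theta per site: S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1.0 / i for i in range(1, n_chromosomes))
    return seg_sites / (a_n * length_bp)

def nucleotide_diversity(haplotypes):
    """pi per site: mean number of pairwise differences divided by length.

    haplotypes -- aligned sequences of equal length (gaps are not treated
                  specially in this sketch).
    """
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return diffs / (len(pairs) * len(haplotypes[0]))

# Pooled counts reported in the text: 50 chromosomes per class.
theta_auto = watterson_theta_per_site(214, 50, 8316)
theta_z = watterson_theta_per_site(44, 50, 6856)
```

The pooled values come out at roughly 5.7 × 10⁻³ for autosomes and 1.4 × 10⁻³ for Z, close to the reported θW estimates and giving an A/Z ratio of about 4.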
We also estimated π for individual breeds. Although only a limited number of chromosomes per breed were sampled, it was evident that the trend of higher autosomal variability was present within breeds as well. The within-breed A/Z ratio for π was in the range of 2.4–11.1, with a mean of 5.4 ± 2.7. A high A/Z ratio was also seen in the few red jungle fowl analyzed (3.11).
It should be noted that variation in the underlying rate of mutation is highly unlikely to account for reduced Z chromosome variation. Differences in the germline mutation rate between autosomes and sex chromosomes generally follow from the male mutation bias (αm), leading to higher mutation rates the more time a chromosome class spends in the male germline (Miyata et al. 1987; Hurst and Ellegren 1998; Li et al. 2002). In birds, where αm is estimated at 2–4 (Ellegren and Fridolfsson 1997; Kahn and Quinn 1999; Carmichael et al. 2000; Bartosch-Härlid et al. 2003), Z spends two-thirds of its time in the male germline and should therefore be expected to mutate at a higher rate than autosomes, an assumption supported by sequence data (Axelsson et al. 2004). If anything, the effect of this difference would be to raise levels of Z chromosome diversity, which is opposite to what we find in this study.
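The size of this expected effect can be sketched with the standard time-in-germline argument: a Z chromosome spends two-thirds of its time in males and an autosome one-half, so with a male-to-female mutation rate ratio αm the expected Z/autosome ratio is 2(2αm + 1) / 3(αm + 1). The function name below is hypothetical and the calculation is an illustration, not an analysis from the study.

```python
def z_to_autosome_mutation_ratio(alpha_m):
    """Expected Z/autosome mutation-rate ratio under male mutation bias alpha_m.

    Rates are expressed in units of the female germline rate.
    """
    z_rate = (2.0 * alpha_m + 1.0) / 3.0   # 2/3 of the time in males, 1/3 in females
    autosome_rate = (alpha_m + 1.0) / 2.0  # half the time in each sex
    return z_rate / autosome_rate
```

For αm between 2 and 4 the ratio lies between about 1.11 and 1.20, that is, a modestly higher mutation rate on Z, which would raise rather than lower Z-linked diversity.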
An HKA test was performed to test for evolutionary heterogeneity between intraspecific variation in chicken and interspecific divergence with the turkey outgroup, using all autosomal loci combined and all Z-linked loci combined (Table 2). This test took into account the mode of inheritance of the Z chromosome, which, assuming random mating, reduces its effective population size to three-quarters that of autosomal loci. Statistically significant deviations from the neutral model were observed (P = 0.006), suggesting that the observed ratio of average variation at autosomal and Z-linked loci is incompatible with a neutral model.
Tajima's D (Tajima 1989a) and Fu and Li's D and F (Fu and Li 1993) statistics based on allele frequency spectra at all loci (Table 1) broadly indicate that a neutral model cannot be rejected. Only two loci (RPL7a intron 3 and ALDOB intron 7) have significant values for one or more of the test statistics, but this is not unexpected when multiple tests are performed, particularly when short sequences are used as is the case here. However, the average values of all test statistics, including Tajima's D, for both Z-linked and autosomal loci are positive, although not significant, indicating a weak excess of common variants.
Two further HKA tests were performed to examine heterogeneity (a) between all autosomal loci and (b) between all Z-linked loci (Table 2). There is no evidence for heterogeneity between autosomal loci (P = 0.527) but Z chromosome loci exhibited significant deviations from the neutral model (P = 0.024). This suggests that, in addition to a reduced average variation on the Z chromosome compared to autosomal loci, there is evidence for heterogeneity within the Z chromosome.
DISCUSSION
There are two immediate conclusions of this study: (a) that Z-linked loci of domestic chicken have significantly reduced levels of variation in noncoding regions compared with autosomal loci and (b) that there is significant heterogeneity in levels of variation between loci on the Z chromosome. In the following we consider these observations in relation to effective population size, selection, and demography. However, it should be noted directly that the evolutionary history of chickens is complex, involving natural processes in populations of the wild ancestor as well as artificial selection and changes in population size and structure during domestication and breeding. Because of this, patterns of genetic variability in the chicken genome are likely to have a complex background, probably affected by processes prior to, during, and subsequent to domestication. Disentangling the relative importance of these processes may be difficult.
A low male effective population size is compatible with reduced Z chromosome variability: Under random mating, the effective population size of Z is three-quarters that of autosomes and polymorphism levels should thus be expected to scale accordingly. However, an HKA test demonstrates that our observation of three to four times higher average nucleotide variation at autosomal than at Z-linked loci is significantly greater than expected under neutrality (P = 0.006). A potential cause of this departure from the neutral model is that the assumptions of random mating are violated in chickens as domestication is likely to have been associated with skewed reproductive success among cocks, as is currently practiced in poultry breeding (Muir and Aggrey 2003). Moreover, in the wild ancestor of the domestic chicken, the red jungle fowl, there is a high variance in male reproductive success with a suggested twofold difference in the effective population size of males and females (Zuk et al. 1990a,b; Collias and Collias 1996).
A low male effective population size will lead to a lowered effective population size of Z. If we conservatively assume that mutation rates at Z-linked and autosomal loci are the same, then the ratio of the number of autosomes to the number of Z chromosomes approaches 2 as the bias in the sex ratio of successfully reproducing birds becomes severe. This means that a twofold excess of polymorphism on autosomes compared to Z is the maximum possible difference one should expect from nonrandom mating. When twice as many females as males are contributing to the gene pool, as suggested by red jungle fowl data, the effective population size of autosomes is 1.5 times that of Z; when there is 1 male for every 10 females it is 1.8 (22/12). However, even when an extreme operational sex ratio of 1:10 is incorporated in the HKA test, Z chromosome variability is still less than expected under neutrality (P = 0.04). We conclude that a reduced male effective population size is likely to have led to reduced levels of Z chromosome diversity in chicken, although it fails to account for all of the difference in polymorphism levels between autosomes and the Z.
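The ratios quoted in this paragraph follow from a simple count of gene copies among breeding birds: every bird carries two copies of each autosome, whereas males (ZZ) carry two Z chromosomes and females (ZW) one. The sketch below reproduces that counting argument only; it is not a full effective-population-size model, and the function name is hypothetical.

```python
from fractions import Fraction

def autosome_to_z_copy_ratio(males, females):
    """Ratio of autosomal to Z-linked gene copies among breeding birds.

    males, females -- relative numbers of successfully reproducing birds.
    """
    autosomal_copies = 2 * (males + females)
    z_copies = 2 * males + females
    return Fraction(autosomal_copies, z_copies)
```

An equal sex ratio gives 4/3 (the three-quarters scaling of Z under random mating), one male per two females gives 3/2, and one male per 10 females gives 22/12 ≈ 1.8. As females increasingly outnumber males the ratio approaches, but never reaches, 2.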
Selection: We next consider the possibility of selection contributing to the observed discrepancy in polymorphism levels between autosomes and Z. As it would seem improbable that selection has independently increased variation at several unlinked autosomal loci (for example, due to balancing selection), it can be hypothesized that selection has decreased variation on the Z chromosome. The influence of selection on the Z chromosome is supported by an HKA test showing significant heterogeneity in polymorphism levels between loci on the Z chromosome (P = 0.024), which cannot be easily explained by differences in effective population size between chromosomal classes. Two possible factors could cause selection to have a greater effect on variation on the Z chromosome: (i) lower recombination rate on the Z chromosome resulting in a greater effect of selection on linked neutral variants and (ii) greater incidence of selection at Z-linked loci. We now examine the evidence for these two possibilities.
Positive selection and background selection both cause a reduction of genetic variation at linked neutral sites. During positive selection, the sweep through the population of an adaptive mutation will drag with it alleles at linked neutral sites. Fixation of the adaptive mutation may thereby lead to the associated fixation of linked variants, often referred to as “genetic hitchhiking” (Maynard-Smith and Haigh 1974; Kaplan et al. 1989). For background (negative, purifying) selection, the removal from the population of chromosomes containing deleterious alleles will also lead to removal of linked variants (Charlesworth et al. 1993, 1995). A common prediction from both models is that the strength of the effect of selection on polymorphism levels at linked sites will be inversely proportional to the recombination rate. In other words, when there is more recombination, the association between a locus under selection and linked alleles will be uncoupled more quickly.
Data from chicken genome mapping provide evidence for significant heterogeneity in the sex-averaged recombination rate between autosomes and the Z chromosome (Levin et al. 1993; Smith and Burt 1998; Groenen et al. 2000; Schmid et al. 2000): microchromosomes, ∼5 cM/Mb; macrochromosomes, ∼2.5 cM/Mb; Z chromosome, ∼1.3 cM/Mb. The recombination rate on Z is thus ∼2.5 times lower than the average autosomal rate. This suggests that the effects of selection on linked neutral sites would stretch much farther on average from a locus under selection on the Z chromosome than from an autosomal locus. As a consequence, lower levels of variability on the Z chromosome than on autosomes are predicted.
If selection is an important factor to explain the contrasting levels of nucleotide diversity on chicken autosomes and the Z chromosome, is it through the action of selective sweeps or background selection? Our data are compatible with a scenario of selective sweeps as positive selection can be expected to be more effective in reducing neutral polymorphism on Z than on autosomes (Begun and Whitley 2000). In contrast, as background selection should be expected to quickly eliminate recessive deleterious mutations when exposed on the hemizygous Z chromosome, Z should have a larger proportion of mutation-free chromosomes than autosomes and, as a consequence, greater neutral polymorphism (Crow and Kimura 1970; Aquadro et al. 1994). However, the predictions made by the models assume equal rates of recombination in the two chromosome classes, which we know is not the case. Moreover, while the observed heterogeneity in polymorphism levels on Z would seem to suggest the action of selective sweeps, background selection can also cause heterogeneity in polymorphism levels when there is variation in the local rate of recombination (Charlesworth et al. 1993, 1995).
It is also possible that the strength and frequency of selection differ between autosomes and the Z chromosome. Important in this context are the observations of disproportionate sex linkage of traits involved in sexual selection (Reinhold 1998; Pizzari and Birkhead 2002), speciation (Orr and Coyne 1989; Coyne 1992; Presgraves et al. 2003), and sexually antagonistic fitness variation (Gibson et al. 2002), traits where adaptive evolution is often invoked. Sex chromosomes also show an unusual complement of sex-specific genes. Human X is enriched for genes with male-specific expression (Saifi and Chandra 1999; Wang et al. 2001; Lercher et al. 2003) whereas Drosophila (Swanson et al. 2001; Betran et al. 2002; Parisi et al. 2003; Ranz et al. 2003) and C. elegans (Reinke et al. 2000) X's show a significant deficit of genes with male-biased expression and some evidence for an excess of genes with female-biased expression. Reproductive genes usually evolve rapidly under positive selection (Civetta and Singh 1995; Swanson and Vacquier 2002; Torgerson et al. 2002; Swanson et al. 2003) and there is evidence for rapid evolution of male-biased gene expression in Drosophila (Meiklejohn et al. 2003). It remains to be elucidated whether genes with sex-specific functions are overrepresented also on the avian Z chromosome.
Demography: Although selection can obviously shape levels and patterns of genomic variability, demography can too. Moreover, rejection of a neutral model in neutrality tests like Tajima's D (Tajima 1989a) does not necessarily imply that selection has to be invoked; it can as well result from violations of the assumptions of random mating in a population of constant size. For instance, a skewed spectrum of variants toward rare alleles, as indicated by negative values of Tajima's D, is expected from directional selection, from weak purifying selection, and from population expansion (Tajima 1989b; Braverman et al. 1995; Przeworski 2002). An excess of intermediate-frequency variants (positive Tajima's D values) is expected both from balancing selection and from population bottlenecks (Tajima 1989b; Simonsen et al. 1995; Fay and Wu 1999).
Domestication has recently affected the size and structure of chicken populations, likely involving complex episodes of bottlenecks, population expansion, and subdivision. With such complex demographic history it is unclear what should be expected when it comes to the signature left on patterns of variation. We found average Tajima's D values to be positive for Z-linked and autosomal sequences but not significantly different from a neutral model. One possible interpretation is that there are two countering effects from population history. A recent contraction in population size associated with chicken domestication, indicated by the presence of only a limited number of mitochondrial DNA lineages (Fumihito et al. 1994), may have led to an excess of intermediate-frequency variants (rare alleles are more readily lost during a reduction in population size). Assuming an island model of population structure with weak migration between demes, population subdivision following from breed formation should also support positive values of Tajima's D (Hammer et al. 2003). However, the breeding population of chicken has expanded significantly after the early days of domestication so there may be many new mutations segregating at rare frequencies. Moreover, our sampling design with few individuals from many different demes should recover more rare alleles than a single large population sample (Ptak and Przeworski 2002; Hammer et al. 2003).
The sign of Tajima's D subsequent to a bottleneck will depend on its severity and the length of time since the bottleneck (Tajima 1989b; Fay and Wu 1999). The fact that, in this study, overall Tajima's D was found to still be positive may reflect that the bottleneck associated with domestication was strong and that relatively limited time has elapsed since population contraction. In any case, we cannot exclude the possibility that the effects of demographic history on allele frequency spectra have masked potential effects from selection.
Conclusions: We favor the idea that a low male effective population size has reduced the levels of polymorphism on the chicken Z chromosome. This could initially have arisen from skewed reproductive success among males in the wild ancestor, but is likely to have been accentuated during chicken breeding. However, as differences in effective population size do not seem to be able to completely explain the observed heterogeneity in polymorphism levels, selection (before, during, or subsequent to domestication) may have been important. Several lines of argument suggest that selective sweeps could be a potent force in shaping Z chromosome variability. In this context it is of interest to note that selection may have a stronger effect on polymorphism levels on Z in species with female heterogamety than on X in systems with male heterogamety, relative to that of the respective autosomes. As suggested by Gibson et al. (2002) and formally demonstrated by Reeve and Pfennig (2003), the process of sexual selection affecting sex-linked loci can be expected to be facilitated in birds as fathers pass a Z chromosome to their sons, improving the heritability and increasing the strength of selection of sexually antagonistic alleles advantageous to males (in systems with male heterogamety, fathers do not pass the X chromosome to sons). There is evidence for frequent Z-linkage of sexually selected characters in birds (Ohno 1967; Saetre et al. 2003) as well as in butterflies (also ZW; Sperling 1994; Iyengar et al. 2002). Moreover, in chicken, several genes likely to have been under directional selection during domestication map to the Z chromosome, e.g., the feathering (Bitgood 1999) and dwarf (Burnside et al. 1992) loci.
Although speculative, the fact that the excess of polymorphism on autosomes compared to X in humans is less pronounced (∼1.6 times more variation on autosomes) than that between autosomes and Z in birds could potentially be related to such a difference in the efficiency of selection.
The samples used in this study were kindly provided by Michèle Tixier-Boichard and the AVIANDIV project. Financial support was obtained from the Swedish Research Council. H.E. is a Royal Swedish Academy Research Fellow supported by a grant from the Knut and Alice Wallenberg foundation.
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AF165971, AF526014, AY139845–7, AY139851, AY139861, AY142943–4, AY144673–6, AY144678–9, AY144681–2, AY189760, AY189776, AY194143, AY194147, AY298982, AY298987, AY298992, AY380786, and AY380788–9.
Communicating editor: N. A. Jenkins
- Received December 29, 2003.
- Accepted January 27, 2004.
- Copyright © 2004 by the Genetics Society of America