Unlike recombination between autosomes, recombination between the X and Y chromosomes is often thought to be constrained to two small pseudoautosomal regions (PARs) at the tips of each sex chromosome. PAR1 spans the first 2.7 Mb of the short arm of the human sex chromosomes, whereas the much smaller PAR2 encompasses the distal 320 kb of the long arm of each sex chromosome. In addition to PAR1 and PAR2, there is a human-specific X-transposed region that was duplicated from the X to the Y chromosome. Unlike the PARs, the X-transposed region is usually retained in X-specific analyses because it is not thought to recombine routinely. Genetic diversity is expected to be higher in recombining regions than in nonrecombining regions because recombination reduces the effect of linked selection. In this study, we investigated patterns of genetic diversity in noncoding regions across the entire X chromosome of a global sample of 26 unrelated genetic females. We found that genetic diversity in PAR1 is significantly greater than in the nonrecombining regions (nonPARs). However, rather than an abrupt drop in diversity at the pseudoautosomal boundary, there is a gradual reduction in diversity from the recombining through the nonrecombining regions, suggesting that recombination between the human sex chromosomes spans the currently defined pseudoautosomal boundary. Potential consequences of recombination across this boundary include an increased rate of sex-linked disorders (e.g., de la Chapelle syndrome) and of sex chromosome aneuploidies. In contrast, diversity in PAR2 is not significantly elevated compared to the nonPARs, suggesting that recombination is not obligatory in PAR2. Finally, diversity in the X-transposed region is higher than in the surrounding nonPARs, providing evidence that recombination may occur with some frequency between the X and Y chromosomes in the X-transposed region.
- genetics of sex
- nucleotide diversity
- pseudoautosomal region (PAR)
- X-transposed region (XTR)
- sex chromosome evolution
The human sex chromosomes, X and Y, were previously an indistinguishable pair of autosomes, but within the last 180–210 million years, the ancestral pair diverged into two distinct chromosomes of tremendously different gene content and function (Mikkelsen et al. 2007; Rens et al. 2007). The human sex chromosomes are composed of an older X-conserved region, shared across all therian (marsupial and eutherian) mammals (Watson et al. 1990; Glas et al. 1999), and a younger X- and Y-added region: an autosomal sequence that was translocated to the X and Y chromosomes in the common ancestor of eutherian mammals approximately 80–130 million years ago (Waters et al. 2001). The differentiation of the X and Y is hypothesized to have occurred after a series of Y-specific inversions that suppressed X-Y recombination (Lahn and Page 1999; Marais and Galtier 2003; Lemaitre et al. 2009; Wilson and Makova 2009; Pandey et al. 2013). In the absence of homologous recombination, the Y chromosome has lost nearly 90% of the genes that were on the ancestral sex chromosomes (Skaletsky et al. 2003; Ross et al. 2005; Wilson Sayres and Makova 2013). Today, the human X and Y chromosomes share two pseudoautosomal regions (PARs) at the ends of the chromosomes that continue to undergo homologous X-Y recombination (Lahn and Page 1999). PAR1 spans the first 2.7 Mb of the short arm of the human sex chromosomes (Ross et al. 2005) and contains genes that arrived with the ancient X- and Y-added translocation. PAR1 is separated from the nonrecombining (nonPAR) regions on the Y chromosome by a Y-specific inversion that is hypothesized to suppress X-Y recombination at this pseudoautosomal boundary (Pandey et al. 2013). A functional copy of the XG gene spans the human pseudoautosomal boundary on the X chromosome (Yi et al. 2004) but is interrupted on the Y chromosome by a Y-specific inversion (Ellis et al. 1990).
In contrast to this mechanism for PAR1 formation, the 320-kb human-specific PAR2 resulted from at least two duplications from the X chromosome to the terminal end of the Y chromosome (Charchar et al. 2003).
Genes located in PAR1 have important functions in all humans. Although most genes on one X chromosome in 46,XX individuals are silenced via a process called X-inactivation (Carrel and Willard 2005), which evolved in response to loss of homologous gene content on the Y chromosome (Wilson Sayres and Makova 2013), all 24 genes in PAR1 escape inactivation (Perry et al. 2001; Ross et al. 2005; Helena Mangs and Morris 2007) (Supplemental Material, Table S1). For example, one gene in PAR1, SHOX1, plays an important role in long bone growth and skeletal formation (Rao et al. 2001; Benito-Sanz et al. 2012; Tsuchiya et al. 2014). Disruption of SHOX1 can cause short stature, skeletal deformities, Leri-Weill syndrome, and phenotypes associated with Turner syndrome (45,X) (Rao et al. 2001). ASMT, another gene located in PAR1, is involved in the synthesis of melatonin and has been associated with psychiatric disorders, including bipolar affective disorder (Flaquer et al. 2010).
The suggested function of the PARs is to assist in chromosome pairing and segregation (Kauppi et al. 2011). It has been proposed, in humans and in great apes, that crossover events are mandatory during male meiosis (Rouyer et al. 1986; Lien et al. 2000; Kauppi et al. 2012). Analyses of human sperm suggest that reduced recombination in PAR1 is significantly associated with sex chromosome nondisjunction, which can result in Klinefelter syndrome (47,XXY) (Shi et al. 2002). Deletions in PAR1 have been shown to cause short stature, a feature of Turner syndrome (Rao et al. 1997). Further, the male sex-determining gene on the Y chromosome (SRY) lies just proximal to PAR1 on the short arm of the Y chromosome. SRY can be translocated from the Y to the X during aberrant crossover events between the paternal PAR1s, resulting in SRY+ XX males (Page et al. 1985) or, more rarely, true hermaphroditism (Abbas et al. 1993). The chance that an XX offspring inherits SRY from paternal meiosis is limited by reduced recombination at the PAR1 boundary (Fukagawa et al. 1996).
Previous studies estimate that the recombination rate is ∼20 times the genome average in PAR1 (Lien et al. 2000) and ∼5 times the genome average in PAR2 (Filatov and Gerrard 2003). This is likely because recombination events in XY individuals are restricted to the pseudoautosomal sequences, with the exception of possible gene conversion in regions outside the PARs (Rosser et al. 2009). In addition to PAR1 and PAR2, where recombination is known to occur between the X and Y chromosomes, there is an X-transposed region (XTR) that was duplicated from the X to the Y chromosome in humans after human-chimpanzee divergence (Skaletsky et al. 2003; Ross et al. 2005). Since this duplication, the XTR has incurred several deletions and an inversion. It nevertheless retains 98.78% X-Y sequence identity (Ross et al. 2005) and two genes with functional X- and Y-linked homologs (Skaletsky et al. 2003). Genetic diversity is expected to be higher in the PARs than in the remainder of the sex chromosomes for several reasons. First, recombination can unlink neutral alleles from nearby sites under selection. This reduces the extent to which background selection and genetic hitchhiking deplete genetic diversity (Vicoso and Charlesworth 2006; Charlesworth 2012). Second, the effective population size of the PARs, which exist in two copies in all individuals, should be larger than that of the nonrecombining region of the X chromosome, which exists in two copies in genetic females but only one copy in genetic males. Finally, genetic diversity may be higher in PARs than in regions that do not recombine in both sexes if recombination increases the local mutation rate (Perry and Ashworth 1999; Hellmann et al. 2003; Huang et al. 2005).
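The effective-size argument can be made quantitative with the textbook formulas for a randomly mating population with N_m breeding males and N_f breeding females. This is an illustration under those assumptions and is not part of the analysis reported here. It assumes no selection and treats the PARs as fully autosome-like, which ignores the gradient of sex linkage within PAR1. With equal numbers of males and females, nonPAR X-linked sequence has three-quarters the effective size of two-copy sequence. Under neutrality and equal mutation rates, its expected diversity (proportional to N_e μ) should then also be about three-quarters as large. The function names are hypothetical.

```python
"""Relative effective population size of nonPAR X-linked sequence versus
two-copy (autosome-like) sequence, under textbook random-mating assumptions."""


def ne_two_copy(n_males: float, n_females: float) -> float:
    """Effective size of sequence carried in two copies by both sexes
    (autosomes; used here as a simplified stand-in for the PARs)."""
    return 4 * n_males * n_females / (n_males + n_females)


def ne_x_linked(n_males: float, n_females: float) -> float:
    """Effective size of sequence carried in two copies by females and
    one copy by males (the nonPAR X)."""
    return 9 * n_males * n_females / (4 * n_males + 2 * n_females)


def x_to_two_copy_ratio(n_males: float, n_females: float) -> float:
    """Expected nonPAR-to-PAR diversity ratio, assuming diversity scales with
    N_e * mu and the mutation rate is equal in both regions."""
    return ne_x_linked(n_males, n_females) / ne_two_copy(n_males, n_females)


if __name__ == "__main__":
    print(x_to_two_copy_ratio(1000, 1000))  # equal sex ratio
    print(x_to_two_copy_ratio(500, 1500))   # female-biased breeding population
```

With equal numbers of breeding males and females the ratio is 0.75. A female-biased population (500 males, 1,500 females) raises it to 0.9. This sensitivity to sex ratio is why X-to-autosome comparisons are used to study sex-biased demography.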
Studies of human population genetic variation often compare diversity on the X chromosome with diversity on the autosomes to make inferences about sex-biased human demographic history (Hammer et al. 2008; Gottipati et al. 2011b; Arbiza et al. 2014). Typically, PAR1 and PAR2 are filtered out of these studies at the defined pseudoautosomal boundaries, whereas the XTR is retained. However, patterns of diversity across the entire human X chromosome, including transitions across the PARs and XTR, have not been investigated to justify these common practices. In this study, we investigate patterns of genetic diversity and divergence across the entire human X chromosome.
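The filtering convention described above amounts to labeling each X-linked position by fixed boundary coordinates. The sketch below assumes 1-based inclusive coordinates. The PAR1 and PAR2 intervals are the GRCh37/hg19 definitions published by the Genome Reference Consortium (hg19 is the assembly of the alignments used here). The XTR interval must be supplied by the user; the value in the usage lines is a hypothetical placeholder, not the boundary used in this study. Under this scheme, every site on one side of a boundary is treated as recombining and every site on the other side as not. That is the assumption this article tests.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive

# GRCh37/hg19 pseudoautosomal regions on chrX (Genome Reference Consortium).
PAR1: Interval = (60_001, 2_699_520)
PAR2: Interval = (154_931_044, 155_260_560)


def classify_x_position(pos: int, xtr: Optional[Interval] = None) -> str:
    """Assign a chrX position to PAR1, PAR2, XTR (only if an interval is given), or nonPAR."""
    if PAR1[0] <= pos <= PAR1[1]:
        return "PAR1"
    if PAR2[0] <= pos <= PAR2[1]:
        return "PAR2"
    if xtr is not None and xtr[0] <= pos <= xtr[1]:
        return "XTR"
    return "nonPAR"


def kept_in_typical_x_analysis(pos: int, xtr: Optional[Interval] = None) -> bool:
    """Common practice: drop PAR1 and PAR2 at hard boundaries, keep the XTR."""
    return classify_x_position(pos, xtr) in {"nonPAR", "XTR"}


if __name__ == "__main__":
    hypothetical_xtr = (88_000_000, 92_000_000)  # placeholder; substitute real XTR coordinates
    for pos in (2_699_520, 2_699_521, 90_000_000, 155_000_000):
        print(pos, classify_x_position(pos, hypothetical_xtr),
              kept_in_typical_x_analysis(pos, hypothetical_xtr))
```

Two positions one base apart (2,699,520 and 2,699,521) receive different labels. A hard cutoff like this cannot represent a gradual transition in recombination.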
Materials and Methods
We analyzed X chromosomes from 26 unrelated (46,XX) individuals sequenced by Complete Genomics (Drmanac et al. 2010) (Table S2). We retained only sites with data (monomorphic or variable) in all 26 samples. Human-chimpanzee (hg19-panTro4), human-macaque (hg19-rheMac3), human-dog (hg19-canFam3), and human-mouse (hg19-mm10) alignments were extracted from the University of California Santa Cruz (UCSC) Genome Browser (Rosenbloom et al. 2015). We curated the human-chimpanzee and human-macaque alignments to filter out segments in which autosomal sequences aligned to the X chromosome (Table S3, Figure S1, and Figure S2). We visualized these alignments using Gmaj software (Blanchette et al. 2004). We also observed several regions across the X chromosome that exhibited elevated divergence between human and chimpanzee or between human and macaque (Figure S3 and Figure S4). On further inspection, these regions often contain multicopy gene families that could lead to mismapping (Table S3). Divergence estimates were similar with and without these regions; here we present results with these high-divergence regions near multicopy gene families excluded. Low-diversity regions (Dutheil et al. 2015) and ampliconic regions (Mueller et al. 2013; Nam et al. 2015) were filtered out. This avoids analyzing regions that may have been affected by strong selective sweeps or that are difficult to align. Significant differences between PAR1 and the nonPARs, and between the XTR and the nonXTRs, persist whether ampliconic and low-diversity regions are included or excluded (Figure 1, Table 1, Table S4, and Table S5).
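The two site-level filters in this paragraph can be sketched as follows: a site must be called in all 26 samples, and it must lie outside the ampliconic and low-diversity intervals. This is an illustration, not the study's pipeline. It assumes that genotype calls for a site arrive as a list with `None` marking a missing call, and that excluded regions are given as 1-based inclusive intervals. The helper names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[int, int]        # 1-based, inclusive
Mask = Tuple[List[int], List[int]]  # parallel lists of merged starts and ends


def build_mask(intervals: Sequence[Interval]) -> Mask:
    """Sort and merge overlapping or adjacent excluded intervals."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [s for s, _ in merged], [e for _, e in merged]


def is_masked(pos: int, mask: Mask) -> bool:
    """Binary-search for the last merged interval starting at or before pos."""
    starts, ends = mask
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos <= ends[i]


def keep_site(pos: int, calls: Sequence[Optional[str]], mask: Mask) -> bool:
    """Keep a site only if every sample has a call and it is not masked."""
    return all(c is not None for c in calls) and not is_masked(pos, mask)


if __name__ == "__main__":
    mask = build_mask([(100, 200), (150, 300), (500, 600)])
    print(keep_site(301, ["A", "G"], mask), keep_site(301, ["A", None], mask))
```

Merging the intervals once means each site lookup is a single binary search. That matters when millions of sites are checked.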
We used Galaxy Tools (Blankenberg et al. 2011) to filter out regions, as defined by the UCSC Genome Browser (Rosenbloom et al. 2015), that may be subject to selection or could cause sequence misalignment: Reference Sequence (RefSeq) database genes, simple repeats, and repetitive elements. We attempted to also filter out noncoding regions near genes, but doing so would leave very little analyzable sequence in PAR1 and PAR2.
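Removing annotated features from each analysis window is an interval subtraction, the kind of operation Galaxy's interval tools provide for BED files. The pure-Python sketch below is not Galaxy's implementation; it assumes BED-style half-open coordinates. It shows how the callable length of a window shrinks as more features are masked (genes, repeats, or flanking sequence around genes). This is why masking gene-adjacent sequence leaves little of the small PARs. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end), BED-style


def subtract(window: Interval, masks: Sequence[Interval]) -> List[Interval]:
    """Return the parts of `window` not covered by any mask interval."""
    start, end = window
    remaining: List[Interval] = []
    cursor = start  # first base not yet accounted for
    for m_start, m_end in sorted(masks):
        if m_end <= cursor or m_start >= end:
            continue  # mask lies before the cursor or beyond the window
        if m_start > cursor:
            remaining.append((cursor, m_start))  # unmasked gap before this mask
        cursor = max(cursor, m_end)
        if cursor >= end:
            break
    if cursor < end:
        remaining.append((cursor, end))
    return remaining


def callable_bp(window: Interval, masks: Sequence[Interval]) -> int:
    """Number of bases in the window that survive masking."""
    return sum(e - s for s, e in subtract(window, masks))


if __name__ == "__main__":
    masks = [(10, 20), (15, 30), (90, 120)]
    print(subtract((0, 100), masks), callable_bp((0, 100), masks))
```

Sorting the masks first lets overlapping features be handled with a single moving cursor, without merging them separately.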
We measured diversity as π, the average number of pairwise nucleotide differences per site between all sequences in the sample:

$$\pi = \frac{k}{k-1}\,\frac{1}{L}\sum_{i,j} p_i\,p_j\,d_{ij}$$

where L represents the number of called sites, k represents the number of DNA sequences, p_i and p_j are the frequencies of the ith and jth sequences, and d_ij is the number of sites at which sequences i and j differ. Diversity was calculated within each specific region (PAR1, PAR2, XTR, nonPARs with XTR, and nonPARs without XTR), as well as across sliding and nonoverlapping windows. We generated window-interval files across the human X chromosome with Galaxy Tools (Blankenberg et al. 2011) and conducted analysis in four sets of windows: (1) a 1 Mb nonoverlapping window, (2) a 1 Mb window with 100 kb sliding start positions, (3) a 100 kb nonoverlapping window, and (4) a 100 kb window with 10 kb sliding start positions (Figure S5). We similarly calculated human-chimpanzee, human-macaque, human-dog, and human-mouse divergence along the X chromosome in each region and in the same windows described above. To correct for variation in mutation rate, π values were divided by the observed divergence within the same interval.
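The following Python sketch covers the quantities defined in this paragraph. It assumes aligned, equal-length sequences with no missing calls and an outgroup alignment in the same coordinates. It computes π in two algebraically identical ways: as the mean pairwise difference per site, and from per-site allele frequencies. It estimates divergence as the proportion of differing comparable sites. It also generates nonoverlapping windows (step equal to window size) or sliding windows (step smaller than window size). The function names are hypothetical. The study's own R and Galaxy workflow, available from the repository listed in this article, may differ in details such as the handling of gaps or truncated windows.

```python
from collections import Counter
from itertools import combinations
from typing import Iterator, Sequence, Tuple


def pi_pairwise(seqs: Sequence[str]) -> float:
    """Mean number of pairwise differences per site across all sequence pairs."""
    k, L = len(seqs), len(seqs[0])
    if k < 2 or L == 0:
        raise ValueError("need at least two sequences and one site")
    diffs = sum(sum(a != b for a, b in zip(s1, s2))
                for s1, s2 in combinations(seqs, 2))
    return diffs / (k * (k - 1) / 2) / L


def pi_from_frequencies(seqs: Sequence[str]) -> float:
    """Same quantity per site: k/(k-1) * (1 - sum of squared allele frequencies)."""
    k, L = len(seqs), len(seqs[0])
    if k < 2 or L == 0:
        raise ValueError("need at least two sequences and one site")
    total = 0.0
    for column in zip(*seqs):
        freqs = [c / k for c in Counter(column).values()]
        total += k / (k - 1) * (1.0 - sum(p * p for p in freqs))
    return total / L


def divergence(human: str, outgroup: str) -> float:
    """Proportion of comparable aligned sites (both bases in ACGT) that differ."""
    pairs = [(a, b) for a, b in zip(human.upper(), outgroup.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def windows(chrom_len: int, size: int, step: int) -> Iterator[Tuple[int, int]]:
    """Half-open windows; step == size gives nonoverlapping windows and
    step < size gives sliding windows. Windows at the chromosome end are truncated."""
    for start in range(0, chrom_len, step):
        yield start, min(start + size, chrom_len)


if __name__ == "__main__":
    sample = ["AAAA", "AAAT", "AATT"]
    pi = pi_pairwise(sample)
    print(pi, pi_from_frequencies(sample))  # both approximately 1/3
    print(pi / divergence("AAAT", "AAGT"))  # diversity normalized by divergence
    print(list(windows(250, 100, 100)))
```

The frequency form avoids enumerating all k(k − 1)/2 pairs. It is therefore the cheaper of the two when many samples are analyzed.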
To assess whether diversity in each region (PAR1, XTR, and PAR2) differs significantly from that in the nonPAR sequences, we divided chromosome X into windows and permuted them (without replacement) 10,000 times. This analysis was repeated for uncorrected diversity and for diversity corrected by human-chimpanzee, human-macaque, human-dog, and human-mouse divergence. Empirical P-values were calculated as the proportion of permutations in which the difference between a pair of permuted regions was equal to or greater than the observed difference in diversity between that pair of regions. The negative correlation across the pseudoautosomal boundary was tested using linear regressions, each over 30 consecutive 100 kb windows (3 Mb in total). The set of windows was shifted by 100 kb at each step (Figure 2). Each regression was tested for a significant correlation (P < 0.05), and all data points occurring before the first nonsignificant regression were included in the significant data set. To assess the significance of the observed negative correlation, the 100 kb nonoverlapping windows were permuted 10,000 times. For each permutation, the correlation coefficient and P-value of the linear regression were calculated for the first 3 Mb, and the 10,000 permuted regressions were compared with the observed value. All graphs were produced using R version 3.1.2 (R Core Team 2015).
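A minimal sketch of the region comparison, assuming each window is summarized by a single diversity value (uncorrected or divergence-normalized). It also assumes that a region's diversity is the unweighted mean of its windows; the study may weight windows differently. The sketch pools the windows of one region with the nonPAR windows and reshuffles the labels, a standard two-sample permutation test. The P-value is the fraction of shuffles whose difference in means is at least the observed one. Some analysts instead report (hits + 1)/(n_perm + 1) so that the estimate is never exactly zero. The function name is hypothetical.

```python
import random
from statistics import fmean
from typing import Sequence


def permutation_pvalue(region: Sequence[float], background: Sequence[float],
                       n_perm: int = 10_000, seed: int = 1) -> float:
    """One-sided empirical P that mean(region) - mean(background) is at least
    as large as observed when window labels are shuffled."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = fmean(region) - fmean(background)
    pooled = list(region) + list(background)
    n = len(region)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # permutation without replacement
        diff = fmean(pooled[:n]) - fmean(pooled[n:])
        if diff >= observed:
            hits += 1
    return hits / n_perm


if __name__ == "__main__":
    par1_like = [3.0, 2.8, 3.1, 2.9, 3.2]       # illustrative values only
    nonpar_like = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8]
    print(permutation_pvalue(par1_like, nonpar_like, n_perm=2000))
```

Because only the window labels are shuffled, the test makes no assumption about the distribution of diversity values.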
The authors state that all data necessary for confirming the conclusions presented in this article are represented fully within the article. All code used for this project can be found at https://github.com/WilsonSayresLab/PARdiversity.
Human X-linked nucleotide diversity is high in PAR1 but not PAR2
We observe that uncorrected diversity is three times higher in PAR1 than in the nonPARs, whereas uncorrected diversity in PAR2 is not significantly greater than that in the nonPARs (Table 1, Figure 1, and Figure 3). We studied noncoding regions across the entire X chromosome, filtering out annotated genes to minimize the effect of selection. Given the small sizes of the PARs and XTR, however, we could not also filter out noncoding regions near genes in these regions (see Materials and Methods). Filtering out ampliconic regions (Mueller et al. 2013; Nam et al. 2015) and low-diversity regions that may have experienced strong selective sweeps (Dutheil et al. 2015) yielded the same result (Table S4). However, mutation-rate variation across the X chromosome may account for the variable levels of diversity observed in the PARs and nonPARs. We therefore normalized nucleotide diversity to correct for mutation rate using pairwise divergence between humans and several other species: chimpanzee (panTro4), macaque (rheMac3), dog (canFam3), and mouse (mm10) (Table 1 and Figure S6). When we normalized with panTro4, the difference in diversity between PAR1 and the nonPARs was not significant after filtering out the ampliconic regions, low-diversity regions, and the “not applicable” (NA) values. This could be a result of large variation in human-chimpanzee divergence across regions of the X chromosome, potentially owing to complex speciation events (Patterson et al. 2006). Given this variation, we focus our interpretations on data normalized using human-macaque divergence. As with the uncorrected values, when we correct for mutation rate using human-macaque divergence, we observe higher nucleotide diversity in PAR1 and PAR2 relative to the nonPARs. Diversity is significantly higher in PAR1 than in the nonPARs (with XTR removed) but not significantly different between PAR2 and the nonPARs (Figure 1, Figure 3, and Table 1).
Curiously, human-chimpanzee and human-macaque divergence are quite high in PAR1 relative to the nonPARs, in a pattern that does not mirror diversity (Figure 1 and Table 1). This result is predominantly due to high interspecies divergence in PAR1 and near the PAR boundary (Figure S3 and Figure S4). In contrast, human-dog divergence roughly parallels uncorrected human diversity (Figure 1). Alignments between human and mouse are unavailable for PAR1.
Further, significantly elevated diversity in PAR1 relative to the nonPARs cannot be attributed solely to mutation-rate variation across the X chromosome, because the pattern remains after correcting for divergence in each region (Figure 1 and Table 1). The observed pattern is consistent with several processes. One is selection removing more variation at linked sites in the nonPARs than in PAR1, because recombination rates are lower in the nonPARs. Another is stronger genetic drift in the nonPARs owing to their smaller effective population size.
That we do not observe significantly elevated diversity in PAR2 relative to the nonPARs is consistent with reports that PAR2 undergoes X-Y recombination less frequently than PAR1 (Flaquer et al. 2008) and supports assertions that in humans only one chiasma per chromosome is needed for proper segregation rather than one per chromosome arm (Fledel-Alon et al. 2009).
Diversity is significantly higher in the XTR than in the nonPARs
Curiously, in addition to elevated diversity in PAR1, we also observed that diversity was significantly higher in the recently transposed XTR than in the nonPARs (Table 1 and Figure 3). This increased diversity cannot be attributed to mismapping between the X and Y chromosomes because we analyzed only individuals with two X chromosomes (see Materials and Methods). High diversity in the XTR contrasts with initial suggestions that there is no X-Y recombination in the XTR (Skaletsky et al. 2003). It is consistent with recent reports of X-Y recombination in this region in some human populations (Veerappa et al. 2013).
Given the large size of the nonPARs and the small size of the XTR (5 Mb; Ross et al. 2005), one may wonder whether removing the XTR makes a difference to measured levels of diversity across the human X chromosome. The raw diversity (π) of the nonPARs including the XTR is 0.000602, whereas that of the nonPARs excluding the XTR is 0.000595 (Table 1). Removal of the XTR thus decreases estimates of both diversity and divergence in the nonPARs. The XTR may de facto be removed by other filters. Researchers should nonetheless be cautious about including it, because its inclusion in studies of X-specific diversity will affect inferences made when comparing X-linked and autosomal variation (Keinan and Reich 2010; Gottipati et al. 2011a; Wilson Sayres et al. 2014; Arbiza et al. 2014).
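Because π is a per-site average over called sites, the diversity of the pooled nonPAR sequence is a site-weighted mean of its parts. This is why a region as small as the XTR shifts the nonPAR estimate only slightly. Write L for the number of called sites in each part; these counts are not reported in this section, so the first expression is symbolic. The second uses the two reported π values to give the relative change:

$$\pi_{\text{nonPAR, incl. XTR}} = \frac{L_{\text{XTR}}\,\pi_{\text{XTR}} + L_{\text{nonPAR, excl. XTR}}\,\pi_{\text{nonPAR, excl. XTR}}}{L_{\text{XTR}} + L_{\text{nonPAR, excl. XTR}}}$$

$$\frac{0.000602 - 0.000595}{0.000595} \approx 0.012$$

That is, including the XTR raises the nonPAR diversity estimate by roughly 1.2%.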
Pseudoautosomal boundaries cannot be inferred from patterns of diversity
Recombination between the X and Y chromosomes is expected to be suppressed at the pseudoautosomal boundary, where X-Y sequence homology is disrupted by a Y-specific inversion (Ellis et al. 1990; Yi et al. 2004; Pandey et al. 2013). Suppose diversity correlates strongly with recombination rate and X-Y recombination is strictly suppressed in the nonPARs beyond the pseudoautosomal boundary. Diversity would then be expected to drop sharply between PAR1 and the nonPARs. However, when we analyze patterns of human diversity in windows across the X chromosome (see Materials and Methods), we do not observe an abrupt shift in the level of diversity between PAR1 and the nonPARs (Figure 2). No pseudoautosomal boundary is observable in diversity whether small or large (100 kb or 1 Mb) and overlapping or nonoverlapping windows are used (Figure S5). In the approximately 3 Mb that span the pseudoautosomal boundary, we observe a significant negative correlation between diversity and distance from the tip of Xp. As we shift the regression window in 100 kb steps away from the start of PAR1, the negative correlations remain independently significant and continue past the boundary (Figure 2). The observed (unpermuted) linear relationship between distance from Xp and diversity has a significant negative correlation coefficient (R = −0.668; P = 0) (Figure S7). The significant linear relationship (P = 3.281 × 10^−10) that we observe in Figure 2 extends nearly twice the length of PAR1. This supports the observation that there is no clear, abrupt drop in nucleotide diversity across the pseudoautosomal boundary.
To test the significance of this correlation, we conducted a permutation test, shuffling 100 kb windows across the X chromosome and recomputing the series of linear regressions 10,000 times. We then counted the permuted X chromosomes with a correlation as strong as or stronger than that observed on the actual X chromosome (Figure S7). The negative correlation between diversity and distance from the tip of the short arm of the X chromosome is significant and spans the pseudoautosomal boundary (see Materials and Methods; P < 10^−4, none of 10,000 permutations, permutation test).
A history of gene conversion between the sex chromosomes (Trombetta et al. 2014) may contribute to the elevated diversity on the nonPAR side of the Y-specific inversion that marks the pseudoautosomal boundary. Human diversity uncorrected for divergence decreases from the telomeric end of PAR1 through the pseudoautosomal boundary and well into the nonPARs. A sex-specific recombination map of PAR1 found that the male recombination rate is highest near the telomere and decreases toward the pseudoautosomal boundary (Hinch et al. 2014). In contrast, the female recombination rate reported in the same study is fairly flat throughout PAR1 and increases near the pseudoautosomal boundary. Thus, genetic diversity uncorrected for divergence in PAR1 appears to correlate with the male recombination rate. Curiously, however, an earlier study of recombination rate in PAR1 reported an increase in the female (but not the male) recombination rate near the telomeric end of PAR1 (Henke et al. 1993). Both male and female recombination rates may therefore contribute to the linear decrease in diversity observed from the telomeric end of the X chromosome through the pseudoautosomal boundary. Sex-specific recombination maps are not yet available for the entire X chromosome; once they are, it will be useful to compare them with patterns of diversity.
We show that diversity is indeed higher in the pseudoautosomal regions and lower in the regions of the X chromosome that are not known to recombine in males (nonPARs). Diversity in PAR1 is significantly higher than in the nonPARs whether diversity is normalized by human-macaque or human-dog divergence to correct for mutation rate (Table 1, Figure 1, and Figure 3). Diversity was also normalized by human-mouse divergence, but there is no human-mouse alignment in PAR1. This is because the human and mouse PARs have different evolutionary origins and share no pseudoautosomal genes (Gianfrancesco et al. 2001). Diversity in PAR2 is lower than expected and not significantly different from that in the nonPARs. We also showed that diversity is elevated in the XTR above other nonPARs. This is consistent with recent observations that the region may still undergo homologous recombination between the X and Y chromosomes (Veerappa et al. 2013). Finally, when analyzing patterns of genetic diversity in windows across the human X chromosome, we found no strict boundary, based solely on levels of diversity, between the recombining and putatively nonrecombining regions. This could reflect evolutionary shifts in the pseudoautosomal boundary over time, such as extension of PAR1 through a PAR1 length polymorphism (Mensah et al. 2014). It could also indicate that nonhomologous recombination at the pseudoautosomal boundaries is common.
Our observations of patterns of diversity across regions of the human X chromosome with variable levels of recombination are consistent with previous reports that diversity and divergence correlate with recombination rate. This correlation has been reported across the human genome (Hellmann et al. 2003) and explicitly in PAR1 (Bussell et al. 2006). Elevated levels of diversity in the XTR suggest that, consistent with a recent report (Veerappa et al. 2013), this region may frequently undergo X-Y recombination. Curiously, we did not find a significant elevation of diversity in PAR2. In agreement with the unusual evolution of PAR2 (Charchar et al. 2003), this suggests that PAR2 rarely recombines between the X and Y chromosomes during meiosis. Further, the lack of a clear differentiation in diversity between PAR1 and the nonPARs suggests that recombination suppression between the X and Y chromosomes is still an actively evolving process in humans, as in other species (Bergero and Charlesworth 2009). This is consistent with evidence that the position of the pseudoautosomal boundary varies across mammals (Raudsepp and Chowdhary 2008; Otto et al. 2011; Raudsepp et al. 2012; White et al. 2012). A pedigree analysis of a paternally inherited X chromosome even provides evidence of polymorphism in the human pseudoautosomal boundary (Mensah et al. 2014). Recombination spanning the pseudoautosomal boundary may account for some cases of de la Chapelle syndrome (Schrander-Stumpel et al. 1994), in which individuals with two X chromosomes develop male gonads. Some of these individuals also carry a copy of SRY, which sits immediately proximal to the pseudoautosomal boundary in humans. Further, pseudoautosomal boundaries may vary across populations, affecting recombination and contributing to nondisjunction of the sex chromosomes.
Taken together with previous inferences about variation in pseudoautosomal boundaries, our observations suggest that strict suppression of X-Y recombination should not be assumed at the currently defined human pseudoautosomal boundary.
We thank the School of Life Sciences and the Biodesign Institute at Arizona State University for startup funding to M.A.W.S. This research was supported in part by funds from the School of Life Sciences Undergraduate Research Program (SOLUR) through the School of Life Sciences at Arizona State University, Tempe Campus. We thank the SOLUR Program and the College of Liberal Arts and Sciences Undergraduate Summer Enrichment program for support to S.M.B.
Communicating editor: J. M. Akey
Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.114.172692/-/DC1.
- Received December 6, 2015.
- Accepted March 11, 2016.
- Copyright © 2016 by the Genetics Society of America
Available freely online through the author-supported open access option.