Signatures of Sex-Antagonistic Selection on Recombining Sex Chromosomes
Corresponding author: Department of Integrative Biology, C-0990, University of Texas, Austin, TX 78712. E-mail: kirkp{at}mail.utexas.edu
Abstract
Sex-antagonistic (SA) selection has major evolutionary consequences: it can drive genomic change, constrain adaptation, and maintain genetic variation for fitness. The recombining (or pseudoautosomal) regions of sex chromosomes are a promising setting in which to study SA selection because they tend to accumulate SA polymorphisms and because recombination allows us to deploy the tools of molecular evolution to locate targets of SA selection and quantify evolutionary forces. Here we use coalescent models to characterize the patterns of polymorphism expected within and divergence between recombining X and Y (or Z and W) sex chromosomes. SA selection generates peaks of divergence between X and Y that can extend substantial distances away from the targets of selection. Linkage disequilibrium between neutral sites is also inflated. We show how the pattern of divergence is altered when the SA polymorphism or the sex-determining region was recently established. We use data from the flowering plant Silene latifolia to illustrate how the strength of SA selection might be quantified using molecular data from recombining sex chromosomes.
- coalescent
- model
- neutral polymorphism
- genetics of sex
- sex chromosome
The recombining regions of sex chromosomes offer a unique window onto important genomic processes (Bachtrog et al. 2011; Otto et al. 2011). Several major genetic model systems have highly reduced recombination between the sex chromosomes (Ezaz et al. 2006). Recombination is confined to a small region of the X and Y in mammals and a small region of the Z and W in birds and Lepidoptera. In many animals and plants with genetic sex determination, however, the recombining pseudoautosomal region (PAR) of the sex chromosomes comprises a large part of their physical maps (Otto et al. 2011). The most extreme situation imaginable is found in fugu. In this fish, sex is determined by a polymorphism at a single nucleotide on a chromosome that appears to be recombining normally (Kamiya et al. 2012).
Regardless of whether it is small (as in mammals) or spans the entire sex chromosome (as in fugu), the PAR is of particular interest for several reasons (reviewed by Otto et al. 2011). First, loci in the PAR have transmission patterns that are neither fully sex linked nor fully autosomal. This can cause alleles to spend unequal amounts of time in males and females, subjecting them to sex-specific evolutionary forces (e.g., selection and mutation). Second, in some taxa the PAR is the most dynamic region of the genome, showing high rates of chromosomal rearrangement (Volff and Schartl 2001). Third, the PAR provides an analog for the ancestral condition of nonrecombining regions of sex chromosomes. The PAR can give insights into the processes that lead to reduced recombination and degeneration of Y and W chromosomes. Fourth, sex chromosomes in some groups are young. For example, several genera of fish have species that are separated by only a few million years but whose sex chromosomes correspond to different linkage groups (Tanaka et al. 2007; Ross et al. 2009; Kamiya et al. 2012). These sex chromosomes continue to recombine over much of their length, and patterns of variation in their PAR might be used to test hypotheses for how new sex chromosomes originate. That approach is not possible in model systems such as mammals, birds, and Drosophila because degeneration of their ancient sex chromosomes has obliterated much of their history.
A final motivation for focusing attention on recombining sex chromosomes involves sex-antagonistic selection, the situation in which alleles have different fitness effects in females and males. Sex-antagonistic (SA) selection is believed to be the force that causes the evolution of reduced recombination between X and Y (and between Z and W) (Charlesworth and Charlesworth 1980; Bull 1983; Rice 1987) and can drive the shift of sex determination from one pair of chromosomes to another (van Doorn and Kirkpatrick 2007, 2010). SA selection can also constrain adaptation, since increasing fitness in one sex is tied to reduced fitness in the other (van Doorn 2009; Rice 2013). There is growing evidence that SA selection is taxonomically ubiquitous (Arnqvist and Rowe 2005; Cox and Calsbeek 2009; van Doorn 2009). It may also be widespread in the genome: one study estimated that 8% of loci in a laboratory population of Drosophila melanogaster show evidence of SA selection (Innocenti and Morrow 2010).
Theory shows that sex chromosomes accumulate SA polymorphisms over a much broader range of conditions than do autosomes, and so they are expected to be hotspots for genes under sex-antagonistic selection (Rice 1984; Clark 1988; Connallon and Clark 2010; Fry 2010; Jordan and Charlesworth 2012). Indeed, there is evidence from laboratory populations of Drosophila that sex chromosomes carry a disproportionately large fraction of the SA variation in fitness (Gibson et al. 2002; Pischedda and Chippindale 2006; Connallon and Jakubowski 2009).
Recombining sex chromosomes offer the opportunity to use the tools of molecular population genetics to localize and estimate the intensity of selection acting on SA loci, a strategy that is not possible when the Y (or W) is transmitted as a single nonrecombining block. Qiu et al. (2013) pointed out that neutral polymorphisms show elevated divergence between X and Y chromosomes in regions of the PAR near to loci under SA selection, much as local adaptation generates divergence between geographic populations (Charlesworth et al. 1997), and they used that analogy to identify loci that show evidence of a “footprint” of SA selection. This population-genetic strategy complements more indirect inferences about sex-antagonistic selection, for example based on sex-biased gene expression (Mank 2009; Parsch and Ellegren 2013).
These considerations have motivated a recent burst of interest in recombining sex chromosomes (Bachtrog et al. 2011; Otto et al. 2011). To exploit their unique evolutionary genetics, models are needed to make quantitative predictions about the patterns of neutral DNA polymorphism in the PAR when sex-antagonistic selection is acting. In an earlier study, we analyzed coalescence in the PAR under a variety of scenarios about selection and evolutionary history, but did not consider the important case of SA selection (Kirkpatrick et al. 2010).
This article fills that gap. Here we use coalescent models to provide predictions for the amount of neutral polymorphism within chromosomal types (X and Y or Z and W) and the divergence between them. We find that there are signatures of sex-antagonistic selection. Divergence between X and Y, measured as FST, peaks around a locus under SA selection. Linkage to the sex-determining region can cause the peak to be much broader than a peak of diversity at an autosomal locus under balancing selection. We sketch out how neutral divergence might be used to estimate the strength of sex-antagonistic selection. Finally, we explore the patterns of neutral variation when either the sex-antagonistic polymorphism or the sex chromosome itself is relatively young.
Models and Results
Our model concerns sex chromosomes, by which we mean the pair of chromosomes that carry the sex-determining region. The model is of three loci. The first determines sex. We refer to it as the “sex-determining region” (SDR) and denote it in equations with the subscript S. This locus segregates for two alleles, X and Y, which could consist physically of a single sex-determining locus or of a large nonrecombining region. We refer to chromosomes that carry allele X as X chromosomes and those that carry allele Y as Y chromosomes. Our model also applies to female heterogamety (that is, ZW sex determination) if one reverses the names of the sexes, replaces the Y chromosome by the W, and replaces the X chromosome by the Z.
The second locus is the target of sex-antagonistic selection. It is denoted A, and it segregates for two alleles, A1 and A2. Because the alleles have different effects in females and males, their frequencies are different on X and Y chromosomes. We assume that mating is random and so the state of the population at locus A can be described by three frequencies: the frequency of allele A1 among Y chromosomes in sperm, its frequency among X chromosomes in sperm, and its frequency among X chromosomes in eggs. In most calculations we simplify further by assuming that sex differences in selection are sufficiently small that allele frequencies at locus A on X chromosomes in eggs and X chromosomes in sperm are approximately equal. The population can then be described by the frequencies of A1 on X chromosomes and on Y chromosomes, which are denoted x and y. Note that these allele frequencies can be very different under our assumption that frequencies on X chromosomes in eggs and sperm are nearly equal.
In our basic coalescent model, x and y are viewed as parameters and no specific assumptions are made about how selection acts. It is plausible that SA selection is often frequency dependent, which could in principle produce any values of x and y. Alternatively, fixed fitnesses with sex-specific effects, possibly overdominant, could be at work. We emphasize that our model does not rely on the assumptions about the form of selection that have been used in previous models of SA selection (Rice 1984; Clark 1988; Connallon and Clark 2010; Fry 2010; Jordan and Charlesworth 2012). (The next section explains that the values of x and y used in the figures correspond to the equilibria for a simple form of SA selection with constant fitnesses, but the same values could also result from other types of selection.) Selection is assumed sufficiently strong relative to drift that x and y can be regarded as constant.
We start by assuming that the polymorphism at locus A and the SDR are both infinitely old. Our results should apply approximately when the SA polymorphism and sex-determination systems are >>N generations old, where N is the population size. Intuitively, we expect systems with old SA polymorphisms and SDRs to show the strongest patterns, so results for this case can be viewed as most favorable for detecting departures from neutrality. We later return to the question of how the results are affected by the age of the SA polymorphism and the sex-determining region.
A useful way to visualize some of the results that follow is in terms of the divergence of X and Y chromosomes at different points along the PAR. A convenient measure of this divergence is FST. This statistic is commonly used to measure differentiation between geographic populations (Charlesworth et al. 2003), but it can also be used to quantify divergence between groups of chromosomes or between the sexes (Bengtsson and Goodfellow 1987). The divergence between X and Y chromosomes at locus A is

$$F_{ST}^{A} = \frac{(x - y)^2}{4\,\bar{p}\,(1 - \bar{p})}, \tag{1}$$

where $\bar{p} = (x + y)/2$ and x and y are the frequencies of allele A1 on X and Y chromosomes.
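As a quick numerical illustration, the divergence at the selected locus can be computed directly from the two allele frequencies. The sketch below assumes the standard two-group form of FST with X and Y chromosomes weighted equally, (x − y)² / [4 p̄ (1 − p̄)] with p̄ = (x + y)/2; the function name is hypothetical, and the three frequency pairs are the ones quoted for cases 1, 2, and 3 in the legend of Figure 2.

```python
def fst_at_selected_locus(x: float, y: float) -> float:
    """Two-group F_ST at a biallelic locus from the frequencies of allele A1
    on X chromosomes (x) and on Y chromosomes (y).

    Assumption: X and Y chromosomes are weighted equally.
    """
    p_bar = (x + y) / 2.0
    if p_bar <= 0.0 or p_bar >= 1.0:
        raise ValueError("locus is monomorphic; F_ST is undefined")
    return (x - y) ** 2 / (4.0 * p_bar * (1.0 - p_bar))


if __name__ == "__main__":
    # Allele frequencies (x, y) quoted in the Figure 2 legend.
    cases = {"case 1": (0.618, 0.381), "case 2": (0.845, 0.155), "case 3": (0.912, 0.082)}
    for label, (x, y) in cases.items():
        print(label, round(fst_at_selected_locus(x, y), 3))
```

Under this assumed weighting, FST is 0 when x = y and 1 when the two chromosome types are fixed for different alleles.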
The third locus is a neutral site (that is, nucleotide base) whose genealogy is our focus. This site is denoted i. Our study is motivated by questions about the pattern of neutral genetic variation near a locus under sex-antagonistic selection. We therefore assume throughout that i and A are tightly linked, specifically that the genetic recombination rates between i and A in females and males, $r_{iA}^{f}$ and $r_{iA}^{m}$ (measured in morgans), are <<1. The population recombination rate between i and A in males is written $\rho_{iA}^{m} = 4N r_{iA}^{m}$, and the population size N is assumed to be constant. Other population recombination rates are denoted in an analogous way; for example, the recombination rate between the SDR and locus A in females is $\rho_{SA}^{f}$.
We use the term “gene” to refer to a specific copy of the DNA at site i. Thus two genes can be identical or can differ in their sequence. (We say gene rather than “allele” because two alleles must differ.) A gene is carried on one of four genetic backgrounds determined by the genotypes at the SDR and locus A, and we denote these four backgrounds as X1, X2, Y1, and Y2. Patterns of neutral genetic variation within and between X and Y chromosomes depend on the mean coalescence times between pairs of genes sampled from these different genetic backgrounds. A pair of genes at site i can have 10 possible combinations of backgrounds: (X1, X1), (X1, X2), (X2, X2), (Y1, Y1), etc.
We assume that recombination and coalescence are sufficiently rare that there is a negligible probability that more than one event happens in a given generation to the ancestors of the genes in the sample. This leads to Kingman’s (1982) approximation for the coalescent process. Our main interest lies in the expected levels of neutral variation within and divergence between X and Y chromosomes. The expected neutral diversity is proportional to the expected coalescence times, and so those times are the main focus of what follows. Time is measured in units of 2N generations.
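The proportionality between diversity and coalescence time can be written out explicitly. The following is a minimal sketch, assuming an infinite-sites model with a neutral mutation rate μ per site per generation; the symbols μ, θ, and π are introduced here for illustration and are not part of the original notation. With T the expected pairwise coalescence time in units of 2N generations, the two lineages together accumulate mutations at rate 2μ per generation over 2N T generations:

```latex
\mathbb{E}[\pi] \;=\; 2\mu \times (2N\,T) \;=\; 4N\mu\,T \;=\; \theta\,T,
\qquad \theta \equiv 4N\mu .
```

A neutrally evolving autosomal site has T = 1 in these units, so expected diversity there is θ, and a region with T > 1 is expected to show proportionally more pairwise diversity.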
This article is based on methods that are standard in theoretical population genetics. There is, however, a large amount of algebra involved. We therefore relegate all of the derivations to supporting information, File S1, which also gives results that are more general (and complicated) than those shown below.
The general model
We begin our analysis with the general case in which no assumption is made about the relative strengths of selection and recombination.
The distribution of coalescence times for a pair of genes at site i depends on two kinds of quantities. The first are the transition rates backward in time between the 10 states that a pair of genes can assume. These can be calculated from the population recombination rates (the ρ-values), using equation 2 of Kirkpatrick et al. (2010). The second are the coalescence rates for pairs of genes that are in the same genetic background. These transition and coalescence rates are given in File S1.
Given those rates, the expected coalescence times can be calculated analytically using standard methods. (See Wakeley 2008 for an overview and Kirkpatrick et al. 2010 for applications to recombining sex chromosomes.) A lot of algebra is involved, and so we wrote a Mathematica (Wolfram Research 2012) notebook that automates the calculations. The program is in File S1 and is available for download from http://www.sbs.utexas.edu/kirkpatrick_lab/. The results give the expected time for each of the 10 possible pairs of genetic backgrounds we can have in the sample, for example (X1, X1). The expressions are too large to reproduce here, but they are available in the Mathematica notebook.
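The "standard methods" referred to above amount to first-step analysis of a continuous-time Markov chain: for each state i, (total exit rate) × T_i − Σ_j (rate from i to j) × T_j = 1. The sketch below is not the Mathematica notebook from File S1; it is a generic solver written for illustration, with hypothetical function names, and it is demonstrated on a toy two-state example (a pair of genes in the same background or in different backgrounds) rather than on the 10-state model, whose rates are not reproduced in the text.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.
    Intended for the small dense systems that arise here."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def expected_coalescence_times(move_rates, coal_rates):
    """Expected time to coalescence from each state of a pair of genes.

    move_rates[i][j]: backward-time rate of moving from state i to state j (i != j).
    coal_rates[i]:    rate of coalescence while in state i (0 if impossible).
    """
    n = len(coal_rates)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        exit_rate = coal_rates[i] + sum(move_rates[i][j] for j in range(n) if j != i)
        for j in range(n):
            a[i][j] = exit_rate if i == j else -move_rates[i][j]
    return solve_linear(a, [1.0] * n)


if __name__ == "__main__":
    m = 0.5  # toy per-lineage movement rate between two equivalent backgrounds
    # State 0: both genes in the same background; state 1: different backgrounds.
    times = expected_coalescence_times([[0.0, 2 * m], [2 * m, 0.0]], [1.0, 0.0])
    print(times)
```

For this toy case the solution can be checked by hand: the same-background time is 2 regardless of m, and the different-background time is 2 + 1/(2m).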
We used coalescent simulations to check the analytic calculations. We simulated the process without the approximations used in the analytic model. The backward-time process was simulated one generation at a time and accounted for the sex of individuals so that recombination and coalescence occurred correctly. The simulations allowed allele frequencies on X chromosomes in eggs and sperm to differ and accounted explicitly for how sex-antagonistic selection alters the effective rate of recombination. The simulations therefore provide a check on the accuracy of the analytic approximations. They produce the ancestral recombination graph (Griffiths and Marjoram 1996) for the chromosomal region between the SDR and locus A. We then used the graph to obtain coalescent times at sites within the simulated region. The following results are the means of $10^5$ independent simulations. The simulation code, written in C++, is also available on the Web site cited above.
As explained in the Introduction, the motivation behind this work is to develop tools that can be used to detect loci that are under sex-antagonistic selection. In some study systems it is possible to collect phased data on DNA polymorphisms from X and Y chromosomes. It is therefore of interest to understand the coalescence times of pairs of genes sampled from X and Y chromosomes when there is an unknown locus A also acting on the system. The expected coalescence time for a pair of genes sampled from X chromosomes, $T_{XX}$, is found by taking the average of the coalescence times for a pair of genes with backgrounds (X1, X1), (X1, X2), and (X2, X2), weighting those three times by their probabilities of occurring in the sample (which are determined by the frequencies of alleles A1 and A2 on X chromosomes). Analogous calculations give $T_{XY}$, the expected coalescence time when one gene is sampled from an X and the other from a Y chromosome, and $T_{YY}$, the expected time when both genes are sampled from Y chromosomes.
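The weighting described above can be sketched in a few lines. The code assumes that the two genes in a pair are drawn independently at random from the relevant chromosome type, so that background probabilities are products of the allele frequencies x and y at locus A; the function names and the numerical times in the example are hypothetical placeholders, not values from the model.

```python
def mean_time_same_type(freq_a1, t11, t12, t22):
    """Average coalescence time for two genes from the same chromosome type.

    freq_a1: frequency of allele A1 on that chromosome type (x for X, y for Y).
    t11, t12, t22: expected times when the pair is linked to (A1, A1),
                   (A1, A2), and (A2, A2) on that chromosome type.
    """
    p, q = freq_a1, 1.0 - freq_a1
    return p * p * t11 + 2.0 * p * q * t12 + q * q * t22


def mean_time_x_vs_y(x, y, t_x1y1, t_x1y2, t_x2y1, t_x2y2):
    """Average coalescence time for one gene from an X and one from a Y.

    The four arguments t_* are the expected times for backgrounds
    (X1, Y1), (X1, Y2), (X2, Y1), and (X2, Y2).
    """
    return (x * y * t_x1y1
            + x * (1.0 - y) * t_x1y2
            + (1.0 - x) * y * t_x2y1
            + (1.0 - x) * (1.0 - y) * t_x2y2)


if __name__ == "__main__":
    # Placeholder times (in units of 2N generations), for illustration only.
    print(mean_time_same_type(0.845, 1.0, 3.0, 2.0))
    print(mean_time_x_vs_y(0.845, 0.155, 2.0, 1.5, 3.0, 2.5))
```

The three same-type backgrounds for X, the three for Y, and the four mixed backgrounds account for the 10 possible pairs of backgrounds mentioned earlier.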
A typical result is shown in Figure 1. For comparison, Figure 1 also shows the results when SA selection is absent. The patterns result from a “structured coalescent” process (Charlesworth et al. 1997; Nordborg 1997). Going backward in time, a gene moves by recombination between the four genetic backgrounds defined by the two types of sex chromosomes and two alleles at locus A. This process is similar to coalescence in a geographically structured population. In that situation, genes move between demes by migration, while in ours genes move between populations of X and Y chromosomes by recombination.
Figure 1. Expected coalescence times between pairs of genes sampled from X and Y chromosomes. Left and center, the regions flanking the SDR; right, the region surrounding locus A. Solid curves show the analytic results when SA selection acts on locus A (shown by the vertical line in the right panel). Shaded curves are the corresponding cases with no SA polymorphism. Circles are simulation results. The frequencies of allele A1 on X and Y chromosomes are x = 0.845 and y = 0.155.
In a neutrally evolving PAR, the expected coalescence time for a pair of genes sampled from Y chromosomes at a site near the SDR is decreased relative to that for genes sampled at an autosomal site (Figure 1). The decrease is greater for sites nearer to the SDR, and it results from the smaller effective population size of genes on the Y. In contrast, coalescence times are increased when both genes are sampled from X chromosomes or when one gene is sampled from an X and the other from a Y (Kirkpatrick et al. 2010).
With sex-antagonistic selection acting, those departures from the autosomal case are exaggerated (compare the solid and shaded curves in Figure 1). Near locus A (Figure 1, right), the expected coalescence times for all three types of gene pairs are greatly inflated over the expected value for neutrally evolving autosomes (which is 1 in units of 2N generations). This predicts elevated nucleotide polymorphism within X chromosomes and within Y chromosomes in the regions flanking a locus under SA selection. We also see that when pairs of genes are sampled from Y chromosomes, the coalescence times are depressed relative to the autosomal expectation over most of the region between the SDR and locus A. This region of shortened coalescence times near the SDR is greatly enlarged relative to situations with no SA selection (Kirkpatrick et al. 2010). In essence, the region of the Y between the SDR and a site under SA selection has a lower effective population size. That will reduce the amount of neutral variation, and it is also expected to magnify the effects of drift on selected alleles in this region.
Figure 1 also shows that the peak of inflated coalescence times near the site under SA selection is asymmetric. Near locus A, the region proximal to the SDR has deeper coalescence times than the distal region (in Figure 1, respectively to the left and the right of ρ = 100). The asymmetry increases with the strength of sex-antagonistic selection, specifically how much the allele frequencies at locus A differ on X and Y chromosomes (see below).
A final conclusion from Figure 1 is that the agreement between the analytic model (shown in the curves) and the exact simulations (the circles) is very good. This suggests the approximations used in the analytic model are quite accurate.
A useful way to visualize the results is in terms of FST for genes sampled from X and Y chromosomes. In a structured population, the expected value of FST is a simple function of coalescence times:

$$F_{ST} = \frac{T_T - T_S}{T_T},$$

where $T_S$ is the expected coalescent time for two genes sampled within the same subpopulation and $T_T$ is the expected time for two genes randomly sampled from the total population (Slatkin 1991). In the present case, the subpopulations are X and Y chromosomes, so $T_S = (T_{XX} + T_{YY})/2$ and $T_T = (T_{XX} + 2T_{XY} + T_{YY})/4$. We refer to FST as measuring the “divergence” between X and Y. It should be understood that this is a relative measure of within-chromosome vs. between-chromosome divergence and that it does not capture all of the information inherent in the mean coalescence times.
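The relation between FST and coalescence times (Slatkin 1991) is easy to evaluate once the three mean times are in hand. The sketch below assumes that X and Y chromosomes are weighted equally, so that the within-group time is the average of the X–X and Y–Y times and the total time averages over all four ordered pairings; the function name and the example values are hypothetical.

```python
def fst_from_times(t_xx: float, t_xy: float, t_yy: float) -> float:
    """Expected F_ST between X and Y from mean pairwise coalescence times.

    Assumption: the two chromosome types are weighted equally.
    """
    t_within = (t_xx + t_yy) / 2.0              # same chromosome type
    t_total = (t_xx + 2.0 * t_xy + t_yy) / 4.0  # random pair from the pooled sample
    return (t_total - t_within) / t_total


if __name__ == "__main__":
    # No structure: all times equal, so F_ST is 0.
    print(fst_from_times(1.0, 1.0, 1.0))
    # Deeper X-Y coalescence than within-type coalescence gives positive F_ST.
    print(fst_from_times(1.0, 3.0, 1.0))
```

Because only ratios of times enter, the result does not depend on the time unit, which is why FST is a relative measure and does not capture the absolute depth of the genealogies.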
Figure 2 (top) shows expected values for FST between X and Y chromosomes calculated by our analytic method. In the three cases shown, there is strong to very strong sex-antagonistic selection acting on locus A. The values of x and y correspond to equilibria under the following simple scenarios about sex-antagonistic selection. SA selection is symmetric and there is no dominance so that the relative fitnesses of genotypes A1A1, A1A2, and A2A2 are 1 :: (1 – s/2) :: (1 – s) in females, while conversely they are (1 – s) :: (1 – s/2) :: 1 in males. The genetic recombination rate in both females and males between locus A and the SDR is rSA = 0.01, the corresponding population recombination rate is $\rho_{SA}$ = 100, and the population size is N = 2500. Cases 1, 2, and 3 use the equilibrium values for the allele frequencies x and y (found numerically using the recursion equations of Clark 1988) with s = 0.02, 0.04, and 0.1. For comparison, the curve for a neutral model with no SA selection is virtually indistinguishable from case 1 in the vicinity of the SDR, but has no secondary peak near locus A.
Figure 2. Expected divergence with sex-antagonistic selection on a site at ρ = 100. Top, FST between X and Y chromosomes; bottom, FST between sex chromosomes sampled from males and females. The frequencies of allele A1 on X and Y chromosomes (that is, x and y) are case 1 = (0.618, 0.381), case 2 = (0.845, 0.155), and case 3 = (0.912, 0.082).
Figure 2 shows that a peak of FST occurs at locus A, and its height is given by Equation 1. We also see that the region of elevated FST can extend dozens of ρ’s from both the SDR and locus A, while it extends only a few ρ’s from the SDR in the absence of sex antagonism. With simple balancing selection at an autosomal locus, the expected coalescence times at linked sites are likewise elevated over a region only a few ρ’s wide (Hudson and Kaplan 1988). Thus the effects of the SDR and a locus under SA selection can reinforce each other to produce a much broader region of high divergence than is seen in other genetic situations. As one would expect, this pattern is strongest when SA selection is intense.
In many study systems, current technology does not allow the genetic data from diploid individuals to be easily phased, and so it is not possible to know whether a particular sequence sampled from a male came from his X or his Y chromosome. One can, however, hope to detect sex-antagonistic selection by looking for divergence between genes sampled from males and females, as those differences result from divergence between X and Y. We can calculate the expected value of FST between males and females, using our coalescence results as outlined above.
Figure 2 (bottom) shows the results. The qualitative patterns are the same: increasing sex-antagonistic selection leads to increased divergence, and regions of elevated divergence can extend quite far from the SDR and the locus under sex-antagonistic selection. The key difference is that FST values are much smaller than between X and Y chromosomes. For example, the values of FST between males and females in case 3 are about one-quarter of those between X and Y chromosomes. A small calculation shows that the ratio of FST for X and Y to FST for males and females is

$$\frac{F_{ST}^{XY}}{F_{ST}^{mf}} = 4 \times \frac{T_{ff} + 2T_{fm} + T_{mm}}{4\,T_{mm}}, \tag{2}$$

where $T_{jk}$ is the expected coalescence time for a pair of genes, one sampled from sex j and the other from sex k. In general, $T_{mm}$ will be the smallest of the three coalescence times because times are shortest for genes on Y chromosomes, and so the second of the two terms on the right can be much larger than one. The conclusion is not surprising: FST for genes sampled from males and females can be very much smaller than FST for genes sampled from X and Y chromosomes.
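The dilution of the signal in unphased data can be illustrated numerically. The sketch below rests on several assumptions that are not spelled out in the text: every gene sampled from a female is on an X; a gene sampled from a male is on an X or a Y with probability 1/2 each; the two genes in a pair come from different individuals; and the two groups being compared are weighted equally. Function names and the example times are hypothetical. Under these assumptions the ratio of the two FST values reduces to (T_ff + 2 T_fm + T_mm) / T_mm.

```python
def fst_two_groups(t_aa: float, t_ab: float, t_bb: float) -> float:
    """F_ST between two equally weighted groups from mean coalescence times."""
    t_within = (t_aa + t_bb) / 2.0
    t_total = (t_aa + 2.0 * t_ab + t_bb) / 4.0
    return (t_total - t_within) / t_total


def sex_based_times(t_xx: float, t_xy: float, t_yy: float):
    """Mean times for female-female, female-male, and male-male pairs,
    obtained by mixing over the chromosome type each sampled gene sits on."""
    t_ff = t_xx                                   # females carry only X chromosomes
    t_fm = (t_xx + t_xy) / 2.0                    # the male gene is X or Y, half each
    t_mm = (t_xx + 2.0 * t_xy + t_yy) / 4.0       # both genes are X or Y, half each
    return t_ff, t_fm, t_mm


if __name__ == "__main__":
    t_xx, t_xy, t_yy = 1.2, 1.3, 0.6              # illustrative values only
    t_ff, t_fm, t_mm = sex_based_times(t_xx, t_xy, t_yy)
    ratio = fst_two_groups(t_xx, t_xy, t_yy) / fst_two_groups(t_ff, t_fm, t_mm)
    print(ratio, (t_ff + 2.0 * t_fm + t_mm) / t_mm)
```

When the three sex-based times are nearly equal the ratio is close to 4, in line with the observation that male–female FST is about one-quarter of X–Y FST; the ratio grows as the male–male time shrinks relative to the others.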
Simple approximations for when X–Y divergence is small
While the analytic results for the full model are useful for quantitative predictions, they are far too complex to give simple insights. There is, however, a limiting case that greatly simplifies the results: when recombination between locus A and the SDR is strong relative to sex-antagonistic selection. In this section we develop an approximation for this case. It will be most accurate when FST between X and Y chromosomes caused by SA selection is small.
As recombination between the SDR and locus A increases, a neutral site close to locus A moves between X and Y chromosomes much more frequently than it moves between chromosomes carrying different alleles at locus A. This leads to an approximation based on a separation of timescales developed by Nordborg (1997). The genetic background for a gene at site i is determined only by the allele carried by its chromosome at locus A. Consequently, a pair of genes at site i exists in one of only three configurations: both genes linked to allele A1, both linked to allele A2, or one gene linked to A1 and the other to A2.
The results are particularly simple when there are no sex differences in recombination (that is, when recombination rates are equal in females and males). Assume further that allele frequencies at locus A are symmetric such that the frequency of allele A1 on X chromosomes equals the frequency of A2 on Y chromosomes (that is, x = 1 – y). File S1 shows that the expected times for genes sampled from chromosomes carrying different combinations of alleles at the sex-antagonistic locus A, regardless of their sex chromosome, are
(3)
Coalescence times for pairs of genes sampled from the same genetic background (that is, linked either to allele A1 or to allele A2) decline toward 1 as allele frequencies at locus A become more similar on X and Y chromosomes.
Now imagine that we sample pairs of genes, knowing their sex chromosomes but not knowing to which allele at locus A they are linked. That is the situation when divergence between X and Y chromosomes is used to scan for loci under sex-antagonistic selection. The expected coalescence time for a pair of genes sampled from X chromosomes, $T_{XX}$, is found by averaging over the probabilities that the samples carry an A1 or an A2 allele. Analogous calculations give $T_{XY}$ and $T_{YY}$. The resulting expressions are complicated. We can, however, find a surprisingly simple approximation in the vicinity of locus A. Writing $\bar{p}$ for the average frequency of allele A1 on X and Y chromosomes, we find that
(4)which is accurate to leading order in (x – y) and
A measure of the width of the region of elevated divergence between X and Y chromosomes is given by the inverse of the derivative of the right side of (4) with respect to
evaluated at
= 0. The result is that the region of elevated FST is
(5)(in units of ρ) to the left and to the right of locus A. That approximation suggests that the width of this region is independent of the height of the peak in FST.
Figure 3 compares the simple linear approximation (Equations 4 and 5) with the full model. As expected, the approximation does very well when FST at locus A is small (Figure 3, left). As FST grows (Figure 3, center and right), the approximation continues to do well at predicting the width of the peak distal to the SDR (the slopes of the solid curves and shaded lines agree well to the right of the peaks). In the region between locus A and the SDR (to the left of the peaks), the approximation consistently underestimates the width. In sum, Equation 5 gives a conservative approximation to the size of the region with elevated FST.
Figure 3. Expected divergence between genes sampled from X and Y chromosomes in the vicinity of a locus under sex-antagonistic selection that is weakly linked to the SDR. The SDR lies far to the left of the SA locus. Solid curves, the full model; lines with light shading, the linearized weak linkage approximation (Equation 4); horizontal bars with dark shading, the region of high FST defined by Equation 5. The parameter values for the three panels correspond to cases 1, 2, and 3 in Figure 2 but with $\rho_{SA}$ = 1000. Note the differences in the vertical scales of the panels.
Estimating the strength of sex-antagonistic selection
We have seen that the patterns of neutral diversity along recombining sex chromosomes depend strongly on the divergence between the X and the Y at the locus under sex-antagonistic selection. To this point we treated the allele frequencies at the SA locus (x and y) as fixed parameters. That approach gives results that apply to any type of selection, including frequency dependence. It is interesting, however, to consider what to expect if SA selection operates with constant (frequency-independent) fitnesses. We can then ask, Given the strengths of sex-antagonistic selection and recombination, what value of FST will result at locus A? Conversely, can we use FST to make inferences about the strength of SA selection? The following results give a partial answer by analyzing a special case.
The equilibrium values for x and y, and hence the value of FST at the selected locus, cannot be calculated analytically, but they can be found numerically. A model with constant fitnesses involves five parameters: four relative fitnesses (two in females and two in males) and the recombination rate between the SDR and locus A. To simplify things, we assume as before that selection is sex symmetric, but now allow for dominance effects. In females, the relative fitnesses of A1A1, A1A2, and A2A2 are 1 :: (1 – hs) :: (1 – s), while in males they are (1 – s) :: (1 – hs) :: 1. By iterating the recursion equations of Clark (1988) we found the equilibrium for x and y and then calculated FST at locus A, using Equation 1.
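The numerical search for the equilibrium can be sketched as follows. This is not the authors' code and does not reproduce the equations of Clark (1988) verbatim; it is a minimal deterministic recursion written for illustration under stated assumptions: random mating, viability selection acting on zygotes before meiosis, and recombination between the SDR and locus A mattering only in males (in females both chromosomes are X). The function name and the default number of generations are hypothetical choices, and the output is not claimed to reproduce the values plotted in the figures.

```python
def iterate_sa_equilibrium(s: float, h: float, r: float, generations: int = 50_000):
    """Iterate allele frequencies of A1 on X chromosomes in eggs (xf),
    X chromosomes in sperm (xm), and Y chromosomes in sperm (y).

    Fitnesses of A1A1, A1A2, A2A2 are 1, 1-hs, 1-s in females
    and 1-s, 1-hs, 1 in males (sex-symmetric SA selection).
    """
    wf = (1.0, 1.0 - h * s, 1.0 - s)
    wm = (1.0 - s, 1.0 - h * s, 1.0)
    xf = xm = y = 0.5  # arbitrary polymorphic starting point
    for _ in range(generations):
        # Daughters: one X from an egg (xf) and one X from a sperm (xm).
        f11 = xf * xm * wf[0]
        f12 = (xf * (1.0 - xm) + xm * (1.0 - xf)) * wf[1]
        f22 = (1.0 - xf) * (1.0 - xm) * wf[2]
        new_xf = (f11 + 0.5 * f12) / (f11 + f12 + f22)

        # Sons: X from an egg (xf) and Y from a sperm (y).
        m11 = xf * y * wm[0]                  # A1 on X, A1 on Y
        m12 = xf * (1.0 - y) * wm[1]          # A1 on X, A2 on Y
        m21 = (1.0 - xf) * y * wm[1]          # A2 on X, A1 on Y
        m22 = (1.0 - xf) * (1.0 - y) * wm[2]  # A2 on X, A2 on Y
        w_bar = m11 + m12 + m21 + m22
        x_sel = (m11 + m12) / w_bar           # A1 on X after selection in males
        y_sel = (m11 + m21) / w_bar           # A1 on Y after selection in males

        # A crossover between the SDR and A (probability r) swaps the A alleles.
        new_xm = (1.0 - r) * x_sel + r * y_sel
        new_y = (1.0 - r) * y_sel + r * x_sel
        xf, xm, y = new_xf, new_xm, new_y
    return xf, xm, y


if __name__ == "__main__":
    xf, xm, y = iterate_sa_equilibrium(s=0.02, h=0.5, r=0.01)
    print(round(xf, 3), round(xm, 3), round(y, 3))
```

The equilibrium frequencies returned by such a recursion can then be passed to the FST formula for locus A. Tracking eggs and sperm separately makes it possible to check the simplifying assumption that allele frequencies on X chromosomes are nearly equal in the two kinds of gametes.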
Figure 4 shows the equilibrium values of FST at the selected locus A plotted as a function of the ratio (s/rSA), which measures the strength of sex-antagonistic selection relative to recombination between the SDR and A. The key result is that over this range of parameters, FST is quite insensitive to the absolute strengths of selection and recombination: to a good approximation, it is only their ratio that matters. Furthermore, FST is also insensitive to h. (With larger values of h, polymorphism at locus A is lost and so sex-antagonistic selection does not operate.) Figure 4 is consistent with the three cases shown in Figure 2 and Figure 3, where (s/rSA) = 2, 4, and 8.
Figure 4. Divergence between X and Y chromosomes at the sex-antagonistic locus A plotted against the strength of sex-antagonistic selection (measured relative to the recombination rate between locus A and the SDR).
These results provide the beginnings of a framework for estimating the strength of SA selection from sequence data. The idea is to find peaks of FST between X and Y chromosomes from sequence data and then use Figure 4 to estimate the ratio (s/rSA). This will tend to give an underestimate for the strength of SA selection because (almost) all the polymorphisms in the data will be at linked neutral sites rather than the actual target of selection, locus A.
We illustrate the strategy using data from recent studies of the recombining sex chromosomes of the flowering plant Silene latifolia. Qiu et al. (2013) report that locus E352 is in the PAR and shows allele frequency differences between males and females. Four of 18 alleles sampled from females carried a 254-bp variant, while 22 of 30 alleles sampled from males carried the same variant. Assuming random mating, one can obtain maximum-likelihood estimates of the allele frequencies on X chromosomes and on Y chromosomes (see File S1), which in turn give an estimated divergence FST between X and Y (see Equation 1).
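One way to see how such estimates can be obtained from pooled allele counts is a small grid-search likelihood. The sketch below is not the analysis in File S1, which is not reproduced here. It assumes that the 18 female alleles are all X-linked, that the 30 male alleles come from 15 males carrying 15 X and 15 Y chromosomes, that X chromosomes in males and females share the same variant frequency x, and that only pooled counts (not individual genotypes) are available. The function names and the grid step are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def likelihood(x, y, fem_var=4, fem_n=18, male_var=22, n_males=15):
    """Likelihood of the pooled counts given variant frequencies x (on X) and y (on Y)."""
    # Female alleles are all X-linked.
    lik_f = binom_pmf(fem_var, fem_n, x)
    # Male variant copies are a sum of X-linked (k) and Y-linked (male_var - k) copies.
    lik_m = sum(
        binom_pmf(k, n_males, x) * binom_pmf(male_var - k, n_males, y)
        for k in range(n_males + 1)
    )
    return lik_f * lik_m


def ml_grid(n_steps=100):
    """Return the (x, y) pair on a regular grid that maximizes the likelihood."""
    best = (-1.0, 0.0, 0.0)
    for i in range(n_steps + 1):
        for j in range(n_steps + 1):
            x, y = i / n_steps, j / n_steps
            lik = likelihood(x, y)
            if lik > best[0]:
                best = (lik, x, y)
    return best[1], best[2]


if __name__ == "__main__":
    x_hat, y_hat = ml_grid()
    print(f"x_hat = {x_hat:.2f}, y_hat = {y_hat:.2f}")
```

Under these assumptions the likelihood is maximized with the Y-linked frequency at its upper bound and the X-linked frequency near one-third. A different sampling model, for example one that uses individual genotypes, could give different estimates.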
Now let us assume that locus E352 is itself the target of SA selection and further agree to the assumptions of Figure 4 (e.g., sex-symmetric selection). The data of Qiu et al. (2013) then imply that the relative strength of sex-antagonistic selection is (s/rSA) ≈ 8. Since it is likely that E352 is closely linked to but not the actual target of selection, that estimate may be conservative. We could go further with an estimate of rSA. Unfortunately, the data present a confusing picture here because there seems to be variation in recombination rates between families. Point estimates range from rSA = 0 to rSA = 15 cM (Bergero et al. 2013), giving estimates of s ranging from 0 to 1.
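The range quoted for s follows from multiplying the estimated ratio by the recombination rate and truncating at 1, since a selection coefficient in this model cannot exceed 1. The short sketch below makes that arithmetic explicit. Treating 15 cM as a recombination fraction of 0.15, and the alternative Haldane map-function conversion, are both assumptions made here for illustration, and the function names are hypothetical.

```python
import math


def haldane_r(centimorgans):
    """Recombination fraction from map distance under Haldane's map function."""
    return 0.5 * (1.0 - math.exp(-2.0 * centimorgans / 100.0))


def implied_s(ratio, r):
    """Selection coefficient implied by a ratio s/r, truncated at its maximum of 1."""
    return min(1.0, ratio * r)


if __name__ == "__main__":
    ratio = 8.0  # estimate of s/rSA read from Figure 4
    for label, r in [("r = 0", 0.0),
                     ("15 cM as r = 0.15", 0.15),
                     ("15 cM via Haldane", haldane_r(15.0))]:
        print(f"{label}: s = {implied_s(ratio, r):.2f}")
```

With either conversion the upper point estimate of rSA implies a product greater than 1, so the estimate of s reaches its ceiling.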
The point here is to show that we can make inferences about the strength of sex-antagonistic selection given appropriate models and data from recombining sex chromosomes. The analysis in this example has limitations: the model makes restrictive assumptions, the statistical analysis is incomplete, and the data set is not large. We anticipate, however, that substantial further progress will soon become possible on all of those fronts.
Patterns of linkage disequilibrium
Balancing selection on an autosomal locus generates linkage disequilibrium (LD) between linked neutral sites (Hudson and Kaplan 1988; Charlesworth et al. 1997). This suggests that patterns of LD near locus A might provide additional information about SA selection acting on that locus.
We explored this possibility using coalescent simulations to find R, the correlation in coalescence times between two neutral sites, which is directly related to measures of linkage disequilibrium (McVean 2002). We simulated a sample of two X chromosomes with 26 neutral sites, equally spaced in the regions flanking locus A. We calculated R between all pairs of these sites, using 10⁴ replicate simulations.
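To make the statistic R concrete, the following sketch estimates it by simulation in the simplest possible setting: two neutral loci in a sample of two chromosomes from a single panmictic population, with no selection and no sex linkage. This is a deliberately stripped-down stand-in for the structured coalescent used here, intended only to show how a correlation in coalescence times is computed from replicate genealogies. Time is measured in units of 2N generations, rho = 4Nr is the scaled recombination rate between the two loci, and the function names and seed are hypothetical. For this neutral case the expected correlation is the standard result (rho + 18)/(rho² + 13 rho + 18), which provides a check on the simulation.

```python
import random


def simulate_pair(rho, rng):
    """Coalescence times (T_A, T_B) at two loci for a sample of two chromosomes.

    States: 'a' = two lineages, each ancestral at both loci;
            'b' = three lineages (one with both loci, one A only, one B only);
            'c' = four lineages (two A only, two B only).
    """
    t, state = 0.0, "a"
    while True:
        if state == "a":
            joint, single, up, down = 1.0, 0.0, rho, 0.0
        elif state == "b":
            joint, single, up, down = 0.0, 2.0, rho / 2.0, 1.0
        else:  # state 'c'
            joint, single, up, down = 0.0, 2.0, 0.0, 4.0
        total = joint + single + up + down
        t += rng.expovariate(total)
        u = rng.random() * total
        if u < joint:                      # both loci coalesce together
            return t, t
        if u < joint + single:             # one locus coalesces now
            later = t + rng.expovariate(1.0)   # the other needs a further Exp(1)
            return (t, later) if rng.random() < 0.5 else (later, t)
        if u < joint + single + up:        # recombination splits a lineage
            state = "b" if state == "a" else "c"
        else:                              # single-locus lineages rejoin
            state = "a" if state == "b" else "b"


def correlation(pairs):
    """Pearson correlation between the two coordinates of a list of pairs."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs) / n
    va = sum((a - ma) ** 2 for a, _ in pairs) / n
    vb = sum((b - mb) ** 2 for _, b in pairs) / n
    return cov / (va * vb) ** 0.5


def estimate_R(rho, n_reps=20_000, seed=1):
    rng = random.Random(seed)
    return correlation([simulate_pair(rho, rng) for _ in range(n_reps)])


def expected_R(rho):
    return (rho + 18.0) / (rho ** 2 + 13.0 * rho + 18.0)


if __name__ == "__main__":
    for rho in (0.5, 1.0, 5.0):
        print(f"rho = {rho}: simulated R = {estimate_R(rho):.3f}, expected = {expected_R(rho):.3f}")
```

Extending a sketch like this to the sex-antagonistic case would require tracking the genetic background (X or Y, A1 or A2) of each lineage, which is what the full simulations do.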
Figure 5 shows results for the parameters used in case 2 of Figure 2. Below the diagonal is the sex-antagonistic case. For comparison, above the diagonal are analogous results for simple balancing selection acting on an autosomal locus. Sex-antagonistic selection inflates the disequilibrium between tightly linked sites that lie between locus A and the SDR (the region bounded by the dashed box in Figure 5). This pattern is reminiscent of the increased FST seen between X and Y chromosomes in the same region (compare with Figure 2).
The correlation in coalescence times, R, between pairs of neutral sites near a balanced polymorphism. Above the diagonal are results for simple balancing selection on an autosome; below the diagonal are results for the sex-antagonistic case. Parameters are those for case 2 in Figure 2. The black circle shows the selected locus, and the dashed box shows the region of elevated linkage disequilibrium.
Young SA polymorphisms and SDRs
The results developed above describe equilibrium patterns of neutral diversity that are expected when both the SDR and the SA polymorphism are ancient. Those patterns might be expected in groups such as ratite birds and boid snakes, which have recombining sex chromosomes that may be on the order of 100 MY old (Vicoso et al. 2013a,b). Other sex chromosomes, however, are an order of magnitude younger [e.g., in several species of stickleback (Ross et al. 2009), medaka (Tanaka et al. 2007), white campion (Filatov 2005; Nicolas et al. 2005), and papaya (Wang et al. 2012)]. We therefore want to understand what patterns to expect when the sex chromosomes or the SA polymorphism are young and so neutral variation has not yet reached equilibrium.
We consider two situations. In the first, an SA polymorphism was recently established on ancient sex chromosomes. In the second, there is an ancient SA polymorphism on an autosomal locus that recently became sex linked. This can occur when a new SDR appears on an autosome, as happened in the medaka (Kondo et al. 2006), or when an autosome fuses with an ancestral sex chromosome, as happened in the Japan Sea stickleback (Kitano et al. 2009).
We studied both cases using simulations. We considered the patterns of neutral diversity expected when a new male-determining region was established on an autosome either 0.2N generations ago or 20N generations ago. We also used those two ages for the case of an SA polymorphism that recently became established at a PAR locus on an ancient sex chromosome. The following results are based on 10⁵ replicate simulations for each combination of parameters.
Figure 6 shows patterns of neutral genetic variation near a locus under SA selection. Figure 6, left, is the scenario in which the SDR is young and the SA polymorphism is ancient. The most striking conclusion is that both mean coalescence times and FST are not much affected by the establishment of the new sex-determining region. That is because the pattern of neutral variation around locus A still largely reflects divergence that built up before the event.
Expected coalescence times and divergence between X and Y chromosomes following establishment of a new male-determining gene on an autosome (left) or new SA polymorphism at locus A (right). Solid curves show results for establishment events 20N generations ago and shaded curves show results for events 0.2N generations ago. Other parameters are as in case 2 of Figure 1. Circles are values from simulations (lines are added for clarity). Left, following the establishment of a neo-Y on the chromosome carrying locus A; right, following the establishment of a new sex-antagonistic polymorphism at locus A on an old sex chromosome. Top, dashed curves are for a pair of genes sampled from X and Y chromosomes and solid curves are when both genes are from X chromosomes.
The picture is very different, however, with a young SA polymorphism on an ancient pair of sex chromosomes (Figure 6, right). In this case, a mutation at locus A has recently swept from its initial frequency very near 0 to its equilibrium frequencies on the X and Y. This sweep depresses all of the mean coalescence times and flattens out the divergence between X and Y.
Discussion
Sex-antagonistic selection acting on a recombining sex chromosome will generate distinctive patterns of neutral genetic variation. If both the sex chromosomes and a locus A that is the target of SA selection are much more than N generations old, we expect to see a peak in the divergence between the X and Y chromosomes near A (Figure 2). The region around A will also show inflated nucleotide diversity among X chromosomes and among Y chromosomes (Figure 1). If SA selection is strong enough to leave a detectable pattern, the entire region between the SA locus and the SDR will also show elevated FST between X and Y chromosomes. The linkage disequilibrium between tightly linked neutral sites in this region is expected to be greater than in the region on the other side of A (that is, distal to the SDR).
All of these patterns are the consequence of a simple effect that SA selection has: it reduces the effective rate of recombination between X and Y chromosomes in the region spanned by the SDR and locus A. An easy way to see this effect is to consider a copy of a male-beneficial allele on a Y chromosome that recombines onto an X. The fitness of that chromosome is now lower than that of the average X, which is enriched for the female-beneficial allele. Consequently, the X with the male-beneficial allele will likely leave no descendants, and so evidence of the recombination event is lost. By reducing the effective recombination rate, SA selection enhances divergence between the X and Y. We can arrive at the same conclusion by considering the process from the coalescent perspective. When we sample one gene from an X chromosome and one from a Y, with sex-antagonistic selection they are likely also to differ with respect to the allele they are linked to at locus A. To coalesce, the genes have to reside on the same genetic background at both the SDR and locus A, and for that to happen at least two recombination events must occur. That increases the coalescence time and hence the divergence between X and Y in the region between the SDR and locus A. The reduction in the effective recombination rate between X and Y is strongest at neutral sites that are closely linked to locus A. Essentially the same mechanism produces a similar pattern of divergence in neutral polymorphisms near loci under local adaptation (Barton and Bengtsson 1986; Guerrero et al. 2012b).
SA selection in the PAR might have interesting evolutionary consequences for Y chromosomes. The effective recombination rate is decreased, which reduces the effective population size. That, in turn, could decrease the efficiency of selection and trigger degeneration of the Y (Bachtrog 2006; Bachtrog et al. 2011) even if recombination with the X is not fully repressed. We expect intuitively that this process might operate only in the region close to the SDR, but a full theoretical analysis is needed.
What patterns of polymorphism in the PAR would be diagnostic of sex-antagonistic selection? The patterns described here will result from any form of selection that generates stable allele frequency differences on X and Y chromosomes. SA selection is clearly one mechanism that can do that. Less intuitively, simple overdominance with no sex differences can do so as well (Otto et al. 2011). That is because males will tend to be heterozygous more often when X and Y chromosomes have different allele frequencies, and for some (but not all) parameter combinations that will cause the frequencies to diverge. Distinguishing between SA selection and overdominance therefore may require data other than patterns of neutral genetic variation.
A second caveat about using genomic data to detect SA selection is that the polymorphisms must be old enough for distinctive signatures to develop in the neutral variation around them. Even with strong sex-antagonistic selection, the equilibrium at a SA locus is not necessarily very stable and so SA polymorphisms can be lost by drift with relative ease (Connallon and Clark 2012). Thus SA selection may act frequently but be difficult to detect if the polymorphisms are often lost before their genomic signatures have emerged.
How much of the PAR might show evidence of SA selection? With just a single selected locus, appreciable divergence between the X and Y develops only if the strength of SA selection is greater than the rate of recombination between that locus and the SDR. For example, Figure 4 shows that with r = 1% a sex-specific selection coefficient of s = 0.03 is needed to generate an FST of ∼0.1. In that case, X–Y divergence is confined to a small part of the PAR’s genetic map near the SDR. That region could be physically large and span many loci, however, if the recombination rate per megabase is low in Y chromosomes. That seems to be the case in some taxa: in hylid frogs the Y recombines with the X at a rate that is several orders of magnitude lower than the rate between two X chromosomes (Guerrero et al. 2012a), so almost all of the sex chromosomes consist of a PAR that is very tightly linked to the SDR. The size of the region showing substantial divergence between X and Y could also be large if there are multiple loci under SA selection in the PAR, since they can reinforce each other’s effects and facilitate polymorphism (Patten et al. 2010).
Several implications follow from the fact that SA selection can elevate divergence between the X and Y for the entire region between locus A and the SDR. First, it will increase the probability of detecting loci under SA selection. Second, and conversely, the target of SA selection may be difficult to localize precisely because the peak of inflated FST is broad.
A third implication regards inferences about how sex chromosomes evolve. “Strata” of divergence between the X and Y were first discovered in mammals (Lahn and Page 1999) and later found in the sex chromosomes of birds (Handley et al. 2004), snakes (Matsubara et al. 2006), and flowering plants (Bergero et al. 2007). These are thought to correspond to segments of the Y chromosome that stopped recombining at different times, often through a series of chromosome inversions. Our results show that SA selection can cause regions of divergence to develop in the PAR. Looking at Figure 2, consider a scenario in which we have sequence data from X and Y chromosomes in the SDR (at ρ = 0), at a middle region between the SDR and the SA locus (for example, around ρ = 50), and at a region distal to the SA locus (say at ρ > 120). High FST will develop between X and Y chromosomes in the middle region. That pattern could be interpreted as evidence for a young stratum of the SDR that has not yet had much time to diverge when in fact it is in a PAR that continues to recombine. This kind of “pseudostratum” can appear when SA selection is sufficiently strong relative to recombination that there are substantial differences in allele frequencies between X and Y chromosomes (Figure 4). A pseudostratum may therefore be confined to the region that is tightly linked to the SDR, but could extend farther if there are several sites under SA selection in the PAR. One should therefore be cautious about defining the boundary between the SDR and the PAR, using only information about the divergence between X and Y chromosomes. Ideally, confirmation of a nonrecombining stratum would involve rejecting the hypothesis that the region in question is part of the PAR and X–Y divergence has resulted from tight linkage to the SDR, possibly magnified by SA selection.
Provided one is willing to assume a model for how selection acts, variation within and divergence between X and Y chromosomes can be used to estimate the strength of SA selection. We illustrated this idea using data on a single locus in the PAR of S. latifolia. More precise estimates should be possible when samples of phased sequences for X and Y chromosomes become available. This will allow one to fit coalescent models to observed patterns of neutral variation along continuous regions of the sex chromosomes rather than at a single locus, for example by using approximate Bayesian computation (Beaumont 2010). Executing that research strategy will require more complex models than those considered here. The amount of neutral polymorphism depends on mutation rates as well as the coalescence times that we have studied here (Wakeley 2008). In many taxonomic groups, males have higher mutation rates than females, which elevates the diversity at Y-linked sites relative to X-linked sites even when all else is equal (Miyata et al. 1987, 1990; Li et al. 2002), and so fitting the models to data requires accounting for sex-biased mutation. Another potential complication is that the PAR may often carry more than one locus under SA selection (Patten et al. 2010), which could generate more complex patterns of neutral variation within and divergence between sex chromosomes.
A final issue is the power that genomic data will have to estimate the strength of sex-antagonistic selection. The numbers shown here are of expected coalescence times. Patterns of polymorphism in real data will show substantial variation around those expectations as a result of the random nature of coalescence and mutation, which will tend to obscure the signals of SA selection. It is an open empirical question whether the net effects of selection, recombination, and stochastic variation will allow us to detect and quantify sexual antagonism acting on sex chromosomes.
Several groups of fishes seem particularly promising candidates for the analysis of SA selection using patterns of polymorphism on their recombining sex chromosomes. Loci for male coloration that are under sex-antagonistic selection lie in the PAR of poeciliid fishes (Fisher 1931; Lindholm and Breden 2002; Fernandez and Morris 2008; Gordon et al. 2012). Cichlids from Lake Malawi also have color polymorphisms that are both sexually dimorphic and sex linked, with the added interest that these loci may be driving the invasion of new sex chromosomes (Roberts et al. 2009; Ser et al. 2010). In stickleback, QTL for behavioral and morphological phenotypes involved with male courtship localize to a recombining neo-X chromosome (Kitano et al. 2009). Additional taxa will become ripe for these kinds of analyses with the rapid ascent of population genomics.
Acknowledgments
We are grateful to B. Barrett, B. Charlesworth, D. Charlesworth, A. Dagilis, K. Peichel, J. Wakeley, and two anonymous reviewers for discussion and comments on the manuscript. This work was supported by National Science Foundation grant DEB-0819901.
Footnotes
Communicating editor: D. Charlesworth
- Received August 2, 2013.
- Accepted February 26, 2014.
- Copyright © 2014 by the Genetics Society of America