## Abstract

We estimated the number of copies for the long terminal repeat (LTR) retrotransposable element *roo* in a set of long-standing *Drosophila melanogaster* mutation-accumulation full-sib lines and in two large laboratory populations maintained with effective population size ∼500, all of them derived from the same isogenic origin. Estimates were based on real-time quantitative PCR and *in situ* hybridization. Considering previous estimates of *roo* copy numbers obtained at earlier stages of the experiment, the results imply a strong acceleration of the insertion rate in the accumulation lines. The detected acceleration is consistent with a model where only one (maybe a few) of the ∼70 *roo* copies in the ancestral isogenic genome was active and each active copy caused new insertions with a relatively high rate (∼10^{−2}), with new inserts being active copies themselves. In the two laboratory populations, however, a stabilized copy number or no accelerated insertion was found. Our estimate of the average deleterious viability effects per accumulated insert [*E*(*s*) < 0.003] is too small to account for the latter finding, and we discuss the mechanisms that could contain copy number.

The potential of transposable elements (TEs) to spread inside the host genome makes them primarily selfish genetic material. In the absence of forces containing their numbers, and assuming that new inserts are themselves active transposable elements, an exponential increase in the number of copies and in the genomic insertion rate is expected (Charlesworth and Charlesworth 1983). Thus, TEs can play a relevant role in determining the genomic size of species. Furthermore, they can also provide an important substrate for the evolutionary process (Kidwell and Lisch 2001; Biémont *et al*. 2006) and, certainly, a relevant source of deleterious mutation.

In *Drosophila melanogaster*, ∼10% of the genome consists of TEs, with >1500 copies per gamete in the euchromatic part of the genome and ∼2000 in its heterochromatic part (Maside *et al*. 2001; Kaminker *et al*. 2002). The average rate of new insertions per element and generation was estimated to be ∼10^{−4}, while the rate of excision is two orders of magnitude smaller and does not seem to be a leading force in the dynamics of insert numbers in natural populations (Nuzhdin *et al*. 1997; Maside *et al*. 2000). Although it has been shown that new insertions have on average small deleterious effects (Houle and Nuzhdin 2004; Pasyukova *et al*. 2004), the distribution of such effects is unknown and a large fraction of new inserts might be neutral. Additionally, the insertion rate for different TE families varies widely between populations. Thus, the actual contribution of transposable activity to deleterious mutation remains to be ascertained.

Even for the extensively studied *D. melanogaster*, there is still some debate regarding the magnitude of the overall spontaneous deleterious mutation rate and that of the corresponding deleterious effects. Mutations with deleterious effects relevant in the short-to-medium term (say with deleterious effects *s* > 0.001) have been studied in a number of mutation-accumulation (MA) experiments, and they have been shown to usually occur at a relatively low rate, <5% per gamete and generation (García-Dorado *et al*. 2004). There is an obvious interest in the contribution of TE insertion to this genomic deleterious mutation rate, and it has been proposed that transposable elements account for a substantial fraction of the mutational deleterious input and may be responsible both for the diverse deleterious mutation rates estimated in different genetic backgrounds and for the accelerated mutational decay of fitness components observed in some MA experiments (Mukai 1969; García-Dorado and Caballero 2002; Fry 2004; Ávila *et al*. 2006). Furthermore, the deleterious effect of new inserts is of great interest, not only as a potentially important component of spontaneous deleterious mutation, but also as a likely factor containing the exponential increase of insert numbers in natural populations.

A set of mutation-accumulation lines and their controls (Ávila *et al*. 2006) was especially suitable for investigating the spontaneous TE insertion rate, its likely acceleration in the absence of selection, the corresponding deleterious effects of inserts, and their evolutionary dynamics in segregating populations. The initial study of these lines (MA1 lines in Figure 1; see materials and methods for details) revealed a per-gamete viability deleterious mutation rate <5% with average deleterious effect on the order of 10% (Fernández and López-Fanjul 1996; García-Dorado 1997; Chavarrías *et al*. 2001; Ávila and García-Dorado 2002; Caballero *et al*. 2002). In a later stage of mutation-accumulation (MA2 lines in Figure 1; see materials and methods), an accelerated viability decline was detected, which was ascribed to increased spontaneous mutation rates (Ávila *et al*. 2006). Although many transposable families showed no mobility in the MA1 lines, high transposition activity was detected for the long terminal repeat (LTR) retrotransposable *roo* family, which was due to a large insertion rate, with virtually no excision (Domínguez and Albornoz 1996; Maside *et al*. 2001; Vázquez 2006).

Here, we estimate the number of *roo* inserts in a set of 64 full-sib lines (MA2 lines; Ávila *et al*. 2006) and in two large control laboratory populations (C1 and C2) using the TaqMan assay of real-time quantitative PCR (QPCR) and/or *in situ* hybridization. QPCR was used to obtain the number of *roo* elements because this technique, unlike Southern-blot analysis (Maside *et al*. 2001), allows accurate relative quantification of copy number. Additionally, and unlike *in situ* hybridization, it could be applied both to the MA2 lines (for which only frozen individuals were available) and to the control populations. As our interest was to detect mainly functional elements (*i.e*., those capable of mobilization), a sequence included in the LTR was used. Indeed, intact and identical LTRs are a necessary condition for an LTR element to be mobilized (Havecker *et al*. 2004). Moreover, LTRs are the most conserved sequences in the *roo* elements found in the euchromatic portion of the genome (Kaminker *et al*. 2002). *In situ* hybridization was used to estimate the absolute number of *roo* inserts in the Canton-S strain (used as calibrator in the QPCR) and in the C1 and C2 control populations. The similar results obtained in the control populations with both techniques would validate the novel QPCR approach and its use in the experimental lines. The corresponding estimates obtained from our evaluation of the number of *roo* inserts are discussed in light of previous information about the insertion rates of this element, mutational estimates, viability data, and equilibrium inferences.

## MATERIALS AND METHODS

#### Previous history of the populations and lines:

The design of the mutation-accumulation experiment analyzed is represented in Figure 1. In the first stage of the experiment (Santiago *et al*. 1992), a *D. melanogaster* line isogenic for all chromosomes, obtained by Caballero *et al*. (1991), was used as the base population for a large control population (C1) and for 200 full-sib mutation-accumulation lines (MA1 lines; see Chavarrías *et al*. 2001 for further details). At generation 265, one of those MA1 lines (line MA1-85), which had formerly shown good performance, was expanded to obtain a new large control (C2) and a new set of 150 full-sib MA2 lines, which were maintained synchronously to the control C1 of the previous experiment (Ávila *et al*. 2006). By generation 100, flies from the surviving MA2 lines and from the C2 control were frozen at −80°, and MA2 lines were discontinued because they became very difficult to maintain in the laboratory. Both control populations were maintained in 25 bottles (250 ml with 50 ml medium added per bottle) with ∼100 potential parents per bottle (8 bottles for C1 up to generation 200), using a circular mating scheme to ensure a large population size. Using lethal complementation analysis, the effective size of these populations was estimated to be ∼500 (García-Dorado *et al*. 2007).

#### Real-time quantitative PCR:

In the experimental lines, the number of *roo* elements was analyzed in (i) 64 MA2 lines (1 sample per line consisting of 12–14 males), (ii) C1 at generation 411 (2 male and 3 female samples, each consisting of 30 individuals), (iii) C2 at generation 100 (a single sample consisting of 30 males), and (iv) C2 at generation 146 (3 male samples and 1 female sample, each consisting of 30 individuals). A single sample of the Canton-S strain (40 individuals) was also used. Genomic DNA from each sample was isolated using the DNeasy tissue kit (QIAGEN, Valencia, CA), including a step of RNase A treatment. The concentration of extracted DNAs was spectrophotometrically quantified.

The TaqMan assay of the real-time PCR detection technique was used to quantify the number of *roo* LTRs, using the comparative method for relative quantitation (Applied Biosystems, Foster City, CA). This method requires using an endogenous gene (reference gene) with amplification efficiency almost identical to that of the target gene. Moreover, amplicons for target and reference genes must be designed and optimized to obtain amplification efficiencies close to one (Heid *et al*. 1996; Livak and Schmittgen 2001; Applied Biosystems user bulletin no. 2 at http://www.appliedbiosystems.com). We used the *RpL32* gene as the reference gene. Preliminary experiments using 5 μl of a fourfold dilution series ranging from 10 to 0.039 ng of DNA/μl were performed with the Canton-S strain to establish the optimal experimental conditions (amplification efficiencies close to one for target and reference sequences). The Canton-S strain was also used as the calibrator to obtain the relative amount of *roo* LTRs. Threshold cycle (*C*_{t}) values (*i.e*., the fractional cycle number at which the amount of amplified sequence reaches the threshold; Applied Biosystems) were obtained for both the target and the reference sequences in problem samples as well as in the calibrator. The amount of target sequence (*roo* LTR), normalized to an endogenous reference (*RpL32*) and relative to a calibrator (the Canton-S sample), can be obtained from the difference between the Δ*C*_{t} [= *C*_{t}(target) − *C*_{t}(reference)] values for the problem sample and the calibrator (ΔΔ*C*_{t}) according to the expression 2^{−ΔΔ*C*_{t}}.

PCR primers and TaqMan probes were designed using the Primer Express software (Applied Biosystems). Primers and probes for *roo* LTR and for the *RpL32* gene were designed using the GenBank accession nos. AY180917 and AE003772, respectively. Sequences of the primers and TaqMan probes used are shown in Table 1. PCR amplifications were performed in 96-well reaction plates, using separate wells for detecting the *roo* LTR and *RpL32* sequences, and including the calibrator in each plate. For MA2 lines, two replicates were obtained for each amplicon and sample, whereas for the control populations (C1 and C2) and the Canton-S strain multiple replicates were obtained per amplicon and sample. The reaction mixture consisted of 5 μl of DNA (3.125 ng) and 20 μl of TaqMan Universal PCR Master Mix (Applied Biosystems), including primers to a final concentration of 500 nM each, and the TaqMan probe to a final concentration of 200 nM. PCR amplification was performed in an ABI Prism 7700 instrument using the following amplification conditions: 10 min at 95°, followed by 40 cycles of 15 sec at 95° and 1 min at 60°.

#### QPCR evaluation of *roo* copy number:

For each QPCR evaluation, a ΔΔ*C*_{t} value was computed by reference to the Canton-S line as ΔΔ*C*_{t} = Δ*C*_{t}(sample) − Δ*C*_{t}(Canton-S), where Δ*C*_{t}(Canton-S) was the Δ*C*_{t} value averaged over the two Canton-S evaluations in the same plate. This procedure was intended to remove plate effects. In fact, plate effects on ΔΔ*C*_{t} were nonsignificant (*P* < 0.24 for males and *P* < 0.5 for females in two-way “sample × plate” ANOVA analyses), so that ΔΔ*C*_{t} values from different plates were pooled in further analyses.

To estimate the absolute number of *roo*-element copies in the experimental lines (MA2) and control populations from QPCR results, the number of copies in the calibrator Canton-S strain needs to be obtained. In the PCR amplification method, with amplification efficiencies close to one, the initial number of target molecules (*X*_{o}) can be computed as *X*_{o} = (*X*_{t}/*R*_{t})*R*_{o} × 2^{−Δ*C*_{t}}, where *R*_{o} is the initial number of reference molecules and *X*_{t} and *R*_{t} are the numbers of target and reference molecules at the threshold cycle. Assuming the latter two values are equal (*i.e*., *X*_{t}/*R*_{t} = 1), and knowing that the reference sequence (*RpL32*) is a single-copy gene (*i.e*., *R*_{o} = 1 copy per haploid genome), the above expression simplifies to *X*_{o} = 2^{−Δ*C*_{t}}. The number of *roo*-LTR copies per haploid genome (*i.e*., copies of target sequence) in the Canton-S strain would be 2^{−Δ*C*_{t}} = 254.97. Considering two LTR sequences per *roo* element, this results in 127.49 ± 4.37 *roo* elements per genome in the Canton-S strain, which is in close agreement with the number of *roo* hybridization signals detected per nucleus in this strain (124 ± 0.26, see below). Therefore, according to the comparative QPCR method, the number of *roo* elements per gamete (*n*) in the experimental lines by reference to the Canton-S strain was estimated as

*n* = 127.49 × 2^{−ΔΔ*C*_{t}}    (1)

Since *n* refers to the number of *roo* elements per haploid genome (*i.e*., one set of autosomes and one *X* chromosome), ΔΔ*C*_{t} values obtained from male samples were adjusted by adding the logarithm to base 2 of the expected male to female ratio of *roo* elements [*i.e*., by adding log_{2}(0.9); see *Preliminary information* in the results]. Adjusted ΔΔ*C*_{t} values were used in Equation 1.
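The conversion from threshold cycles to *roo* copy number can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that Equation 1 has the standard comparative form *n* = 127.49 × 2^{−ΔΔ*C*_{t}}, and the function names and the *C*_{t} values in the usage note are hypothetical.

```python
import math

# Values taken from the text; the functions below are illustrative only.
CANTON_S_ROO_PER_GENOME = 127.49   # roo elements per genome in the calibrator
MALE_TO_FEMALE_RATIO = 0.9         # expected male-to-female ratio of roo elements


def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Delta Ct = Ct(target, roo LTR) - Ct(reference, RpL32)."""
    return ct_target - ct_reference


def roo_per_haploid_genome(sample_dct: float, calibrator_dct: float,
                           male: bool = False) -> float:
    """Assumed form of Equation 1: n = 127.49 * 2**(-ddCt).

    For male samples, log2(0.9) is added to ddCt before applying the
    equation, as described in the text.
    """
    ddct = sample_dct - calibrator_dct
    if male:
        ddct += math.log2(MALE_TO_FEMALE_RATIO)
    return CANTON_S_ROO_PER_GENOME * 2.0 ** (-ddct)
```

For example, with hypothetical readings, `roo_per_haploid_genome(delta_ct(17.0, 25.2), delta_ct(17.2, 25.2))` describes a sample whose Δ*C*_{t} is 0.2 cycles below that of the calibrator, and so returns a value above 127.49. Because log_{2}(0.9) is negative, the male adjustment raises the estimate by a factor of 1/0.9.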

In the case of MA2 lines, where the ΔΔ*C*_{t} replicates are obtained from a single sample, the between-line component of variance might include a fraction due to sample effects. Genetic sample effects should not be a concern for the analysis of MA lines, due to the small genetic variability expected within these full-sib lines and to the number of flies included in each sample (12–14 for MA lines); and the latter argument also applies to C1 and C2 samples, each including 30 flies. However, to rule out any relevant experimental sample effect, we performed ANOVA with samples as random effect within assays using the ΔΔ*C*_{t} data for the control populations (C1 assayed at generation 411 with 25 ΔΔ*C*_{t} values from 5 samples, C2 assayed at generation 100 with 8 ΔΔ*C*_{t} values from a single sample, and C2 assayed at generation 146 with 19 ΔΔ*C*_{t} values from 4 samples). This ANOVA gave a nonsignificant (*P* < 0.44) component of between-sample variance, which amounted just to 0.18% of the residual variance. A similar ANOVA performed on the *roo*-number estimates instead of on ΔΔ*C*_{t} values gave a very small between-sample component of variance (0.75). Therefore, sample effects were ignored and values from different samples of the same assay were pooled for analysis.

#### *In situ* hybridization analysis:

Polytene chromosome preparations for *in situ* hybridization (Montgomery *et al*. 1987) were obtained from 6 individual larvae of the calibrator strain (Canton-S) and from 10 individual larvae of each of the control populations (C1 at generation 420 and C2 at generation 155). Larvae were grown at 17° on standard cornmeal medium under uncrowded conditions.

The probe was prepared by PCR amplifying most of the LTR of the *roo* element (389 of 429 bp) using Canton-S DNA. The primer sequences (5′–3′) were forward primer, ATTTTGGGCTCCGTTCATA, and reverse primer, GTAAAATCCCAAATGAGAAGA. The PCR product was gel purified and subsequently labeled with 16-biotin-dUTP by nick translation. This probe includes the complete amplicon sequence used in the QPCR experiments. Prehybridization, hybridization, and detection conditions were as described in Segarra and Aguadé (1992). A Zeiss microscope with a Leica DFC camera was used to select on average 10 nuclei per slide and to capture the corresponding images for their subsequent analysis with the Corel Photopaint program. For each individual larva, the number of hybridization signals per chromosome was obtained upon confirming their presence and cytological location in 8–10 nuclei. The total number of hybridization signals per larva was computed by adding up the counts per chromosome.

#### Statistical analysis:

##### Statistical analysis of in situ data:

The sample size of the control populations precludes establishing which hybridization signals (hereafter bands, denoted *b*) correspond to segregating or fixed insertions. Furthermore, our estimate of the number of bands cannot be directly compared to previous *in situ* estimates of the number of ancestral bands obtained for the MA1 lines by others (Maside *et al*. 2001; Vázquez 2006), because these authors used an internal probe instead of an LTR probe. For this reason, our inferences on the rate of insertion accumulation in the control populations (λ_{c}) and on the corresponding number of ancestral *in situ* bands for our LTR probe are based on the estimates of the variance of the number of bands. The procedure is described in appendix a. Five hundred bootstrapped samples were used to obtain bootstrap errors and percentile confidence intervals for the variance of the number of *in situ* bands and to perform bootstrap tests and derive errors for the estimates.

##### Statistical analysis of QPCR data:

Even if the distribution of errors associated with individual ΔΔ*C*_{t} assays was normal, it is not reasonable, *a priori*, to assume normality for its exponential function *n* (Equation 1), unless ΔΔ*C*_{t} values are always very small. This raises doubts about the reliability of parametric analysis for this variable. To prevent bias, both standard parametric and nonparametric statistical analyses were performed.

##### Parametric analysis:

This consisted of (a) a one-way ANOVA for *n* values computed from individual ΔΔ*C*_{t} observations in the control populations, with “assay” as a fixed factor (assays C1-411, C2-100, and C2-146, respectively, where the number after the hyphen is the generation number), and (b) a one-way ANOVA for *n* values computed for individual ΔΔ*C*_{t} observations from 64 MA2 lines, with “line” as a random factor and two observations per line.

If the insertion rate does not depend on copy number and remains constant during the period considered, the number of inserts accumulated per MA2 line along this period is expected to be Poisson distributed, so that the between-line component of the variance of *n* expected from ANOVA equals the expected number of insertions accumulated per line. Thus, a between-line component of variance larger than *m* (where *m* is the average number of inserts accumulated per line) would suggest an accelerated insertion rate in MA2.

Average *n* values, with their standard errors, were obtained for the two control populations and for the set of MA2 lines. These were used to estimate the differences in average *n* between C1 and C2 and also the rates of *roo* insertion in MA2 up to generation 100 by reference to control C2, together with their corresponding standard errors.

##### Nonparametric analysis:

A single *n* estimate was computed for each control sample and for each MA2 line using the corresponding average ΔΔ*C*_{t} value in Equation 1. This should estimate the true average *roo* number better than the average of *n* estimates obtained from individual ΔΔ*C*_{t} values, but it has no empirical standard error. Therefore, for the control populations, bootstrap error (BE) and bootstrap confidence intervals (BCI) based on bootstrap percentiles were computed using 500 bootstrapped *n* values, each estimated from Equation 1 using the ΔΔ*C*_{t} average value of a bootstrapped sample. For the set of 64 MA2 lines, 500 estimates of the average *n* were also obtained, each being equal to the average of the *n* values for a bootstrapped sample of 64 lines, and BE and BCI were also obtained. The 500 bootstrapped estimates for *n* in MA2 and C2-100 (or for C2-146 and C1-411) were randomly paired and used to estimate BE and BCI for the estimates of the rates of *roo* insertion in MA2 up to generation 100 by reference to C2 (or for the differences in average *n* between C1 and C2). Bootstrap tests with α significance for pertinent hypotheses were performed on the basis of the appropriate α-percentile on 500 bootstrapped estimates.
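The bootstrap scheme for a single control assay can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes Equation 1 has the form *n* = 127.49 × 2^{−ΔΔ*C*_{t}}, it resamples individual ΔΔ*C*_{t} values with replacement, and the function names and input data are hypothetical.

```python
import random
import statistics


def n_from_ddct(ddct: float, calibrator: float = 127.49) -> float:
    """Assumed form of Equation 1."""
    return calibrator * 2.0 ** (-ddct)


def bootstrap_n(ddct_values, n_boot=500, alpha=0.05, seed=0):
    """Bootstrap error (BE) and percentile interval (BCI) for n.

    Each bootstrapped n is computed from the average ddCt of one
    resample, mirroring the description in the text.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(ddct_values, k=len(ddct_values))
        estimates.append(n_from_ddct(statistics.fmean(resample)))
    estimates.sort()
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return statistics.stdev(estimates), (lower, upper)
```

Computing *n* from the average ΔΔ*C*_{t} of each resample, rather than averaging per-observation *n* values, follows the rationale given above: the exponential transform is applied once, to the quantity whose errors are most plausibly symmetric.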

Of particular importance for the discussion of the data was to compare the variance of the *n* values observed for the MA2 lines with that expected if the true number of new *roo* insertions per line was Poisson distributed. Due to the lack of normality for the sampling errors (ϵ) of the estimates of *n* and to their likely dependence on the true *n* values, ANOVA estimates of the between-line genetic component of the variance of *n* (see above) may not be wholly reliable. For this reason, we simulated data with resampled residuals to infer the variance that is to be expected for the estimates of *n* in the lines if the true number of new *roo* insertions is Poisson distributed with mean *m*. The procedure is explained in appendix b.
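The logic of this comparison can be illustrated with a short simulation. The sketch below is not the procedure of appendix b. It only assumes that each simulated estimate is the sum of a Poisson-distributed number of new inserts and an error drawn at random from a pool of residuals. The residual pool, the number of simulated lines, and the function names are hypothetical.

```python
import math
import random
import statistics


def poisson_draw(mean: float, rng: random.Random) -> int:
    """Draw one Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def expected_variance_of_estimates(m, residuals, n_lines=20000, seed=1):
    """Variance of simulated n estimates: Poisson(m) inserts plus a
    residual resampled with replacement from `residuals`."""
    rng = random.Random(seed)
    simulated = [poisson_draw(m, rng) + rng.choice(residuals)
                 for _ in range(n_lines)]
    return statistics.variance(simulated)
```

When the resampled errors are independent of the true insert number, the simulated variance approaches *m* plus the variance of the residual pool. An observed variance well above that benchmark is what would point to a non-Poisson accumulation process.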

## RESULTS

#### Preliminary information:

We have investigated the distribution of *roo* elements in the *D. melanogaster* genome (release 5.1) using a BLAST search on FlyBase (http://flybase.bio.indiana.edu/) for our QPCR amplicon, consisting of a 65-nucleotide LTR sequence. In the euchromatin, 235 sequences exhibited a similarity ≥98%. Twenty-one of these sequences corresponded to solo LTRs, whereas the remaining 214 were associated by pairs at distances ranging from 1000 to 20,000 bp, indicating that the two members of each pair belonged to the same element (Rho *et al*. 2007). This would imply that *in situ* analysis using an internal probe (as in previous studies for MA1; Maside *et al*. 2001; Vázquez 2006) should detect 107 bands in the release 5.1 genome, while our LTR probe would be expected to detect a number 19.6% larger (107 + 21 = 128). On the other hand, in the whole genome (euchromatin and heterochromatin), 258 sequences exhibited a similarity ≥98%, indicating that our QPCR approach would have estimated 129 *roo* copies. Therefore, similar *roo* numbers would be detected in the sequenced genome using *in situ* and QPCR approaches (128 *vs*. 129) as, using *in situ* hybridization, the upward bias due to the detection of LTR solos is canceled out by the downward bias due to heterochromatic copies going undetected.
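The counts implied by the BLAST search can be checked with simple arithmetic. The snippet below uses only the figures quoted in this paragraph; the variable names are illustrative.

```python
# Figures quoted in the text for the release 5.1 genome.
euchromatic_hits = 235        # amplicon matches with similarity >= 98%
solo_ltrs = 21                # euchromatic hits that are solo LTRs
paired_ltrs = euchromatic_hits - solo_ltrs   # 214 LTRs arranged in pairs
whole_genome_hits = 258       # euchromatin plus heterochromatin

# Two LTRs per complete element: bands expected with an internal probe.
internal_probe_bands = paired_ltrs // 2

# An LTR probe also hybridizes to solo LTRs.
ltr_probe_bands = internal_probe_bands + solo_ltrs

# Relative excess of the LTR probe over the internal probe.
ltr_excess = (ltr_probe_bands - internal_probe_bands) / internal_probe_bands

# QPCR counts every LTR in the genome and assumes two per element.
qpcr_copies = whole_genome_hits / 2

print(internal_probe_bands, ltr_probe_bands, round(100 * ltr_excess, 1), qpcr_copies)
```

The two biases run in opposite directions: solo LTRs inflate the *in situ* count, whereas the assumption of two LTRs per element applied to the whole-genome count yields nearly the same figure.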

The above consideration depends on the small number of *roo* elements found in the heterochromatin using our amplicon, and it should be noted that significant heterochromatic portions, known as repeat regions, remain to be aligned. However, TEs in these regions usually belong to clusters made of thousands of nested old fragments of different transposable elements (http://chervil.bio.indiana.edu:7092/annot/dmel-release4-notes.html#3.2update), where it is unlikely to have LTRs presenting the almost complete identity to our amplicon sequence required to be detected in the QPCR analysis. Therefore, the inclusion of these unaligned repeat regions would not likely render different QPCR estimates for *roo* numbers.

In the above search, 20.8% of euchromatic sequences matching the amplicon were located at chromosome *X*, and previous results for MA1 lines indicated that this chromosome harbored 22.2% of the ancestral MA1 *roo in situ* bands (Maside *et al*. 2001). Similarly, 21.3% of the bands that we detected for the C1 and C2 populations corresponded to chromosome *X*, and the number of *roo* elements estimated by QPCR in males for the same two controls (using nonadjusted ΔΔ*C*_{t} values for males in Equation 1) was 92% that for females. From these observations, the average male-to-female ratio of *roo* elements would equal 0.9, which was used to estimate *roo* numbers per haploid genome (see materials and methods).

#### Averages for the observed number of *in situ* bands and QPCR *roo* numbers:

The average number of *in situ* bands per individual and the QPCR estimates of *roo* numbers (based for each population or MA2 line on the corresponding ΔΔ*C*_{t} average) are given in Table 2. These QPCR numbers differ from the average of *n* values on the basis of single ΔΔ*C*_{t} observations by <0.5%, and their bootstrap errors differ from the standard error of average *n* values by <5% (results not shown).

The remarkable similarity between *in situ* and QPCR estimates for assays of the same population is in agreement with the above results from the genome search, indicating that the actual biases from these estimates tend to cancel each other out and supporting the joint discussion of our *in situ* and QPCR estimates. In any case, no estimate is derived from a direct comparison between *in situ* and QPCR values.

It should be noted that the difference in average *roo* number between C1 and C2 (or MA2) depends upon the unknown number of inserts that the MA1-85 line had accumulated when it was used to derive the MA2 lines and the C2 population, as well as on any differences in the insertion rates. Furthermore, even bands common to all 10 individuals assayed in a given population need not have been present in the corresponding ancestral genome, which precludes identifying ancestral bands. Therefore, average *roo* numbers do not provide direct estimates for the overall insertion rates. However, it is remarkable that the difference in the average number of bands between C1 and C2 (7) was very similar to the corresponding difference in *n* QPCR values, supporting again the joint discussion of *in situ* and QPCR estimates.

The *n* QPCR values allow us to estimate the excess in average per-generation *roo-*insertion rate for MA2 lines compared to C2 up to generation *t* = 100 as (*n*_{MA2} − *n*_{C2})/100. This amounts to λ_{MA2|C2} = 0.104 ± 0.036. Using bootstrap, the 5% low percentile for this estimate is 0.052, indicating that the insertion rate for MA2 lines is significantly larger than for MA1 lines (λ_{MA1} = 0.031 ± 0.003 by generation 262; Maside *et al*. 2001). QPCR results also indicate that the average number of inserts for the C2 population was roughly stable by the end of the experiment.

#### Inferences based on estimates of the variance:

The variances of the per-chromosome and per-individual number of *in situ* bands for the C1 and C2 populations are given in Table 3. The variance is consistently larger for C2, despite the fact that this population was more recently founded from a nonsegregating origin than C1 (155 generations as opposed to 420). Using our estimates of the effective population size for C1 and C2 (*N*_{e} = 500; García-Dorado *et al*. 2007) and assuming σ^{2}(*n*) = σ^{2}(*b*)/2 in Equation A1, the estimates of the rate of insert accumulation (*i.e*., the rate of neutral insertion that would account for the variances observed in C1 and C2) were λ_{C1} = 0.009 and λ_{C2} = 0.047, with bootstrap errors 0.0029 and 0.0190, respectively, where λ_{C1} was significantly smaller than λ_{C2} (*P* < 0.05 in a bootstrap percentile test). These estimates indicate that the average number of inserts accumulated per haploid genome (*t*λ_{C}, which includes both segregating and fixed accumulated inserts) was 3.78 for C1 and 7.28 for C2 by the end of the experiment. Note that, since the expected number of segregating inserts, assumed Poisson distributed, should equal the variance for insert numbers [σ^{2}(*n*), 3.13 and 6.86 for C1 and C2, respectively], the expected number of fixed inserts (*n*_{f}) in each control population would be negligible even under neutrality. Estimates assuming σ^{2}(*n*) = 2σ^{2}(*b*)/3 (as if the populations were at the mutation-drift balance; see appendix a) were not qualitatively different from the former (λ_{C1} = 0.012, λ_{C2} = 0.063).

The parametric analysis of QPCR *roo* numbers in the MA2 lines gives an observed variance of the per-line average σ^{2}(*n*) = 130.2 and a significant (*P* < 0.0005) between-line component of the variance (note that these full-sib lines can be considered to lack genetic variability, so that this component should be ascribed to differences in the true number of inserts per line). If new *roo* insertions occurred at a constant rate and were independent of previous insertion events in each line, the true number of new inserts accumulated per MA2 line should be Poisson distributed, with mean and variance *t*λ_{MA2}. Then, the between-line component estimated by generation 100 would imply one new insert per gamete and generation for the MA2 lines, an insertion rate exceedingly high compared to the estimates mentioned above (λ_{MA2|C2} = 0.104 ± 0.036; λ_{C1} = 0.009; λ_{C2} = 0.047), indicating that the rate of new insertions might be positively correlated with the number of previously accumulated inserts. Since this conclusion is based on a parametric analysis (ANOVA) whose statistical requirements (normality and homoscedasticity of residual errors) may not have been met, this result is checked below using simulation.

#### Joint inferences from the different analyses:

The main estimates from this joint analysis are presented in Table 4.

First, adding up the *in situ* estimate for the accumulation rate in C2 (λ_{C2} = 0.047) to the QPCR estimate of the average insertion rate for MA2 compared to C2 (λ_{MA2|C2} = 0.104), we obtain an average rate of insertion in MA2 lines that amounts to λ_{MA2} = 0.152 up to generation 100, which is five times the λ_{MA1} = 0.031 *roo* insertion rate reported by Maside *et al*. (2001) for MA1 lines at generation 262.

Second, introducing in Equation A2 our estimates for the average and variance of the number of bands (from Tables 2 and 3, respectively) together with the above estimates for λ_{c}, we can infer that the number of ancestral bands for populations C1 and C2 should be *n*_{0C1} = 75.7 and *n*_{0C2} = 83.9. The *n*_{0C1} estimate, considerably larger than the 63 ancestral bands obtained by Maside *et al*. (2001) and by Vázquez (2006), implies that our LTR probe hybridized at a number of genomic locations 20.2% larger than that for the internal probe [(75.7 − 63)/63 = 0.202], in close agreement with the corresponding percentage estimated for the euchromatin in release 5.1 (19.6%; see above), despite the different genomes involved in both analyses. Furthermore, the difference between *n*_{0C2} and *n*_{0C1} would imply that the MA1-85 line had accumulated about eight inserts at generation 265, when it was used to generate the MA2 lines and the C2 control, thus showing an average insertion rate λ_{MA1-85} = 0.031. This value is in close agreement with the Maside *et al*. (2001) estimate for a sample of 16 MA1 lines at generation 262 (λ_{MA1} = 0.031 ± 0.003).
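The two derived quantities in this paragraph follow directly from the reported ancestral band numbers, as the short check below shows. It uses only values stated in the text; the variable names are illustrative.

```python
# Ancestral band numbers inferred for the LTR probe (from the text).
n0_c1 = 75.7
n0_c2 = 83.9

# Ancestral bands reported with an internal probe (Maside et al. 2001; Vazquez 2006).
n0_internal_probe = 63

# Generation at which line MA1-85 was expanded into C2 and the MA2 lines.
generations_ma1_85 = 265

# Relative excess of bands detected by the LTR probe.
ltr_probe_excess = (n0_c1 - n0_internal_probe) / n0_internal_probe

# Inserts accumulated by MA1-85 and the implied per-generation rate.
inserts_ma1_85 = n0_c2 - n0_c1
rate_ma1_85 = inserts_ma1_85 / generations_ma1_85

print(round(ltr_probe_excess, 3), round(inserts_ma1_85, 1), round(rate_ma1_85, 3))
```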

Finally, the expected variance of *n* values for the MA2 lines was obtained by simulation using the empirical error distribution and assuming that the number of newly accumulated inserts was Poisson distributed with average *m* = *t*λ_{MA2} ≈ 15. The expected variance obtained, 29.0, was well below the observed value [σ^{2}(*n*) = 129.7, for *n* values computed from average ΔΔ*C*_{t} values]. This result corroborates that the insertion rate is not uniform across MA2 lines and is consistent with a positive association between the insertion rate in MA2 lines and the number of previously accumulated inserts, a phenomenon that would induce accelerated insertion. The empirical distribution of the QPCR *roo* values (*n*) in the MA2 lines is given in Figure 2, where the expected distribution if new insertions had not occurred and all the variance was due to experimental and sampling error is also given for comparison. Figure 2 illustrates that the large variance observed in the MA2 experiment is not due to a few rare lines with large *n* values and suggests that acceleration is a relatively common process in the genetic background of our lines. Acceleration could have also occurred during the MA1 experiment, being only moderate up to generation 262, when the observed distribution did not significantly depart from the Poisson one (with average 8.06 and variance 11.80 for the per-line insert number, as obtained from *in situ* analysis of 16 lines; Maside *et al*. 2001). However, at generation 301, the average insertion rate had significantly increased (*P* < 0.008), and the variance was twice the mean (11 lines analyzed, mean number of inserts 13.38, variance 26.09; Vázquez 2006). For control populations, however, the number of inserts per genome should be roughly Poisson distributed even under accelerated insertion, due to segregation and recombination, so that no insight concerning acceleration can be obtained from the magnitude of the variance for insert numbers.

## DISCUSSION

Our results show that (i) the *roo* insertion rate was clearly higher in our second mutation-accumulation experiment than in the first one, (ii) this rate experienced a strong acceleration in the mutation-accumulation lines during the second experiment (MA2), and (iii) the accumulation rate for each control population was considerably lower than in the corresponding set of MA lines, the number of inserts for C2 being roughly stable by the end of the experiment.

#### The process of insert accumulation in inbred lines:

Our inference that the average insertion rate for the MA1-85 line when it was used to derive MA2 and C2 (0.031) was equal to the average rate for the whole set of MA1 lines indicates that MA1-85 is representative of the MA1 experiment regarding *roo* accumulation. Thus, we tentatively consider MA2 lines as an extended MA1 experiment in this respect. Figure 3 represents average *n* values for the mutation-accumulation lines of this MA1–MA2 experiment, where *n* values from Maside *et al*. (2001) and Vázquez (2006) were adjusted to account for the larger number of bands obtained using an LTR probe (see results). The detected increase in the number of inserts with time suggests a continuously accelerated process.

Although it has been shown that transposable families have active and inactive copies in natural populations, the theory developed to account for the population dynamics of transposable elements usually considers that the overall genomic insertion rate is proportional to the overall number of element copies in the genome. Since natural selection has been relaxed in our MA lines to a good extent, we have investigated whether the observed acceleration fits a simple generalization of that current insertion model, where we consider that (i) not all ancestral inserts are active, (ii) insert number increases by a constant factor ν of the standing number (*x*) of active copies, and (iii) new inserts are active in mutation-accumulation lines. Since excision rates have been found in general to be two orders of magnitude lower than insertion rates, and since *roo* excisions were not observed after 262 generations in the MA1 lines (Nuzhdin *et al*. 1997; Maside *et al*. 2001), ν should virtually equal the insertion rate per active copy and generation, and henceforth the effect of excision in the dynamics of transposable elements will be ignored. In this situation, the ancestral number of inactive copies is (*n*_{0} − *x*_{0}), and the expected number of inserts per gamete in a given generation *t* is

$$n_{t} = (n_{0} - x_{0}) + x_{0}e^{\nu t}. \tag{2}$$

This model implies that, at generation *t*, the rate of new insertion per gamete and generation is l_{t} = ν*x*_{t}. From now on, we use l_{t} to denote the gametic insertion rate at any given generation *t*, while λ (with an optional subscript appropriate to the specific populations or lines) is the average of insertion rates (l_{t}) up to a given generation. We have checked the fitting of the data to predictions from Equation 2 for different *x*_{0} integer values ranging from 1 to *n*_{0}. To achieve this, we denote by *y* the increase up to generation *t* for the natural logarithm of the number of active copies, *y* = log_{e}(*n*_{t} − *n*_{0} + *x*_{0}) − log_{e}(*x*_{0}), and we compute *y* from each available *n*_{t} estimate for the MA1–MA2 lines. Then, since Equation 2 implies *y* = ν*t*, we check the model fitting for each *x*_{0} value by analyzing the linear regression, forced through the origin, of *y* on generation number, where the regression slope ν gives the rate of new insertions per generation and active copy. For *x*_{0} values on the order of *n*_{0} (*i.e*., most ancestral copies are active), the model predicts an acceleration that is too slow compared to the data. Predictions for *n* obtained using different *x*_{0} and ν values in Equation 2 are shown in Figure 3. The best fitting corresponds to a single initial active insert (*x*_{0} = 1 in Equation 2) and gives ν = 0.0087 ± 0.0001, which, for *x*_{0} = 1, also estimates the initial insertion rate per gamete and generation [l_{(t=0)} = ν = 0.0087]. The excellent fitting of this model (*P* < 8 × 10^{−6}, explaining a fraction *R*^{2} = 0.999 of the *y* data dispersion) should not be taken at face value, since *x*_{0} was chosen *ad hoc*. Notwithstanding, it shows that the results are remarkably consistent with a situation where the insertion rate per generation and gamete is proportional to the number of active inserts by a relatively large factor (∼0.009), the initial number of active inserts being very small and new inserts being active in mutation-accumulation lines. Our ν estimate implies an overall average per-copy insertion rate (0.0087/75 ≈ 0.0001) on the order of that previously estimated (∼10^{−4} for Drosophila; Nuzhdin *et al*. 1997; Maside *et al*. 2000), but due to a small fraction of active elements with relatively high per-copy insertion rates (ν ≈ 10^{−2}). This result is qualitatively robust against reasonable experimental or sampling errors. Thus, using a two standard error lower limit for the copy number in the reference Canton-S strain (which gives 92.8 instead of 99.7 for the average *roo* copy number in MA2 lines), the model fits still reasonably well, giving *x*_{0} = 3 and ν = 0.0053 (*P* < 0.0065, *R*^{2} = 0.974), supporting the same general view. For *copia* elements, results obtained by Nuzhdin *et al*. (1998) are also consistent with a possible small number of active copies, although, in that case, more than a single element was found to transpose (Perdue and Nuzhdin 2000).
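The fitting procedure described in this paragraph can be sketched in a few lines. The code below assumes that Equation 2 has the exponential form *n*_{t} = (*n*_{0} − *x*_{0}) + *x*_{0}*e*^{ν*t*}, which is the form implied by the relation *y* = ν*t* stated in the text. All function names are ours, and the data are synthetic values generated from the model itself (not the copy numbers measured in the paper), so the example only shows that the regression through the origin recovers ν when the model holds.

```python
import math

def expected_copies(t, n0, x0, nu):
    """Assumed form of Equation 2: inactive copies stay constant while
    active copies grow as x0 * exp(nu * t)."""
    return (n0 - x0) + x0 * math.exp(nu * t)

def log_increase(n_t, n0, x0):
    """y = log(n_t - n0 + x0) - log(x0): log growth of active copies."""
    return math.log(n_t - n0 + x0) - math.log(x0)

def slope_through_origin(ts, ys):
    """Least-squares slope of y on t with the intercept fixed at zero."""
    return sum(t * y for t, y in zip(ts, ys)) / sum(t * t for t in ts)

# Synthetic check (NOT the paper's data): generate copy numbers from the
# model and confirm that the regression recovers the rate nu.
n0, x0, nu_true = 75.7, 1, 0.0087
generations = [100, 200, 262, 301, 365]
copies = [expected_copies(t, n0, x0, nu_true) for t in generations]
ys = [log_increase(n, n0, x0) for n in copies]
nu_hat = slope_through_origin(generations, ys)

# Gametic insertion rate l_t = nu * x_t at generation 365 under the model.
rate_end = nu_true * x0 * math.exp(nu_true * 365)

print(nu_hat, rate_end)
```

Repeating the regression for each integer *x*_{0} and comparing the fits would reproduce the model-selection step described above. With real data, *y* is undefined whenever an estimate falls at or below *n*_{0} − *x*_{0}, so such points would need separate handling.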

We have estimated the average deleterious effect of new inserts [*E*(*s*)] from the regression slope (*b*_{V,n}) of chromosome II viability (*V*, obtained by Ávila *et al*. 2006 for MA2 lines at generation 98) on the QPCR number of insert copies here obtained for generation 100. To achieve this, *b*_{V,n} was corrected to take into account the experimental error of *n* estimates and for the fact that viability was assayed for chromosome II while *n* was assayed for the whole genome [*E*(*s*) = −*CKb*_{V,n}, where *C* = 130.2/100.2 = 1.3 is the ratio of the variance of the *n* estimate in the lines to that of the true *n* values estimated from ANOVA, and *K* = 2.35 is the ratio of insertions that occur in the whole genome to those in chromosome II estimated from Table 3]. The regression estimate was positive (*b*_{V,n} = 0.0021 ± 0.0019), although it was not significantly different from zero. Thus, *E*(*s*) = −3.0*b*_{V,n} = −0.0063, as if mutations were advantageous on the average. The largest average deleterious effect that would be consistent with this estimate in a one-tailed test (*P* < 0.05) is 0.0032. This latter value can thus be taken as an upper bound for *E*(*s*), which is slightly smaller than previous point estimates for *copia* new insertions in mutation-accumulation lines (Houle and Nuzhdin 2004; Pasyukova *et al*. 2004). This small average deleterious effect per *roo* insert, together with our values for the number of *roo* inserts accumulated per line, would account for at most ∼6% of the viability decline detected in our MA lines at different generations (Fernández and López-Fanjul 1996; García-Dorado 1997; Chavarrías *et al*. 2001; Caballero *et al*. 2002; Ávila *et al*. 2006). Thus, accelerated *roo* insertion alone cannot explain the accelerated viability decline. Note, however, that although the average per-generation insertion rate for the mutation-accumulation lines through the whole MA1–MA2 experiment is λ_{MA} = (99.7 − 75.7)/365 = 0.066, the per-generation insertion rate for the MA1–MA2 lines predicted by Equation 2 at the end of the experiment would be l_{t=365} = 0.22, *i.e*., >20-fold the ancestral rate, so that it could be involved in the final collapse of the lines. On the other hand, the same mechanism causing accelerated insertion for *roo* could have also caused acceleration for some other active transposable families. For example, a previous analysis with RAPDs suggested mobility for *Idefix* in our MA lines (Salgado *et al*. 2005), a family that has not been included in any previous studies using this material.

Finally, considering an overall haploid insertion rate of ∼0.15 in the euchromatic part of the genome for the pool of TE families (Maside *et al*. 2001; Kaminker *et al*. 2002), our small estimate for the average deleterious effect for viability [*E*(*s*) < 0.0032] suggests that the rate of viability decline due to TE activity in the absence of selection (Δ*M*_{TE} = 0.15 × 0.0032 = 0.00048) makes a relatively small contribution (<18%) to the rate of viability decline estimated in MA experiments (∼0.0025; see García-Dorado *et al*. 2004).
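The arithmetic behind the *E*(*s*) estimate and the resulting bound on viability decline can be retraced as follows. This is a minimal sketch using only the figures quoted in the two preceding paragraphs. The function name is ours, and the one-tailed bound uses a normal quantile, which is an assumption, since the text does not state the reference distribution used. With unrounded *C* × *K* ≈ 3.05 the point estimate is −0.0064 rather than −0.0063, and the normal bound comes out at about 0.0031, close to the reported 0.0032.

```python
from statistics import NormalDist

def mean_insert_effect(b, c, k):
    """E(s) = -C * K * b: regression slope corrected for measurement error
    in n (C) and for whole-genome vs. chromosome II insertions (K)."""
    return -c * k * b

C = 130.2 / 100.2   # variance of estimated n over variance of true n
K = 2.35            # whole-genome to chromosome II insertion ratio
b, se = 0.0021, 0.0019

point_estimate = mean_insert_effect(b, C, K)

# One-tailed 95% bound from a normal quantile (ASSUMPTION: the reference
# distribution actually used in the paper is not stated).
z = NormalDist().inv_cdf(0.95)
upper_bound = mean_insert_effect(b - z * se, C, K)

# Viability decline per generation attributable to TE insertion, using
# the reported bound of 0.0032 and a haploid insertion rate of 0.15.
delta_m_te = 0.15 * 0.0032

print(point_estimate, upper_bound, delta_m_te)
```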

#### The accumulation process in moderate-sized populations:

The rate of insertion accumulation in the control populations is clearly below the insertion rate in the corresponding MA lines. For C1 (λ_{C1} = 0.009), it coincides with the initial per-generation insertion rate for MA1 in the model fitted above (l_{t=0} = 0.0087). For C2, however, the average per-generation accumulation rate (λ_{C2} = 0.047) is smaller than the per-gamete insertion rate estimated for the MA1-85 line at generation 265, when this was used as the ancestral line for the second MA experiment (inferred from the model: l_{t=265} = ν*x*_{t=265} = 0.0087 × 8.2 = 0.071), and insert numbers were stable by the end of the experiment. This slowed accumulation can come from any mechanism inactivating new copies in segregating populations and/or from selection against new inserts reducing the number of active copies and consequently the gametic insertion rate, so that selection against new inserts can hardly be disentangled from reduced insertion rate per active copy.

Results from theoretical studies, assuming genomic insertion rates proportional to the overall number of inserts, have shown that natural selection can control copy abundance only under synergistic effects of inserts on fitness and that equilibrium would be possible only when each insert has a selection coefficient on the order of the per-element insertion rate (Charlesworth 1991). Using published estimates for the average insertion rate per copy element and generation (Nuzhdin *et al*. 1997; Maside *et al*. 2000), this would imply selection coefficients on the order of 10^{−4}, which would not be large enough to cause efficient purifying selection under the effective population size of our control populations (which would require *s* > ∼2/*N*_{e} = ∼0.004). Moreover, our estimate from MA2 lines [*E*(*s*) < 0.003] implies that viability selection against new inserts should be inefficient in our control populations. Note that we estimated average deleterious effects using lines that had accumulated an extremely large between-line variance for the number of homozygous inserts, so that, in the presence of synergy, our bound estimate for *E*(*s*) should include very large synergistic effects. Therefore, our results exclude that synergistic selection on viability could account for the contention of insert number observed in the control populations.

On the other hand, it has been proposed that insert numbers can be stabilized by selection against chromosomal rearrangements caused by ectopic recombination between elements (Langley *et al*. 1988). This selection could synergistically increase with the number of heterozygous inserts (those that cannot pair properly), maximizing selection against new active copies, which are expected to segregate at low frequency, but to cause no load in case they became fixed. It should be noted that the fitness reduction from unequal exchange should occur through fertility impairment, so that it would go undetected in our viability tests. This regulatory mechanism would be inefficient in the MA lines, both because selective intensity would be small compared to drift and because of the small number of copies segregating in these brother × sister lines, unless the insertion rate is very large as occurred by the end of the experiment when the lines finally collapsed (predicted rate l = 0.22). However, it can account for the stabilization of insert numbers in our control populations. Note that our estimates predict an acceleration for the insertion that, in the absence of selection, would be much larger than that under a model where all ancestral inserts were active, with insertion rate 10^{−4} each. Therefore, the selection coefficient per active copy required to halt insert numbers (on the order of ν according to Charlesworth 1991, *i.e*., ∼10^{−2}) should be about two orders of magnitude larger than previously assumed, thus explaining insert number contention even in moderate-sized populations. However, due to synergy, this deleterious effect can be smaller when the number of copies, or the overall frequency of segregating inserts, are below the equilibrium value. 
Our results also imply that transposable activity for a specific line depends on the random sampling of active copies during line founding and is consistent with the observation that, despite the important copy numbers of many transposable families in natural populations, each studied laboratory line shows mobility for only a few families that vary from line to line (Nuzhdin *et al*. 1997).

Under the above model of selective contention, a prevalence of inactive copies could be due to the accumulation of spontaneous mutation in old elements, which are expected to drift in low-recombination regions sheltered from natural selection. This scenario is in agreement with results by Bartolomé and Maside (2004), who found that low-recombination regions harbor important numbers of fixed and degenerated transposable elements, while elements in high-recombination regions are fewer, canonical, and segregate at low frequencies. However, our inference that a large number of copies with well-conserved LTR sequences are inactive in the ancestral population would imply some inactivation mechanism, as might be RNA interference, which can silence genes at both the transcriptional level (through methylation; Kawasaki and Taira 2004) and the posttranscriptional level (Vastenhouw and Plasterk 2004; Brennecke *et al*. 2007). This mechanism has been proposed to cause specific TE silencing stimulated by the number of TE copies in a predator–prey-like model, leading to copy number regulation comparable to that arising from ectopic exchange (Abrusán and Krambeck 2006). In this situation, it is appealing to consider that particular TE copies could occasionally escape silencing through mutation, leading to insertion bursts. Although, in principle, this mechanism does not explain why the insertion rate was controlled in our C1 and C2 populations but not in the MA lines, it might explain the long-term inactivation of TE families in large populations.

#### General consequences:

Our results show a dramatic acceleration of the per-gamete insertion rate of *roo* elements for very small populations (*N* = 2), consistent with a very small number of active copies in the ancestral population, each causing the insertion of new active copies at a relatively large rate (ν ≈ 0.009). This situation should induce much higher acceleration than a scenario of many active inserts with small activity each, thus requiring a more powerful mechanism regulating insert number and accounting for deleterious effects on the order of 10^{−2} per active insert at the equilibrium, about two orders of magnitude larger than previously thought. In moderate-sized populations (*N*_{e} = 500) we observe insert accumulation, but it showed no acceleration for population C1 and finally halted for population C2. The estimated average homozygous deleterious effect for viability of new *roo* insertions is too small to account for this insert-number contention, even considering synergy. However, this contention could be due to unequal meiotic exchange causing synergistic selection for fertility. Our results indicate that most ancestral copies were not active, suggesting the operation of an inactivation mechanism like RNA interference.

## APPENDIX A

To infer the rate at which new inserts accumulate in control populations (λ_{c}) from the variance of band number, we assume that the inserts segregating in these populations are those that behave as neutral for the corresponding effective size *N*. Let *n* be the number of *roo* copies per gamete and assume that the number of new effectively neutral inserts that accumulate per gamete and generation is Poisson distributed with mean λ_{c}. The overall number of such new inserts (fixed or not) accumulated per gamete after *t* generations is *t*λ_{c}. The variance of *roo* copy number per gamete at generation *t* [σ^{2}(*n*)] is due to segregating inserts and is expected to undergo a reduction σ^{2}(*n*)/2*N* from generation *t* to *t* + 1 due to drift and an increase λ_{c} due to the Poisson-distributed new insertions. Thus, it can be shown that we expect σ^{2}(*n*) = 2*N*(1 − *e*^{−*t*/2*N*})λ_{c}, approximately. Therefore, we estimate

$$\lambda_{c} = \frac{\sigma^{2}(n)}{2N\left(1 - e^{-t/2N}\right)}. \tag{A1}$$

This approach can induce some upward bias under accelerated insertion or if inserts have substantial recessive deleterious effects as, in those cases, the rate of fixation of new inserts can be smaller than expected from the number of segregating inserts. However, this bias will not be relevant until the accumulation period is so long that a substantial proportion of the neutral mutations is expected to be fixed in the population, *i.e*., when *t* ≫ . In our case, , and . Therefore, even assuming neutral inserts and constant insertion rates, most of the inserts accumulated per genome are expected to be still segregating in the population and, therefore, the above bias should be small.
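The drift-plus-insertion recursion described in this paragraph can be iterated directly. The sketch below, with hypothetical function names and illustrative parameter values, compares the iteration with its exact solution 2*N*λ_{c}[1 − (1 − 1/2*N*)^{*t*}]. It then inverts the exponential approximation of that solution to recover λ_{c} from a variance. Reading the estimator of Equation A1 this way is our assumption, derived from the stated recursion.

```python
import math

def variance_by_recursion(lam_c, n_eff, t):
    """Iterate V(t+1) = V(t) * (1 - 1/(2N)) + lam_c starting from V(0) = 0."""
    v = 0.0
    for _ in range(t):
        v = v * (1.0 - 1.0 / (2.0 * n_eff)) + lam_c
    return v

def variance_closed_form(lam_c, n_eff, t):
    """Exact solution of the recursion above."""
    return 2.0 * n_eff * lam_c * (1.0 - (1.0 - 1.0 / (2.0 * n_eff)) ** t)

def rate_from_variance(var_n, n_eff, t):
    """Invert the exponential approximation to estimate lam_c."""
    return var_n / (2.0 * n_eff * (1.0 - math.exp(-t / (2.0 * n_eff))))

# Illustrative values only: N = 500, t = 100 generations, lam_c = 0.047.
v_iter = variance_by_recursion(0.047, 500, 100)
v_exact = variance_closed_form(0.047, 500, 100)
lam_back = rate_from_variance(v_exact, 500, 100)

print(v_iter, v_exact, lam_back)
```

For *t* much smaller than 2*N* the variance stays close to, but below, *t*λ_{c}, which is why most accumulated inserts are expected to be still segregating.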

For fixed inserts, the number of bands per individual equals the number of inserts per gamete, but for segregating ones one band is recorded for both homozygous and heterozygous individuals. Thus, the expected number of bands per individual corresponding to segregating inserts is *E*(*b*_{s}) = Σ(1 − *p*^{2}), where the sum is over the loci involved or possible insertion sites, and *q* = 1 − *p* is the frequency of a particular insert. Using diffusion theory, it can be shown that the expected value of *E*(*b*_{s}) at equilibrium for neutral mutations equals 3/2 times the number of segregating inserts per gamete (*n*_{s}). However, although our populations seem to be close to the equilibrium for deleterious mutations (García-Dorado *et al*. 2007), they are still far from the mutation–drift balance, which is roughly attained after 6*N*_{e} generations (note that *t* < *N* in C1 and C2). Therefore, we expect *q* ≪ *p*, so that *E*(*b*_{s}) should be about twice the number of segregating inserts per gamete. As this factor affects the probability of each band separately, the number of bands per individual corresponding to segregating inserts (*b*_{s}) can still be assumed to be Poisson distributed, so that the variance of band number at each generation [V(*b*)] should be affected by the same factor as the corresponding mean number of bands from segregating inserts [*E*(*b*_{s})]. Therefore, the range for the mean number of segregating inserts per gamete *E*(*n*_{s}) should be *E*(*b*_{s})/2 ≤ *E*(*n*_{s}) ≤ 2*E*(*b*_{s})/3, and, since the variance for the overall number of bands is due to segregating inserts, the range for the variance of insert number σ^{2}(*n*) is σ^{2}(*b*)/2 ≤ σ^{2}(*n*) ≤ 2σ^{2}(*b*)/3.
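The two bounds quoted above, *E*(*b*_{s})/2 and 2*E*(*b*_{s})/3, correspond to band-to-insert ratios of 2 and 3/2. A quick numerical check is sketched below. It assumes the standard neutral equilibrium result that the density of insert frequencies *q* is proportional to 1/*q*, and it uses the fact that an individual shows a band with probability 1 − (1 − *q*)^{2}. The function names are ours.

```python
def band_to_insert_ratio(steps=100_000):
    """Ratio of expected bands per individual to segregating inserts per
    gamete when insert frequencies q follow a density proportional to 1/q
    (assumed neutral equilibrium spectrum)."""
    dq = 1.0 / steps
    inserts = 0.0
    bands = 0.0
    for i in range(steps):
        q = (i + 0.5) * dq                 # midpoint rule on (0, 1)
        density = 1.0 / q
        inserts += density * q * dq        # contribution to inserts per gamete
        bands += density * (1.0 - (1.0 - q) ** 2) * dq  # band probability
    return bands / inserts

def ratio_for_rare_inserts(q):
    """Same ratio when every insert sits at a single low frequency q."""
    return (1.0 - (1.0 - q) ** 2) / q

equilibrium_ratio = band_to_insert_ratio()
rare_ratio = ratio_for_rare_inserts(0.001)

print(equilibrium_ratio, rare_ratio)
```

The ratio is close to 2 while inserts are rare and falls to 3/2 at mutation–drift balance, which brackets the correction applied to the band variance.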

Since our populations are still far from the mutation–drift balance, the estimate σ^{2}(*n*) = σ^{2}(*b*)/2 will be used in Equation A1, although some results assuming σ^{2}(*n*) = 2σ^{2}(*b*)/3 will also be given. Thus, assuming σ^{2}(*n*) = σ^{2}(*b*)/2 and *E*(*n*_{s}) = *E*(*b*_{s})/2, the average number of bands detected per individual in a control population should be *E*(*b*) = *n*_{0} + *E*(*n*_{f}) + 2*E*(*n*_{s}), where *n*_{0}, *n*_{f}, and *n*_{s} stand for ancestral, nonancestral fixed, and segregating insert numbers for the population considered, respectively. Therefore, since *E*(*n*_{s}) = σ^{2}(*n*_{s}) = σ^{2}(*b*)/2 and *E*(*n*_{f}) + *E*(*n*_{s}) = *t*λ_{c}, the number of ancestral bands can be inferred as

$$n_{0} = E(b) - t\lambda_{c} - \frac{\sigma^{2}(b)}{2}. \tag{A2}$$
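The band decomposition in this paragraph can be checked with a round trip: build the expected band number from its components, then solve for the ancestral number. The sketch below uses hypothetical values and function names. It relies only on the relations stated above, namely *E*(*b*) = *n*_{0} + *E*(*n*_{f}) + 2*E*(*n*_{s}), *E*(*n*_{s}) = σ^{2}(*b*)/2, and *E*(*n*_{f}) + *E*(*n*_{s}) = *t*λ_{c}.

```python
def expected_bands(n0, n_fixed, n_seg):
    """E(b) = n0 + E(n_f) + 2 E(n_s): rare segregating inserts show about
    two bands per gametic copy."""
    return n0 + n_fixed + 2.0 * n_seg

def ancestral_bands(mean_bands, var_bands, t, lam_c):
    """Solve the relation above for n0, using E(n_s) = var_bands / 2 and
    E(n_f) + E(n_s) = t * lam_c."""
    return mean_bands - t * lam_c - var_bands / 2.0

# Hypothetical population: 70 ancestral bands, with 1 fixed and 3
# segregating new inserts per gamete after 100 generations.
n0, n_f, n_s, t = 70.0, 1.0, 3.0, 100
mean_b = expected_bands(n0, n_f, n_s)
var_b = 2.0 * n_s                # implied by E(n_s) = var(b) / 2
lam_c = (n_f + n_s) / t
recovered = ancestral_bands(mean_b, var_b, t, lam_c)

print(mean_b, recovered)
```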

## APPENDIX B

Residual errors for ΔΔ*C*_{t} within MA2 lines were corrected according to the degrees of freedom within each line, to recover the sampling error variance observed within MA2 lines (since there are only two observations per MA2 line, this requires multiplying residuals by a factor √2). Corrected residuals from different lines were averaged by pairs, to produce a sample of 64 sampling errors (ξ) for the average ΔΔ*C*_{t} of the MA2 lines. A sample of 64 estimates of *n* was simulated, each as(B1)where *P* is a random number from a Poisson distribution with average *m*, and *n*_{02} is our estimate of the ancestral copy number for the MA2 experiment. We computed the variance of these simulated *n*_{sim} values [σ^{2}(*n*_{sim})], which estimates the variance for the *n* values observed in the MA2 lines that would be expected under the Poisson distribution assumption, including effects from nonnormal sampling errors for *n* estimates and from their correlation with the true *roo* number of the line. A value of the observed variance in the real sample of MA2 lines (one *n* estimate per line based on the line ΔΔ*C*_{t} average) larger than σ^{2}(*n*_{sim}) would imply an accelerated insertion rate in the MA2 lines.
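The simulation logic can be sketched as follows, with several loudly stated assumptions. The exact form of Equation B1 is not recoverable from the text here, so the code assumes that a sampling error ξ on the ΔΔ*C*_{t} (log_{2}) scale multiplies the true copy number by 2^{ξ}. The errors are drawn from a hypothetical normal distribution instead of being resampled from corrected residuals, and the ancestral value 83.9 is taken from the *n*_{0C2} estimate quoted in the RESULTS. Function names are ours.

```python
import math
import random
import statistics

def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for small means such as 15."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_n(n_ancestral, m, errors, rng):
    """One simulated copy-number estimate per sampling error.

    ASSUMPTION: the error acts on the log2 scale, so the estimate is the
    true copy number times 2**error; Equation B1 may differ.
    """
    return [(n_ancestral + poisson_draw(m, rng)) * 2.0 ** e for e in errors]

rng = random.Random(1)

# Hypothetical errors; the paper uses 64 averaged corrected residuals.
errors = [rng.gauss(0.0, 0.05) for _ in range(64)]
n_sim = simulate_n(83.9, 15, errors, rng)
var_sim = statistics.variance(n_sim)

# Sanity check: with no measurement error the variance approaches m.
no_error = simulate_n(83.9, 15, [0.0] * 20000, rng)
var_no_error = statistics.variance(no_error)

print(var_sim, var_no_error)
```

Under this assumed error model, the simulated variance combines the Poisson variance of new inserts with a measurement component that scales with the true copy number, which is the correlation between error and copy number mentioned above.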

## Acknowledgments

We are grateful to Carlos López-Fanjul, Sergey Nuzhdin, Carmen Segarra, and to an anonymous reviewer for helpful discussion. This work was supported by Distinció per la Promoció de la Recerca Universitària from Generalitat de Catalunya (to M.A.) and by grants CGL2005-02412/BOS and BFU2004-02253 from Ministerio de Ciencia y Tecnología to A.G-D. and M.A., respectively.

## Footnotes

Communicating editor: M. J. Simmons

- Received May 18, 2007.
- Accepted July 17, 2007.

- Copyright © 2007 by the Genetics Society of America