Fatty acid binding protein 4 (FABP4) plays a key role in fat regulation in mammals and is a strong positional candidate gene for the FAT1 quantitative trait locus located on porcine chromosome 4. DNA resequencing of the FABP4 gene region in 23 pigs from 10 breeds and wild boar revealed 134 variants in 6.4 kb, representing a silent nucleotide diversity of πS = 0.01, much higher than reported so far in animal domestic species. Moreover, this diversity was highly structured. Also strikingly, the FABP4 phylogenetic tree did not show any geographical or breed origin clustering, with distant breeds sharing similar haplotypes and some of the most heterozygous samples pertaining to highly inbred animals like Iberian Guadyerbas (inbreeding coefficient ∼0.3) or British Tamworth. In contrast, the cytochrome b (mtDNA) phylogenetic tree was coherent with geographical origin. The estimated age of the most recent common ancestor for the most divergent Iberian or Tamworth haplotypes was much older than domestication. An additional panel of 100 pigs from 8 different breeds and wild boar from Spain, Tunisia, Sardinia, and Japan was genotyped for seven selected single nucleotide polymorphisms and showed that high variability at the porcine FABP4 is the rule rather than the exception. Pig populations, even highly inbred, can maintain high levels of variability for surprisingly long periods of time.

DOMESTICATION and modern breeding have dramatically shaped the genomic variability of both animal and plant species that humans currently employ to satisfy their needs. In particular, domestic animal populations are a challenging material for studying the footprint of selection because both natural and artificial selection overlap and because the species are made up of a complex web of semi-isolated populations (the breeds) that interchange genetic material occasionally. Many breeds, particularly commercial lines, have incorporated genetic material from other lines and the same breed is actually a mosaic of many different sublines. In pigs, for instance, it is well known that European breeds were crossed to Asian individuals to improve their prolificacy and docility. In this sense, the situation in pigs could be different from, e.g., dogs, where breeds consist mainly of closed breeding populations with a high degree of genetic differentiation (Parker et al. 2004).

The classical view considered domestication as a single event resulting in a strong bottleneck, followed by additional bottlenecks caused by breed formation and modern breeding (Clutton-Brock 1999). Although this simple model seems to fit some species like the dog (Lindblad-Toh et al. 2005) or maize (Wright et al. 2005), recent data suggest that domestication has been a much more complex and recurrent phenomenon than previously thought (Bruford et al. 2003). There is now widespread evidence of multiple domestication events in, e.g., pigs (Larson et al. 2005), cattle (Beja-Pereira et al. 2006), or chicken (Liu et al. 2006). Nevertheless, most nucleotide diversity studies in animals have analyzed mtDNA, which is inherited maternally and thus provides a partial story, or microsatellites, that have a very high mutation rate and are sparsely distributed along the genome. Since these phenomena are well known, it would be of interest to study in detail single nucleotide polymorphism (SNP) autosomal variability for livestock species to complete our view of domestication and modern breeding.

Although some works have been recently published on dogs (Fondon and Garner 2004), detailed information on autosomal DNA sequence variation in different breeds is lacking and the major effects of domestication and recent artificial selection on genome variation are still largely unknown. Certainly, regions that contain positional candidate genes are of particular scientific and economic interest. Here we report a comprehensive study of the fatty acid binding protein 4 (FABP4, also called Ap2 or A-FABP) in pigs, a strong positional candidate gene for the quantitative trait locus (QTL) FAT1, located in swine chromosome 4. FAT1 was the first QTL reported in any animal domestic species and was detected initially in a wild boar × Large White pig cross (Andersson et al. 1994). This QTL affects fatty acid deposition and growth and has been confirmed in numerous independent experiments involving many different breeds (Pérez-Enciso et al. 2000; Walling et al. 2000). Yet its precise molecular nature has remained elusive so far. Recently, we have shown in an Iberian × Landrace cross that the FAT1 region is made up of at least two loci, with FABP4 being a very good positional candidate for one of them (Mercade et al. 2005, 2006). In addition, several studies have demonstrated that FABP4 plays a critical role in fatty acid uptake and metabolism. In particular it has been shown that FABP4 is related to diabetes and obesity in both humans and mice and that mutations decreasing the level of FABP4 result in lower risk of coronary diseases and of type 2 diabetes (Tuncman et al. 2006).

Since modern breeding toward leaner animals must have dramatically affected the metabolism of fat deposition in pigs, it will be particularly illuminating to know the selection footprint left on genes involved in this process. A priori, one would expect a marked decrease in variability and a clear haplotype structure that would distinguish lean breeds from wild boar or autochthonous nonimproved breeds. Our data clearly refute this hypothesis in the case of the FABP4 gene and show that high nucleotide diversities can be maintained for long periods of time even in highly inbred animals.


Sequencing panel:

We resequenced a panel of 23 pigs pertaining to the following breeds: Iberian (IB, IB113, and IB114 from the retinto strain Villalón, southern Spain, and IB415 and IB421 from the Guadyerbas strain, a very fat and inbred strain); Porc Negre (PN, a very fat black breed, autochthonous from Balearic islands, n = 2); Tamworth (TW, UK endangered breed, n = 1); British Lop (BL, UK endangered breed, n = 1); Landrace (LD, hyperprolific line of French origin, n = 4); Large White [LW, LW419 pertains to an old very fat Large White line imported to Spain in 1931 and kept in a closed herd until 1992, when they were slaughtered (Rodrigáñez et al. 1998), and LW399 from a Spanish company]; Duroc (DU, Spain, n = 1); Meishan [MS, imported from China to the Institut National de la Recherche Agronomique (INRA), France, n = 1]; Vietnamese pot belly (VT, VT104, from Madrid's Zoo and VT310 from a private farmer in Spain); wild boar (WB) from central and northeastern Spain (n = 5). One babirusa (BB, Babyrousa babyrussa) from Madrid's zoo was used as the outgroup.

Genotyping panel:

The genotyping panel was used to confirm the variability found in the sequenced animals; a set of SNPs and microsatellites (described below) was genotyped in it. The panel consisted of wild boar (Spain, n = 5; Sardinia, Italy, n = 4; Tunisia, n = 4; and Japan, n = 5), Iberian (Guadyerbas, n = 6; Torbiscal, n = 5; Villalón, n = 2), Sicilian (n = 10), Pelón (Mexican hairless, n = 8), Piétrain (Spain, France, and Germany, n = 12), Meishan (INRA, France, n = 10), Large White (France, UK, Finland, Denmark, and The Netherlands, n = 10), Landrace (France, UK, Finland, Spain, The Netherlands, n = 10), and Duroc (United States, Canada, Denmark, n = 10). This panel comprises the main commercial breeds used worldwide (Piétrain, Duroc, Large White, and Landrace), two Mediterranean autochthonous breeds (Iberian and Sicilian), one creole breed with likely Iberian origin [Mexican Pelón (Lemus-Flores et al. 2001)], one Asiatic breed (Meishan), as well as wild boar from its extreme geographical boundaries. This panel should thus be representative of the variability at the species level.


We resequenced ∼6.4 kb of the FABP4 gene region that comprises all four exons and three introns as well as the 5′ and 3′ flanking regions (Figure 1). The region was amplified in six PCRs and sequenced. Primers and conditions are given in supplemental Table 1 at http://www.genetics.org/supplemental/. The amplified products were sequenced using the BigDye Terminator v3.1 Ready Reaction Cycle sequencing kit in an ABI PRISM 3730 (Applied Biosystems, Foster City, CA). Approximately 900 bp of cytochrome b was sequenced as described (Clop et al. 2004). The sequences obtained were analyzed using the SeqScape v2.5 software (Applied Biosystems) with standard filter settings and manually edited and verified.

Figure 1.—

Scheme of FABP4 gene showing the region sequenced from ∼1300 to 7730 bp. The four exons (E1–E4) span positions 2379–2451, 5098–5269, 6128–6227, and 6702–6749 bp; a microsatellite (M) is in position 4951, and a SINE (S) is located at 5617–5914 bp. All positions are relative to accession Y16039.


To confirm the sequencing results in a wider sample, we selected seven SNPs: C2634del, A3159G, G4074A, T4091C, T4180A, G4205A, and T6252C (marked with numbers 1–4 in Figure 2). Indel 2634 was associated with fatness in an Iberian × Landrace cross (Mercade et al. 2006), while SNP 3159 was chosen because it segregates at intermediate frequencies. The last five SNPs (bp 4074–6252) were selected because they allow distinguishing clade A from B (explained below; Figure 3). Genotyping of the indel and C6252T polymorphisms was done as described (Mercade et al. 2006). For the A3172G polymorphism, a 112-bp-long fragment of intron 1 was amplified using FABP4A3172GF and FABP4A3172GR primers (supplemental Table 1 at http://www.genetics.org/supplemental/). PCRs were performed in a 25-μl final volume containing 1.5 mM MgCl₂, 200 μM dNTPs, 500 nM of each primer, 45 ng DNA, and 0.6 unit Taq DNA polymerase (Eco Taq). Thermocycling was 95° for 3 min, 45 cycles of 95° for 1 min, 60° for 1 min, and 72° for 1.5 min, with a final extension of 72° for 5 min. The genotyping of this polymorphism was done in a PSQ HS 96 system (Pyrosequencing AB) using FABP4A3172GSeq2 as sequencing primer (supplemental Table 1 at http://www.genetics.org/supplemental/). The remaining polymorphisms were genotyped by resequencing using FABP4C4111TF and FABP4E1/E2R primers (supplemental Table 1 at http://www.genetics.org/supplemental/). PCRs were performed in a 25-μl final volume containing 1.5 mM MgCl₂, 200 μM dNTPs, 500 nM of each primer, 45 ng DNA, and 0.6 unit AmpliTaq Gold (Applied Biosystems). Thermocycling was 95° for 10 min, 40 cycles of 94° for 30 sec, 60° for 1 min, and 72° for 1.5 min, with a final extension of 72° for 15 min.

Figure 2.—

Polymorphisms found by resequencing; haplotypes are arranged as in the NJ tree in Figure 3; note that haplotypes from the same individual are not necessarily contiguous (the last number in the sequence name refers to haplotype 1 or 2). The first sequence is the reference sequence (GenBank accession no. Y16039) while the second is the outgroup (Babyrousa babyrussa). Indels are denoted with a “−”; lowercase letters indicate that the phase could not be reliably determined and a “?” indicates that the nucleotide could not be ascertained because the individual was heterozygous for indels or the microsatellite. The microsatellite is located between SNPs 67 and 68 and is indicated by “M,” and the allele code (0–20) represents the difference in the number of repeats relative to babirusa. The only mutation in an exon is marked with an E (position 6094). The bottom line numbers the SNPs genotyped in the genotyping panel (Table 2). These SNPs were in positions 2634 (1), 3159 (2), 4074 (3), 4091 (3), 4180 (3), 4205 (3), and 6252 (4).

In addition, six microsatellites (FABP4, SW35, and SW317 on chromosome 4, SW1517 on chromosome 2, SW1701 on chromosome 7, and S0144 on chromosome 8) were genotyped. PCRs were carried out in a GeneAmp PCR system 9700 (Perkin-Elmer, Foster City, CA) and analyzed with fluorescent detection in an ABI PRISM 3730 Genetic Analyzer (Applied Biosystems). Genotypes were determined using the GeneScan 3.7 analysis software (Applied Biosystems).

Sequence analysis:

Phases were reconstructed with Phase v2.1.1 (Li and Stephens 2003) using default options except that the program was run five times and the last iteration was 10 times longer, as suggested by the authors. For estimating π and for tests based on genotype frequencies (Tajima's D and Fu and Li's D), all polymorphisms were considered. For phylogenetic, recombination, and linkage disequilibrium analyses, we used only phases that Phase reported as reliable (P > 0.8). Several parameters were estimated with DnaSP v4.10 (Rozas et al. 2003): nucleotide diversity (π), Tajima's D, and Fu and Li's D using babirusa as outgroup. Phylogenetic sequence analyses were carried out with Kimura's two-parameter model and neighbor-joining (NJ) trees using Mega3; bootstrap was performed to test the phylogenies (Kumar et al. 2004). For completeness, phylogenetic relationships were also computed with the more parametric Bayesian approach as implemented in MrBayes3 with default parameters (Ronquist and Huelsenbeck 2003). Recombination rates (ρ = 4Nₑr) were estimated using coalescence-based techniques with LDhat (McVean et al. 2002). This method allows fitting a variable recombination rate along the sequence to infer the presence of recombination hotspots (or coldspots). Estimates were obtained with 2 million iterates and considering three block penalties of 5, 20, and 50. Linkage disequilibrium measures were obtained with DnaSP. The Haploview v3.31 program (Barrett et al. 2005) was used for graphical purposes and to determine the presence and length of haplotype blocks and of potential tag SNPs.
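As an illustration of two of the summary statistics named above, the following minimal Python sketch computes nucleotide diversity (π) and Tajima's D from a toy alignment. It is not the DnaSP implementation: it assumes equal-length, gap-free sequences, and the function names and the four example sequences are hypothetical. Tajima's D uses the standard constants from Tajima (1989).

```python
from itertools import combinations
from math import sqrt


def pairwise_differences(a, b):
    """Number of aligned sites at which two sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences per site."""
    n, length = len(seqs), len(seqs[0])
    total = sum(pairwise_differences(a, b) for a, b in combinations(seqs, 2))
    n_pairs = n * (n - 1) / 2
    return total / n_pairs / length


def segregating_sites(seqs):
    """S: number of alignment columns showing more than one state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def tajimas_d(seqs):
    """Tajima's D: scaled difference between pi and Watterson's estimator."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return 0.0  # no polymorphism, statistic undefined; return 0 by convention here
    k = nucleotide_diversity(seqs) * len(seqs[0])  # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignment (not data from this study).
toy = ["ACGT", "ACGA", "ACCA", "TCCA"]
pi = nucleotide_diversity(toy)
d = tajimas_d(toy)
```

With real data, sites with gaps or missing calls would need explicit handling, which this sketch omits.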

Genotype analysis:

For microsatellites, population structure was studied via Wright's F indices, a widely used measure for this purpose. FST measures the reduction in heterozygosity in a subpopulation due to drift, while FIS measures the reduction caused by nonrandom mating within a subpopulation (Hartl and Clark 1997). Here we report Weir and Cockerham's estimates and Slatkin's RST index; the latter is equivalent to FST and is appropriate when a stepping-stone mutation model applies. The package FSTAT was employed (J. Goudet, http://www.unil.ch/izea/softwares/fstat.html).
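To make the meaning of FST concrete, here is a minimal Python sketch of the textbook definition FST = (HT − HS)/HT, where HS is the mean expected heterozygosity within subpopulations and HT is the expected heterozygosity of the pooled population. This is a simplification and is not Weir and Cockerham's estimator as computed by FSTAT: it assumes known allele frequencies and equally sized subpopulations, and the function names and example frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def wright_fst(subpop_freqs):
    """FST = (HT - HS) / HT for equally weighted subpopulations.

    subpop_freqs: one list of allele frequencies per subpopulation,
    with alleles in the same order in every list.
    """
    n_pops = len(subpop_freqs)
    n_alleles = len(subpop_freqs[0])
    # Mean within-subpopulation heterozygosity.
    h_s = sum(expected_heterozygosity(f) for f in subpop_freqs) / n_pops
    # Heterozygosity expected from the average allele frequencies.
    mean_freqs = [sum(f[i] for f in subpop_freqs) / n_pops for i in range(n_alleles)]
    h_t = expected_heterozygosity(mean_freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical two-allele example with two differentiated subpopulations.
fst_example = wright_fst([[0.8, 0.2], [0.2, 0.8]])
```

Identical subpopulations give FST = 0 and subpopulations fixed for different alleles give FST = 1, which brackets the values reported for real markers.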


Sequence analysis:

We detected 134 nucleotide polymorphisms within Sus, including 12 indels and 122 SNPs (Figure 2). All were located outside the exons except one synonymous mutation in exon 3. This polymorphism level corresponds to one variant every 50 bp and total and silent nucleotide diversity indices of π = 1.17% and πS = 1.20%, respectively (Table 1). Table 1 also presents statistics for some breeds separately (Iberian, Landrace, and wild boar). Note that the Iberian population alone had a nucleotide diversity half that of the whole set of sequences, while Spanish wild boar represent the least polymorphic population, followed closely by Landrace. Table 1 also shows Tajima's D and Fu and Li's D using babirusa as outgroup. Under the neutral null model, the expected values of the D statistics are zero. Directional selection produces negative D-values, whereas balancing selection produces positive values. Iberian parameters were quite different from those of wild boar, Landrace, or in general the whole porcine population studied.

Polymorphism statistics in the sequencing panel

After reconstructing the gametic phases with Phase (Li and Stephens 2003), we obtained the NJ phylogenetic tree. This tree shows an arrangement in two main distinct clades (named A and B in Figure 3 for convenience) that do not bear a relation to geographical origin: some local breeds like Iberian (Spain), Tamworth (UK), and Meishan (China) harbored haplotypes in both clades. The Bayesian analysis yielded an almost identical clade structure (supplemental data at http://www.genetics.org/supplemental/). The only Tamworth animal sequenced, a breed considered endangered by the Rare Breed Survival Trust in Britain, was highly heterozygous, as was the only Meishan individual sequenced, imported from China by INRA (France) and maintained with low breeding numbers for several generations. The Iberian Guadyerbas strain was also highly polymorphic, containing haplotypes in both clades. In contrast to FABP4, the NJ tree obtained from resequencing of ∼660 bp of mitochondrial cytochrome b has a clearer geographical interpretation (Figure 4 and supplemental data at http://www.genetics.org/supplemental/ for the Bayesian result), as breeds of Asiatic origin cluster together, including some Large White, where Asiatic haplotypes are common due to interbreeding in the 19th century (Giuffra et al. 2000). Similarly, all Iberian pigs and Spanish wild boars had European haplotypes, as has been reported previously (Alves et al. 2003; Clop et al. 2004).

Figure 3.—

NJ tree for the FABP4 gene. The last number (1 or 2) indicates the haplotype; note that Tamworth (TW372) and Meishan (MS405) have two highly divergent haplotypes, as have Guadyerbas IB415, Landrace LD400, and Large White LW399. Breeds with haplotypes in different clades are marked with symbols. Numbers at selected branches are the bootstrap values after 1000 resampling draws.

Figure 4.—

NJ tree obtained from cytochrome b sequence; letters refer to haplotypes described by Giuffra et al. (2000): E1 and E3 are European haplotypes and A1 and A2 are Asiatic. All sequences not shown (REST) fall within the E1 clade. Babirusa sequences (BB) obtained from GenBank are added with accession numbers shown.

Intriguingly, we did not find a clear relation between distance and significance of disequilibrium measures (Figure 5). To investigate this in more detail, we estimated the scaled recombination rate ρ = 4Nₑr with LDhat (McVean et al. 2002). The estimate was ρ = 9.2 for the whole region, or 1.44/kb. Interestingly, LDhat pointed out two distinct regions, a high-recombination region up to SNP 25 (position 3341) with ρ = 17 and an almost negligible recombination region thereafter (Figure 6). An additional hotspot could exist in the vicinity of the microsatellite, around SNPs 71–72, but it was detected only with small block penalty 10 (results not presented). This recombination hotspot is also detected by the Haploview algorithm, as this region is not included in any haplotype block (Figure 7). The haplotype structure of this gene seems complex, containing four small haplotype blocks. Forty-four tag SNPs were required to capture all variability, and, according to the aggressive tagging algorithm with the default options from Haploview (Barrett et al. 2005), 10 SNPs allowed tagging of 70% of all SNPs.
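For readers unfamiliar with the disequilibrium measure plotted against distance, the following Python sketch computes r² between two biallelic sites from phased haplotypes. It is only an illustration of the statistic, not the DnaSP or Haploview implementation; the function name and the toy haplotypes are hypothetical, and the sketch assumes phased, biallelic data with no missing calls.

```python
def r_squared(haplotypes, i, j):
    """r^2 between biallelic sites i and j from phased haplotype strings."""
    n = len(haplotypes)
    # Use the alleles carried by the first haplotype as the reference alleles.
    a, b = haplotypes[0][i], haplotypes[0][j]
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # at least one site is monomorphic in this sample
    return d * d / denom


# Hypothetical toy haplotypes (two sites each).
complete_ld = r_squared(["AC", "AC", "GT", "GT"], 0, 1)
no_ld = r_squared(["AC", "AT", "GC", "GT"], 0, 1)
```

In the first toy sample the two sites are perfectly associated, and in the second all four allele combinations are equally frequent, so the two calls mark the extremes of the statistic.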

Figure 5.—

Relation between physical distance and significance of linkage disequilibrium. The correlation coefficient between distance and P-value was 0.09.

Figure 6.—

Estimation of blocks with different recombination rates obtained from LDhat (McVean et al. 2002) and block penalty 20.

Genotypic data:

In agreement with sequencing results (Figure 2), almost complete disequilibrium was confirmed in the genotyping panel for the SNPs that separate clades A from B (Figure 3): G4074A, T4091C, T4180A, G4205A, and T6252C. Complete disequilibrium was found between SNPs 4091–4205, and only one recombinant haplotype was identified between SNPs 4091–4205 and 6252 (in the Sardinian wild boar), and three recombinants between position 4074 and the following SNPs (two in Meishan and one in Japanese wild boar). Table 2 confirms the high variability of Iberian, Meishan, and Japanese wild boars for SNPs that separate clade A vs. B. Only Mexican Pelón and Spanish wild boar had almost-fixed clade A haplotypes. Overall, the general frequency of clade A haplotypes was 70% (last row in Table 2), comparable to that found in the sequencing panel (Figures 2 and 3).

Frequencies of the reference allele Y16039 for a selected set of polymorphisms in the genotyping panel

F-indices obtained for the microsatellites typed in the genotyping panel

Table 3 shows the F-indices obtained from the microsatellites. We reasoned that microsatellites at FABP4 should have fixation indices (FST) different from other microsatellites as it is unlikely that the whole porcine genome is as variable as reported here for FABP4. However, this expectation was not confirmed: the FST values of the FABP4 microsatellite were comparable to, and not significantly different from, those of the other microsatellites (Table 3). The number of alleles was also similar to those found for other microsatellites. The FST values were significantly different from zero for all markers, which supports that porcine breeds are highly structured.


FABP4 is an extremely variable gene, even in highly inbred lines:

Nucleotide diversities found for this gene are higher than values reported in any domestic species so far, except in histocompatibility genes; e.g., they are double the average rate found in chicken (π = 0.65%) (Berlin and Ellegren 2004). Such high nucleotide diversities are typically found in species with high effective sizes like Drosophila, which have nucleotide diversities of ∼1.0% (D. melanogaster)–1.8% (D. simulans and D. pseudoobscura) (Moriyama and Powell 1996). But the results are even more surprising when we consider the fact that the NJ tree does not show a geographical arrangement (Figures 2 and 3) and that highly inbred populations like Tamworth or Iberian Guadyerbas turned out to be highly polymorphic for this gene. The case of the Iberian Guadyerbas strain is particularly striking. This is a fat, black hairless and highly inbred line (average F ∼0.35). It is representative of some of the old types of Iberian pigs and has been kept in isolation since 1945 (Toro et al. 2000). Yet it contains haplotypes in all clades of Figure 3. This high variability was confirmed by genotyping a larger and more diverse sample of pigs (Table 2). In parallel to results in Figure 3, no clear geographical pattern could be observed in Table 2, although the clade B was more frequent in Asian animals. It is interesting to note that porcine breeds do not seem overall less polymorphic than their wild ancestors, as has also been observed in chicken (Wong et al. 2004).

The average pairwise distance between babirusa and pig sequences was 0.051 ± 0.006 using a Kimura's two-parameter model. Using an estimated divergence time of 12–26 MY (Thompsen et al. 1996), this results in an estimated substitution rate of λ = 0.051/[2 × (12–26) × 10⁶] ≈ 1–2 × 10⁻⁹/bp/year, in agreement with typical results in mammals (Graur and Li 2000). The average distance within Sus was 0.01. In the only Tamworth sequenced, the nucleotide distance between both haplotypes was 0.028, and that between the most distant Iberian haplotypes, 0.033; in both cases they are close to the maximum distance within Sus, 0.036. These extreme Iberian and Tamworth lineages must have diverged ∼7–15 MYA. It can be concluded that a large part of the variability observed is certainly much older than domestication. Thus, it is not surprising that all lineages are found in both European and Asian populations since FABP4 polymorphisms predate domestication by far. The high and structured nucleotide diversity found between and within breeds strongly supports that domestication was not a single demographic event but rather the result of multiple processes, in agreement with previous results (Vila et al. 2005).
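The molecular-clock arithmetic in this paragraph can be retraced with a short Python sketch. It assumes a strict clock, so that a pairwise distance d accumulated over T years on two lineages gives a rate λ = d/(2T), and conversely T = d/(2λ). A Kimura two-parameter distance function is included for illustration only; the function names are hypothetical and the code is not the Mega3 implementation.

```python
from math import log, sqrt

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a, b):
    """Kimura two-parameter distance between two aligned sequences."""
    sites = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    n = len(sites)
    diffs = [(x, y) for x, y in sites if x != y]
    # Transitions stay within purines or within pyrimidines.
    transitions = sum(1 for x, y in diffs
                      if {x, y} <= PURINES or {x, y} <= PYRIMIDINES)
    transversions = len(diffs) - transitions
    p, q = transitions / n, transversions / n
    return -0.5 * log((1 - 2 * p - q) * sqrt(1 - 2 * q))


def substitution_rate(distance, divergence_time_years):
    """Per-site, per-year rate on one lineage under a strict clock."""
    return distance / (2.0 * divergence_time_years)


def divergence_time(distance, rate):
    """Time in years since two lineages split, given distance and rate."""
    return distance / (2.0 * rate)


# Values quoted in the text: babirusa-pig distance and 12-26 MY divergence.
rate_fast = substitution_rate(0.051, 12e6)  # upper bound on the rate
rate_slow = substitution_rate(0.051, 26e6)  # lower bound on the rate

# Age ranges for the most divergent Tamworth and Iberian haplotypes.
ages = {
    name: (divergence_time(d, rate_fast), divergence_time(d, rate_slow))
    for name, d in (("Tamworth", 0.028), ("Iberian", 0.033))
}
```

Under these assumptions the rates fall at roughly 1–2 × 10⁻⁹ per site per year and the haplotype ages span several million years, in line with the ranges quoted in the text.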

Sequence, mtDNA, and microsatellites tell different stories:

Most studies of genetic diversity are carried out with either mitochondrial DNA or microsatellites. But the results can be discordant, depending on the markers used. Note that the phylogenies of autosomal FABP4 and maternally inherited cytochrome b are rather different (Figures 3 and 4), emphasizing distinct male- and female-driven variabilities as observed in other species, e.g., the dog (Sundqvist et al. 2006). It is also important to note that the FABP4 microsatellite is uncorrelated with the clade structure seen in Figure 3. Ten alleles were identified within the sequencing panel (Figure 2), although no specific variant could be ascribed to any of the clades in Figure 3. In fact, the average correlation between the microsatellite and the 134 polymorphisms was 0.03, and 0.02 with the SNP in position 6252, which separates clades A and B. Also the FST of the FABP4 microsatellite was comparable to that of the rest of the microsatellites studied (Table 3), whereas one would expect it to be different, as it is unlikely that the high variability of the FABP4 gene is to be found throughout the porcine genome.

Linkage disequilibrium:

An assessment of linkage disequilibrium is crucial to exploit current fine-mapping techniques based on association analysis. The current view in animal domestic species is that disequilibrium occurs over long ranges, even across centimorgans for dairy cattle (Farnir et al. 2000), sheep (McRae et al. 2002), or pig (Nsengimana et al. 2004; Jungerius et al. 2005). In a recent study of resequencing, Jungerius et al. (2005) observed that significant disequilibrium extended for ∼50 kb in three pig breeds (Landrace, Large White, and Meishan), implying that disequilibrium mapping could be based on a relatively sparse set of SNPs. The picture found here is more complex. First, the haplotype blocks were smaller, and second, a non-negligible fraction of the gene was not ascribed to any block (Figure 7). The Haploview algorithm suggests that one tag SNP would be required every ∼150 bp, which is impractical for any genomewide association analysis, although 10 SNPs would tag 94 SNPs. These results suggest that the density of tagging SNPs can vary dramatically between species or at least that long-range disequilibrium should not be taken for granted in animal domestic species (Figure 7).

Figure 7.—

Linkage disequilibrium (r²) plot obtained with Haploview 3.31 (Barrett et al. 2005). Triangles with solid lines represent the five blocks identified. Arrow marks the span of the high recombination hotspot.

Evidence for selection:

The pattern and level of DNA polymorphism distribution reported at the FABP4 region is clearly unexpected for domestic breeds. The results are even more surprising when one considers that a highly significant QTL (P < 10⁻¹⁰) affecting fatness maps very close to the FABP4 region (Mercade et al. 2005). At first glance, a selective sweep promoted by a hitchhiking event acting on a beneficial mutation (i.e., directional selection) or a bottleneck event caused by domestication might generate the observed unusual haplotype structure with two characteristic clades (clades A and B, Figure 3). Indeed, selection for a favorable mutation affecting fat levels might have generated the current genetic structure via hitchhiking (Rozas et al. 2001; Quesada et al. 2003); the hitchhiking event might be incomplete (not all variation was removed due to recombination) or ongoing. The low nucleotide variability levels and the significant negative values of Tajima's D and Fu and Li's D on clade A would support the directional selection hypothesis (Tajima's D of −2.05, P < 0.05; Fu and Li's D of −2.40, P < 0.05).

The analysis just presented, however, does not consider that some breeds share haplotypes from the most distant clades and that there does not seem to exist intermediate haplotypes between clades A and B (Figures 2 and 3). When we analyze each population individually (those with n > 4, Table 1), only the Landrace population seems to be under directional selection. Since the number of substitutions in clade A is not highly reduced, the putative selective event should be old, predating the time of domestication. Balancing selection, in contrast, seems to fit the data well in the Iberian population. The segregating haplotypes were very old, the most recent common ancestor of the most distant Iberian haplotypes being ∼7–15 MY old. Although these ages are based on a simplistic application of the molecular clock, and thus these estimates are likely biased (Ho and Larson 2006), no plausible range of substitution or mutation rates suggests that the high divergence observed could have appeared after domestication. Other indirect evidence of balancing selection can be obtained from the estimate of effective size. Under equilibrium, θ = 4Nₑμ = E(π) = 0.0059 (Table 1); therefore, using μ ∼ λ ∼ 10⁻⁹ (see above), then Nₑ ∼ 10⁶. This estimate should nevertheless be taken with caution because domestication and bottlenecks result in populations that are not in equilibrium. But even assuming some overestimation due to a violation of assumptions, this effective size is very high, considering the small number of known breeding animals of the Iberian breed with a few elite herds providing most of the boars; this breed, in addition, suffered a strong bottleneck after 1960, when foreign breeds began to be imported and replaced the traditional breed (Dobao et al. 1988), not to mention the particularly highly inbred Guadyerbas line (Toro et al. 2000). This suggests that some sort of balancing selection has maintained the observed levels of variability.
The SNP genotype data on the genotyping panel do suggest that high variability is the rule rather than the exception (Table 2); for instance, if we take SNP T6252C as marker for clade A vs. B, we find that only the Mexican Pelón and Spanish wild boars do not show variation, while Iberian, Meishan, and Japanese wild boars were the most variable populations. Nevertheless, the fact that only intronic and synonymous variability was found suggests that there are restrictions to the protein sequence and that balancing selection, if it exists, may not target the gene product itself.
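The effective-size argument above rests on the equilibrium relation θ = 4Nₑμ, which can be rearranged as Nₑ = θ/(4μ). The short Python sketch below simply reproduces that arithmetic with the values quoted in the text (θ = 0.0059, μ ∼ 10⁻⁹); the function name is hypothetical, and the calculation inherits the text's assumptions of mutation-drift equilibrium and of μ being approximated by the substitution rate λ.

```python
def effective_size(theta, mutation_rate):
    """Effective population size implied by theta = 4 * Ne * mu at equilibrium."""
    return theta / (4.0 * mutation_rate)


# Values quoted in the text for the Iberian population.
theta_iberian = 0.0059   # E(pi) from Table 1
mu = 1e-9                # mutation rate, taken as roughly equal to lambda

ne_iberian = effective_size(theta_iberian, mu)  # on the order of 10^6
```

Because Nₑ scales inversely with μ, doubling the assumed mutation rate would halve the estimate, which still leaves it far above the known census of breeding animals.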

Finally, admixture could also have played a role in the gene structure observed. For instance, note that SNP frequencies in Sardinian and Tunisian wild boars were similar (Table 2). It is known that wild boar were introduced only after the seventh millennium BC in Sardinia (Larson et al. 2005), and it is highly unlikely that Tunisian wild boar have crossed to their domestic counterparts. Then, the similarity between lines would suggest that there might have been interchanges between North Africa and the Mediterranean populations at some point in history, perhaps similar to that reported recently in cattle (Beja-Pereira et al. 2006). Occasionally, male wild boar may have mated with female domestic pigs, distorting the history of mitochondrial vs. autosomal loci. But this genetic flow is unlikely to be large (Fang et al. 2006). Further, other intercrosses seem highly unlikely, like that between Iberian Guadyerbas and Chinese Meishan, which nevertheless share some long haplotypes, or between Japanese and Western wild boars. Nevertheless, genetic interchange between populations cannot satisfactorily explain the global high nucleotide diversity. Therefore, the observed pattern of nucleotide variation would have been generated by a complex interplay of directional or balancing selection together with a number of bottleneck events.


This is, to our knowledge, the first comprehensive analysis of an autosomal porcine gene where both the wild ancestor and different domestic breeds have been analyzed. At present, we still do not know how general our results are, but they do challenge several currently held views about genetic variability in domestic animals: (i) domestic animals can maintain surprisingly high levels of variability for long periods of time; (ii) there is not necessarily a correlation between genetic and geographic differentiation; and (iii) the presence of uninterrupted long haplotype blocks is not guaranteed.


We thank Anna Mercadé and Jordi Estellé for help with genotyping and Maribel Merchán for help with sequence editing. We are extremely grateful to all persons and institutions who generously provided samples: L. Silió, M. C. Rodríguez (Instituto Nacional de Investigaciones Agrarias, Madrid), C. Talavera, E. Martínez (Madrid's zoo), M. Cumbreras (Diputación de Huelva), J. Jaume (Institut de Biologia Animal de Balears), Erik v Eckhardt (Sweden), N. Okumura (STAFF Institute, Japan), C. Lemús (Universidad Autónoma de Nayarit, Mexico), R. Alonso (Universidad Nacional Autónoma de México, Mexico), J. Reixach (Batallé, Spain), E. González (Seporsa, Spain), O. Vidal (Universitat de Girona), A. Angiolillo, S. Casu (Italy), C. Renard, J. P. Bidanel (INRA, France), P. Conde (Kubus, Spain), J. L. Noguera (Institut de Recerca i Tecnologia Agroalimentàries, Spain), J. A. Barbosa (Cooperativa Guissona), Dehesón del Encinar (Junta de Castilla-La Mancha), and Copaga. We bought semen from Semen Cardona, Semen Porcino Andalucía (Spain), and Deerpark pigs (Northern Ireland). We also thank one referee for insightful comments. A.O. holds a Ph.D. fellowship from Ministerio de Educación y Ciencia (MEC), Spain. This work was funded by grants AFG2004-0103/GAN and BFU2004-02253 (MEC, Spain).


  • Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. EF061460–EF061505.

  • Communicating editor: J. B. Walsh

  • Received July 7, 2006.
  • Accepted October 9, 2006.

