Phenotypic stereotypes are traits, often polygenic, that have been stringently selected to conform to specific criteria. In dogs, Canis familiaris, stereotypes result from breed standards set for conformation, performance (behaviors), etc. As a consequence, phenotypic values measured on a few individuals are representative of the breed stereotype. We used DNA samples isolated from 148 dog breeds to associate SNP markers with breed stereotypes. Using size as a trait to test the method, we identified six significant quantitative trait loci (QTL) on five chromosomes that include candidate genes appropriate to regulation of size (e.g., IGF1, IGF2BP2, SMAD2, etc.). Analysis of other morphological stereotypes, also under extreme selection, identified many additional significant loci. Less well-documented data for behavioral stereotypes tentatively identified loci for herding, pointing, boldness, and trainability. Four significant loci were identified for longevity, a breed characteristic not under direct selection, but inversely correlated with breed size. The strengths and limitations of the approach are discussed as well as its potential to identify loci regulating the within-breed incidence of specific polygenic diseases.
THE dog, “man's best friend,” shares a large number of the complex phenotypes observed in human populations, including variation in morphology and behavior, as well as many types of polygenic disease. In the past decade, Canis familiaris has emerged as an excellent system for genetic analysis of complex phenotypes. Most of the advantages offered by the canine system over other mammalian systems derive from its population structure (Ostrander and Kruglyak 2000; Sutter et al. 2004; Parker and Ostrander 2005; Goldstein et al. 2006; Karlsson et al. 2007; Parker et al. 2007). There are >350 distinct breeds recognized in the world today, many of which are isolates that have been, for the most part, selected for morphology and behavior. Over hundreds of years humans and dogs have formed a multitude of mutualistic relationships harnessing the phenotypic flexibility of the dog genome. New dog breeds were often developed by crossing individuals of unique breeds bearing desired features, followed by strong selection for the desired phenotypes (hunting ability, coat color, skull shape, body size, etc.), thus increasing the frequency of selected genotypes in the modern-day population. To be a registered member of a breed, both parents of an individual must have been registered members of the same breed. As a result, genetic heterogeneity is reduced within breeds, but is high across breeds (Parker et al. 2004; Lindblad-Toh et al. 2005). Consequently, many phenotypes are either fixed or close to fixation in a large number of populations.
Genetic isolates have provided the key analyses of complex polygenic disease (Lindblad-Toh et al. 2005; Goldstein et al. 2006; Karlsson et al. 2007; Parker et al. 2007) as well as other phenotypes. However, the use of large numbers of such isolates has not, to date, been applied to allele trait association. The dog presents a unique opportunity to examine the power of this approach. Dog breeds, in which regions of the genome are “fixed,” can be treated in a manner similar to recombinant inbred populations: “Fixed” portions of a breed's genome will remain invariant as long as the breeding population remains closed. These “fixed” aspects will continue to produce consistent phenotypes, and therefore the phenotype and genotype need not be measured on the same animal. Thus, both the allele frequency of a single nucleotide polymorphism (SNP) in fixed regions of the genome and the phenotype are characteristics of a breed. As a result, associating breed-specific genotypes with “fixed” phenotypes in multiple breeds (across-breed mapping) presents a powerful tool for identifying quantitative trait loci (QTL) that may form the genetic basis for the phenotypic diversity observed in dog breeds.
Similar approaches have been described using inbred mouse strains (Grupe et al. 2001; Liao et al. 2004; Pletcher et al. 2004; Wang et al. 2005), and these have been combined with classical QTL analysis (Park et al. 2003; Dipetrillo et al. 2004; Wang et al. 2004; Cervino et al. 2005). However, the number of inbred mouse lines available is far smaller than the number of dog breeds, and mice offer far fewer phenotypes than the nearly 300 breeds of domestic dog. Moreover, the genome structure of any inbred mouse line is far more restrictive than the genomes that characterize a dog breed. Genomes of dog breeds have far more heterozygosity and have survived for centuries in quite variable environments. In short, the selective environments experienced by any dog breed have been far less restricted than those used during the inbreeding procedures that give rise to an inbred mouse.
Ideally, two types of data are required for across-breed association analysis: (1) a common set of well-distributed, highly informative SNPs that characterize the entire genome for each of many breeds and (2) a careful quantitative evaluation of the fixed phenotypes associated with each breed. The phenotypes most amenable to this mapping strategy are those that have been under stringent selection, such as morphology and behavior. Here we analyze the genetic basis for size using across-breed mapping and then present examples of the technique applied to other classes of traits: additional morphological features, behavior, and the relationship between size and longevity.
MATERIALS AND METHODS
A total of 148 domestic dog breeds were characterized for a variety of sex-averaged phenotypes: height, weight, other morphology characters, longevity, and behavior. Phenotypic values used for the different breeds are summarized in supplemental Table 1. Height at the withers and weight were obtained from the published American Kennel Club (AKC) breed standards (American Kennel Club 1998). The residuals from the regression of WT^0.33 onto height were derived and used as a measure of shape (e.g., breeds that are heavier or lighter than other breeds of the same height; see supplemental Figure 1). “Short coat” (Wilcox and Walkowicz 1995) was coded as a qualitative variable: 1 for all breeds with a very short coat as the standard and 0 for all others. “Ear bend” (Wilcox and Walkowicz 1995) was scored as the degree of bend in the ear on a scale from 1 (hanging low) to 4 (completely erect; cropped ears were not scored). “Tail curve” (Wilcox and Walkowicz 1995) was scored as the degree of curve in the tail on a scale of 1 (straight) to 5 (tightly curled). Additional phenotypes were measured from breed pictures (Palmer 1994; Wilcox and Walkowicz 1995; http://images.google.com/) using the metrics described in Figure 1. Because the pictures utilized were not standardized, only ratios of these metrics could be used. The following ratios were defined using the metrics in Figure 1: (1) snout:head [a/(a + b)]; (2) snout height:head [c/(a + b)]; (3) head:body [(a + b)/e]; (4) leg:body [(h + i)/e]; (5) tail:body [f/e]; (6) neck:body [j/e]; and (7) chest:body [g/e].
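The shape measure described above (residuals from regressing the 0.33 power of weight on height) can be sketched as follows. This is a minimal illustration that assumes an ordinary, unweighted least-squares fit; the function name `shape_residuals` and the breed values are hypothetical and are not taken from supplemental Table 1.

```python
# Minimal sketch: shape as the residual of weight**0.33 regressed on height.
# Assumption: ordinary (unweighted) least squares; all data values are made up.

def shape_residuals(heights, weights, exponent=0.33):
    """Return the residual of weight**exponent on height for each breed."""
    y = [w ** exponent for w in weights]
    n = len(heights)
    mean_x = sum(heights) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in heights)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(heights, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * x) for x, yi in zip(heights, y)]

# Hypothetical breed-standard values (height in cm, weight in kg).
heights_cm = [25.0, 40.0, 55.0, 70.0, 70.0]
weights_kg = [4.0, 12.0, 25.0, 30.0, 60.0]
residuals = shape_residuals(heights_cm, weights_kg)

# A positive residual marks a breed that is heavy for its height,
# a negative residual a breed that is light for its height.
for h, w, r in zip(heights_cm, weights_kg, residuals):
    print(f"height={h:5.1f} weight={w:5.1f} shape residual={r:+.3f}")
```

In this toy data set the last two breeds share a height, so the heavier of the two necessarily receives the larger residual.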
Longevity data (supplemental Table 1) were compiled from a variety of sources (Michell 1999; http://users.pullman.com/lostriver/longhome.htm; KC/BSAVA 2004; Egenvall et al. 2005). These represent data primarily from owner surveys. One of us, Pluis Davern, an experienced dog trainer and judge (http://www.sundownerskennels.com/training.html; http://www.infodog.com/judges/17422/juddat.htm; http://www.akc.org/breeders/resp_breeding/Articles/truetoform.cfm), scored behavioral phenotypes as qualitative variables (0, 1, or NA). Four distinguishing patterns of dog behavior were scored: pointing, herding, boldness, and trainability. Additional behavioral data were taken from Hart and Miller (1985). Behavioral scores for the 148 breeds are in supplemental Table 1.
DNA collection and isolation:
DNA samples were collected from dogs participating in AKC or otherwise sanctioned events, including dog shows, performance events, and obedience and behavior trials. Samples were collected as either whole blood or by cheek swab by registered veterinarians or licensed veterinary technicians after obtaining the owner's written consent. AKC or other registration numbers were collected for each dog, as were owner contact information, pedigree data, and health history; when possible, permission to recontact owners regarding future queries was also obtained. Wherever possible, care was taken to obtain samples from dogs unrelated at the grandparent level.
Blood samples were collected as whole blood in acid citrate dextrose or EDTA anticoagulation tubes. Buccal swabs were collected using standard protocols with Cytosoft cytology brushes (Medical Packaging, Camarillo, CA). DNA was extracted from the brushes using a QIAamp blood mini kit (QIAGEN, Valencia, CA) following the manufacturer's protocol. DNA was extracted from the blood samples using a standard phenol/chloroform extraction method (Maniatis et al. 1982). Coded samples were aliquoted and stored for long-term use at −70°. Information was entered into a custom MySQL database.
All procedures were performed in accordance with approvals from the Animal Care and Use Committees from the University of Utah, the National Human Genome Research Institute at the National Institutes of Health, and the Waltham Centre for Pet Nutrition, Mars.
Multiple breeds were characterized using a common set of SNP markers. Variation in the informativeness of marker alleles is presented in supplemental Figure 2. SNPs were selected for use that met the following criteria: (i) SNPs with a q score >45 that have flanking sequence occurring only once in the genome sequence; (ii) SNPs that passed Illumina in-house suitability testing; (iii) SNPs where the minor allele was observed in ≥2 of 11 breeds tested; and (iv) SNPs for which the minor allele was observed in ≥1 of 11 breeds as necessary, included to achieve complete coverage. The 25,073 SNPs resulting were filtered such that SNPs meeting all four criteria were added to the final data set sequentially if they were at least 380 Mb from all SNPs already in the data set. SNPs meeting criteria i, ii, and iv were then added, maintaining the minimal spacing. The resultant 4608 SNPs were submitted to Illumina to generate three oligo pools. DNA samples were submitted to Illumina for fast-track Golden Gate analysis (Fan et al. 2006).
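The sequential, spacing-constrained addition of SNPs described above can be illustrated with a short greedy filter. This is a sketch under stated assumptions rather than the procedure actually run: the function name `select_spaced_snps`, the chromosome labels, the positions, and the spacing value used in the example are all hypothetical, and spacing is assumed to be checked only against SNPs on the same chromosome.

```python
# Minimal sketch of a greedy minimum-spacing filter for SNP selection.
# Assumption: candidates are supplied in priority order (best tier first).

def select_spaced_snps(candidates, min_spacing):
    """Keep each (chromosome, position) candidate only if it lies at least
    min_spacing bp from every SNP already kept on the same chromosome."""
    kept = []
    for chrom, pos in candidates:
        if all(c != chrom or abs(pos - p) >= min_spacing for c, p in kept):
            kept.append((chrom, pos))
    return kept

# Hypothetical first tier (SNPs meeting the strictest criteria).
tier1 = [("CFA1", 100_000), ("CFA1", 300_000), ("CFA1", 900_000), ("CFA2", 150_000)]
# Hypothetical second tier, added afterwards to fill gaps in coverage.
tier2 = [("CFA1", 500_000), ("CFA2", 200_000), ("CFA2", 800_000)]

selected = select_spaced_snps(tier1 + tier2, min_spacing=400_000)
print(selected)
```

Processing the stricter tier first means a lower-tier SNP is used only where it does not crowd a higher-tier SNP that has already been accepted.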
For the experiments described, 2801 dogs representing 147 breeds were used. One hundred twenty-nine of these breeds were represented by ≥10 dogs (supplemental Figure 3, supplemental Table 3). DNA from each dog was genotyped using 1536 markers, of which 674 were spaced across the 38 canine autosomes. A total of 862 additional markers were concentrated in regions of interest that showed maximal variation in allele frequency between breeds. The focused selections were chosen to further characterize areas that allowed breeds to be easily distinguished and may be linked to traits of interest (e.g., Sutter et al. 2007). As a result, the median distance between markers was 409 kb although only ∼26% of the genome was within 250 kb of a marker (supplemental Table 2).
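The contrast between a modest median marker spacing and a low fraction of the genome lying near a marker arises when markers are clustered. The sketch below computes both quantities for a single chromosome. The function names, the chromosome length, and the marker positions are hypothetical; only the 250-kb window is taken from the text.

```python
# Minimal sketch: median inter-marker distance versus fraction of a
# chromosome lying within a fixed window of any marker.
from statistics import median

def median_spacing(positions):
    """Median distance between adjacent markers on one chromosome."""
    ordered = sorted(positions)
    return median(b - a for a, b in zip(ordered, ordered[1:]))

def marker_coverage(positions, chrom_length, window):
    """Fraction of the chromosome within `window` bp of at least one marker."""
    intervals = sorted((max(0, p - window), min(chrom_length, p + window))
                       for p in positions)
    covered = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end:            # overlapping windows are merged
            cur_end = max(cur_end, end)
        else:
            covered += cur_end - cur_start
            cur_start, cur_end = start, end
    covered += cur_end - cur_start
    return covered / chrom_length

# Hypothetical chromosome of 2 Mb with two clustered markers and one distant one.
positions = [100_000, 200_000, 1_500_000]
print(median_spacing(positions))                        # 700000.0
print(marker_coverage(positions, 2_000_000, 250_000))   # 0.475
```

Adding more markers inside an existing cluster lowers the median spacing without increasing coverage, which is why the two summaries can diverge.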
Details of SNP probe sequences associated with QTL and of the sequences in which these markers are imbedded are presented in supplemental Table 4 (see supplemental table legend). Relevant marker allele frequencies in different breeds are presented in supplemental Table 5.
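We tested for correlations between breed allele frequency (x_i) and breed-characterized phenotypes (y_i) using a weighted Pearson product-moment correlation:
$$r_{xy} = \frac{\sum_i w_i (x_i - \bar{x}_w)(y_i - \bar{y}_w)}{\sqrt{\sum_i w_i (x_i - \bar{x}_w)^2 \, \sum_i w_i (y_i - \bar{y}_w)^2}}$$
where $\bar{x}_w = \sum_i w_i x_i / \sum_i w_i$ and $\bar{y}_w = \sum_i w_i y_i / \sum_i w_i$ are the weighted means, and the weight $w_i$ is a function of $n_i$, the number of animals for breed i.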
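A weighted Pearson correlation between breed allele frequencies and breed phenotypes can be sketched as below. The choice of weights is an assumption made for illustration: the square root of the number of dogs per breed is used because that is the weighting stated for the multiple regression, but the exact weight used in the correlation is not recoverable from the text. The function name and all data values are hypothetical.

```python
# Minimal sketch of a weighted Pearson correlation across breeds.
# Assumption: weights are sqrt(number of dogs per breed); data are made up.
import math

def weighted_pearson(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-breed values: allele frequency, phenotype, dogs genotyped.
allele_freq = [0.05, 0.20, 0.60, 0.90, 0.95]
phenotype = [8.0, 12.0, 25.0, 40.0, 38.0]
dogs_per_breed = [10, 25, 16, 9, 36]

weights = [math.sqrt(n) for n in dogs_per_breed]
r_xy = weighted_pearson(allele_freq, phenotype, weights)
print(f"weighted r_xy = {r_xy:.3f}")
```

With equal weights the function reduces to the ordinary Pearson correlation, so the weighting only changes how much each breed contributes to the estimate.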
Two measures of significance were important: single SNP P-value and genomewide P-value (e.g., the probability of a particular rxy value in a single test and the multi-test correction when testing all SNPs across the genome, respectively).
We used permutation tests to establish the null distribution of the rxy statistic for each SNP and for each phenotype. A generalized extreme value distribution was fit to the empirical “null” data using the gevFit function of the fExtremes package (Wuertz 2006) for R (R Development Team 2006). The Kolmogorov–Smirnov test (Conover 1971), as implemented in R (ks.test), was used to test the goodness of fit. Distributions with a ks.test P-value of ≤0.01 were considered poorly estimated and dropped from further analysis. The significance of rxy values was estimated using the cumulative probability function (pgev) and −log10 transformed for convenience (logP). For each permutation, the maximum score across all SNPs was recorded as the single genome-scan maximum. Genome-scan maximum values from 1000 permutations were used to estimate the null distribution of a genomewide scan. The 90th, 95th, and 99th percentiles of this distribution were used as the thresholds for genomewide significance of 0.1, 0.05, and 0.01, respectively.
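The genome-scan maximum procedure can be sketched in a few lines. This is a simplified stand-in, not the analysis described above: it uses the absolute unweighted correlation as the per-SNP score instead of a logP derived from a fitted generalized extreme value distribution, and it takes empirical percentiles directly. The function names and the simulated data are hypothetical.

```python
# Minimal sketch of genomewide thresholds from permutation maxima.
# Assumptions: score = |Pearson r| (no GEV fit, no weights); data are simulated.
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def genomewide_thresholds(freqs_by_snp, phenotypes, n_perm=1000, seed=0):
    """Shuffle phenotypes across breeds, record the maximum score over all
    SNPs for each permutation, and return percentile-based thresholds."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(abs(pearson(f, shuffled)) for f in freqs_by_snp))
    maxima.sort()

    def percentile(pct):
        index = (pct * len(maxima) + 99) // 100 - 1   # integer ceiling, 0-based
        return maxima[max(0, index)]

    # Keys are genomewide significance levels.
    return {0.10: percentile(90), 0.05: percentile(95), 0.01: percentile(99)}

# Simulated data: 20 SNPs by 30 breeds, with no true association.
data_rng = random.Random(1)
n_breeds, n_snps = 30, 20
freqs_by_snp = [[data_rng.random() for _ in range(n_breeds)] for _ in range(n_snps)]
phenotypes = [data_rng.gauss(0.0, 1.0) for _ in range(n_breeds)]

thresholds = genomewide_thresholds(freqs_by_snp, phenotypes, n_perm=200)
print(thresholds)
```

Because each permutation contributes only its single largest score, the resulting thresholds account for testing every SNP in the scan at once.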
Power to detect association:
We estimated the power to detect association with a neighboring marker allele as a function of the number of breeds available. In Figure 2, it can be seen that the power to identify an association drops off rapidly as the number of breeds decreases. This loss of power becomes particularly relevant when phenotypes have been evaluated in only a small number of breeds.
Markers were considered informative if they had a wide range of allele frequencies across breeds. Conversely, a SNP for which both alleles displayed equal frequency across all breeds was uninformative. We estimated the power to detect an association as a function of allele-frequency variation among breeds. The significance (logP) of a single-marker test for differently modeled situations is graphed in Figure 3 (y-axis) as a function of the distance between the SNP markers (x-axis). Three patterns of variation in the SNP allele frequency among breeds were considered (Figure 3, insets): histograms representing the number of breeds (y-axis) in each allele-frequency bin (x-axis). The ability to detect QTL increases with increasing variation of its occurrence in different breeds.
The “lm” function of R was used to perform a weighted multiple regression, with the square root of breed count used for weights (Chambers 1992). The “glm” function of R was used with the option family = “binomial” to carry out a logistic regression (Hastie and Pregibon 1992). The “regress” function was used to carry out a mixed-model analysis (Clifford and McCullagh 2006) with allele counts as the fixed effects and the breed similarity matrix as the random effects. The variance matrix between breeds was calculated as the similarity between all pairs of breeds using markers separated by at least 500,000 bp. We defined the similarity between two breeds as one minus the average absolute difference in allele frequency across all markers (see supplemental Table 6 for all similarity values). Thus, breeds that are identical had a similarity score of 1 and breeds that were completely different had similarity scores of 0. A leave-one-out strategy was used to predict breed phenotypes with the mixed model. Coefficients estimated from the data with a breed left out were used to predict the phenotype of that breed (see supplemental Figure 4).
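The breed similarity measure defined above (one minus the average absolute difference in allele frequency across markers) can be sketched as follows. The breed names and allele frequencies are hypothetical, and the markers are assumed to have already been thinned to the minimum separation described in the text.

```python
# Minimal sketch of the pairwise breed similarity score.
# Assumption: each list holds allele frequencies for the same ordered markers.

def breed_similarity(freq_a, freq_b):
    """One minus the mean absolute difference in allele frequency."""
    diffs = [abs(a - b) for a, b in zip(freq_a, freq_b)]
    return 1.0 - sum(diffs) / len(diffs)

def similarity_matrix(freqs_by_breed):
    """Similarity for every ordered pair of breeds, keyed by (breed, breed)."""
    names = sorted(freqs_by_breed)
    return {(a, b): breed_similarity(freqs_by_breed[a], freqs_by_breed[b])
            for a in names for b in names}

# Hypothetical allele frequencies at four markers.
freqs_by_breed = {
    "breed_A": [0.0, 1.0, 0.5, 0.2],
    "breed_B": [0.0, 1.0, 0.5, 0.2],   # identical to breed_A
    "breed_C": [1.0, 0.0, 0.5, 0.2],   # opposite at two of four markers
}
sim = similarity_matrix(freqs_by_breed)
print(sim[("breed_A", "breed_B")])   # 1.0
print(sim[("breed_A", "breed_C")])   # 0.5
```

The resulting matrix is symmetric with ones on the diagonal, which is the form expected of a between-breed relatedness matrix supplied to a mixed model.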
A number of genes regulating size or shape have been identified in different mammals (humans, mice, rats, or dogs). Several of these regulate relatively large amounts of phenotypic variation (e.g., IGF-1, IGF-2). Identifying QTL containing such candidate genes provided evidence suggesting that the method proposed was robust.
Selected regions of the genome were examined using a SNP scan of 148 breeds. Using association analysis, several QTL were identified for size (WT) and shape (HT and residuals of WT^0.33 regressed on to height). Table 1 presents the location and characterization of the loci for which the most evidence was accrued. Loci regulating both height-at-the-withers and body weight are located on Canis familiaris autosome (CFA) 7, 10, 15, and 34, whereas the locus on CFA 9 regulates only body weight. When WT^0.33 is regressed onto height at the withers, a variation in shape can be distinguished that represents differences between breeds that range from dogs that are thin for their height (pursuit hounds such as the greyhound, Afghan hound, or whippet, as well as some smaller dogs such as the fox terrier) to ones that have a large body mass for their height (see supplemental Figure 3). The locus on CFA 6, associated with this phenotype, was not associated with either height or weight. In the Portuguese water dog (Lark et al. 2006), a highly significant locus on CFA 12 that regulates an inverse correlation between limb bone length and width was identified. This locus was not identified with genomewide significance in the present across-breeds WT^0.33 residual scan, but it was found in that scan at a significance that validated the pre-identified locus from the Portuguese water dog. Such instances of lowered significance may reflect a low frequency of breeds in which a locus has been fixed.
As can be seen in Table 1, many of the loci contain candidate genes that are associated with size, including SMAD-2 and NPR2 on CFA7; HMGA2 on CFA10; IGF1 on CFA15, as well as a murine high-growth-regulating region containing SOCS2; and IGF2BP2 on CFA34. Thus, associating SNPs from multiple breeds with breed-specific metrics may facilitate association mapping of complex, polygenic phenotypes (across-breed mapping).
Mapping breed characters:
In many breeds, a number of other desired morphological traits have been under stringent selection and thus should be fixed. Descriptions of these phenotypes are presented in the materials and methods. Their distribution among breeds is presented in supplemental Table 1. We have used across-breed association mapping to identify putative QTL for many of these (Table 2). In all, 10 traits were associated with 26 loci distributed over 14 chromosomes at a significance better than P < 0.01. As expected, many of these QTL (10) were identified at high significance, exceeding a genomewide threshold of P ≤ 0.001. QTL for two aspects of snout size or shape were associated with the same SNP on CFA 12; both the length of tail and the degree to which ears are erect were associated with a locus on CFA 15 that also is associated with overall size (see Table 1); similarly, size of snout and erectness of ears were associated with another size locus on CFA 34; and two closely linked loci on CFA 9 regulate variation in the size of the neck or head. Again, suggestive candidate genes were found associated with some of these QTL: TNFRSF19 and Fgf5 with short coat and COL6A3 with the degree of tail curvature. As expected, this mapping technique appears to be very powerful for phenotypes that are very close to fixation and also are found in a large number of breeds, the optimal proportion approaching 50% of the breeds analyzed.
Additional tests for significance and effects of breed structure:
QTL identified by single-marker tests may implicate causative regions of the genome, or they may represent false positives: shadow effects resulting from autocorrelations in the data. False-positive results may be caused by unequal sharing of genome regions between the breeds (breed structure), coselection of multiple unlinked regions, and/or codependence of unlinked genome regions (interactions). Multiple-regression analysis provides an estimate of the independence of the loci regulating a trait. QTL that deviate from the additive-independent model will not remain significant in a multiple regression and may represent false positives or more complex effects. QTL may appear less significant (or not significant) in a multiple regression if they were coselected with other loci, or if they are involved in interactions with other loci. Table 3 presents the results of multiple-regression analyses of those traits in Tables 1 and 2 that are associated with multiple loci. Several loci either were not significant or demonstrated marginal significance. In all but one instance, the sum of the significant single regression R^2 values greatly exceeded the multiple R^2 value, suggesting that some loci were not causative or that interactions and/or coselection were occurring. In the case of weight, there was an apparent interactive effect, P = 0.0009, between the major locus on CFA 15 (associated with SNP BICFPJ263341 at 44 Mbp) and the locus on CFA 10 (associated with SNP gnl.ti.360206886_2 at 11.5 Mbp). This interaction remains significant in the multiple-regression model (P = 0.026) and in the mixed multiple-regression model (P = 0.003; see below). Coselection can mimic a significant interaction effect in this situation (see discussion). For one trait, the ratio of head to body metrics (“head ratio”), the sum of the three significant individual R^2 values was only slightly greater than the multiple R^2 value, suggesting that these loci might be acting independently.
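The comparison between the sum of single-regression R^2 values and the multiple R^2 can be made concrete for the simplest case of two loci. What follows is the standard two-predictor regression identity, offered as an illustration and not as a formula used in the analysis; the symbols r_1 and r_2 (correlation of each locus's breed allele frequency with the trait) and r_12 (correlation between the allele frequencies of the two loci across breeds) are introduced here.

```latex
R^2_{\mathrm{multiple}}
  = \frac{r_1^2 + r_2^2 - 2\, r_1 r_2\, r_{12}}{1 - r_{12}^2},
\qquad
R^2_{\mathrm{multiple}} - \left(r_1^2 + r_2^2\right)
  = \frac{r_{12}\left[\, r_{12}\left(r_1^2 + r_2^2\right) - 2\, r_1 r_2 \,\right]}{1 - r_{12}^2}.
```

When r_12 = 0 the multiple R^2 equals the sum of the single R^2 values, which corresponds to loci acting independently. When the two loci are equally associated with the trait (r_1 = r_2 = r) and positively correlated with each other (0 < r_12 < 1), the difference reduces to −2r^2 r_12/(1 + r_12), which is negative, so the sum of the single values overstates the variance jointly explained. Correlated allele frequencies of this kind could arise from breed structure or coselection.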
Considerable population structure exists among dog breeds (Parker et al. 2004). Using the popgen (Marchini 2004) package of R, we estimated measures of diversity among these breeds (Nicholson et al. 2002). The mean “c” (analogous to FST) value is 0.25 with individual breed values ranging from 0.05 to 0.61. In an across-breed association analysis, noncausative (shadow) loci may result from effects of population substructure due to genetic relatedness among breeds. To test for this, we used a mixed-model analysis (see materials and methods) to predict trait values of weight as well as head:body ratio (head ratio). We found that all of the significant QTL for weight or head ratio (Table 3) remained significant in a mixed model correcting for genetic relatedness of breeds, with P-values ranging from 10^−2 to <10^−5 for weight and <10^−3 for the three significant head ratio loci.
Examples illustrating the future potential of the mapping technique:
Longevity and size:
In general, dogs representing breeds of small size (e.g., Pekingese, toy poodle, terrier breeds) live appreciably longer than those from larger-sized breeds (e.g., Great Dane, St. Bernard, Irish wolfhound) (Egenvall et al. 2005). We have mapped loci for longevity using multiple breeds spanning a comprehensive range of sizes. An analysis of breed longevity had been compiled by K. Cassidy (http://users.pullman.com/lostriver/longhome.htm), but many of the breeds for which we had genotypes were not included in that database. We therefore prepared a similar database for all breeds genotyped in our study using a variety of website resources (supplemental Table 1). Figure 4 compares longevity/size data between the two databases. The negative correlation between age of death (AOD) and size is obvious. The slope of the regression of size onto longevity is the same in both data sets, although the difference in intercepts indicates that the database that we developed yields an average age of death that is older. This may be due to the fact that Cassidy's data utilized information from both veterinarian records and owner responses to questionnaires, whereas our data were biased toward owner surveys, which typically prefer to reference longer-lived animals. Although this may produce an inflated mean value of AOD, it presents a more sensitive signal for genetic analysis. We therefore utilized our larger database, together with the genotyping used in Table 1, to identify QTL for breed-associated age of death (Table 4).
Included in Table 4 are data indicating the presence or absence of size loci associated with the same SNP. Seven loci were identified, three of which, on CFA 7, 10, and 15, were associated with significant size (as in weight) loci. These were also the most significant loci for longevity. A fourth, on CFA 34, was associated with a less-significant weight locus. Loci on CFA 9, 23, and 25, although quite significant for age of death, were not significant for size, with the exception of the locus on CFA 9, which is linked to a very significant size locus (see Table 1). When these age-of-death loci were combined in a multiple regression, three on CFA 10, 25, and 34 were no longer significant and the multiple R^2 was approximately half the value of the sum of the single R^2 values.
Two aspects of dog behavior that appear to be highly breed specific are herding and pointing. Pluis Davern, a nationally recognized dog trainer qualified to judge a large number of breeds (http://www.infodog.com/judges/17422/juddat.htm), scored the 148 genotyped breeds for two additional phenotypes: “boldness vs. timidity” and “trainability.” Behavioral scores for the 148 breeds are presented in supplemental Table 3. Using these scores, we identified several loci of interest (Table 5). We identified one locus for pointing on CFA 8 with a genomewide significance threshold of 0.01 < P < 0.05. Three loci were detected for herding; these were located on CFA 1 (P < 0.01) and on CFA 4 and CFA 15 (0.01 < P < 0.05). While the boldness and trainability gestalts are subjective, and at best descriptive, we nevertheless found one significant (P < 0.01) locus for trainability on CFA 10 as well as five for boldness on CFA 15 and 22 (P < 0.01) and on CFA 1, 4, and 17 (0.01 < P < 0.05). In a multiple regression, all of the loci for boldness remained significant. The locus on CFA 15 is interesting in that it does not appear to be related to size, as approximately equal numbers of large and small breeds were found to be bold (see supplemental Table 3), and boldness and size were not correlated (r = 0.18; P = 0.3). Possible candidate genes are listed in Table 5 for herding and pointing, along with two of the boldness QTL. Included in Table 5 are data for excitability (comprising 56 breeds) taken from Hart and Miller (1985). Two significant QTL were identified on CFA 7 and 15. Both coincided with major-size loci. Unlike the relationship between boldness and size, excitability was highly correlated with size (r = −0.8; P < 10^−12), despite the small data set (56 breeds vs. the 148 used in the analysis of boldness).
Three powerful genetic procedures are now available using a canine model:
Segregation in planned crosses or within a breed population can be used to identify loci for simple and complex phenotypes. This approach takes advantage of the large LD distances that can be attributed to founder effects and bottlenecks (for example, Mignot et al. 1991; Acland et al. 1998, 1999; Lingaas et al. 1998; Van de Sluis et al. 1999; Jonasdottir et al. 2000; Chase et al. 2005a, 2006; Todhunter et al. 2005).
LD mapping across breeds has been used to reduce haplotypes of simple and complex phenotypes to reasonably small DNA sequences and often to identify single genes (Clark et al. 2006; Goldstein et al. 2006; Karlsson et al. 2007; Parker et al. 2007; Sargan et al. 2007).
The across-breed mapping method described here combines association with multiple-breed LD mapping and can therefore be used to associate small regions of the genome with the phenotype.
The results presented here illustrate the power of across-breed mapping using a data set of >100 breeds. Using morphological phenotypes, we have found an interaction between loci regulating weight on CFA 10 and CFA 15 and have implicated a major locus for size on CFA 7. We have validated loci first described in the Portuguese water dog: one, a major locus regulating shape (limb length vs. width) on CFA 12 (Lark et al. 2006), and two size loci on CFA 15 (44 and 37 Mb) identified in previous studies (Chase et al. 2002, 2005b; Sutter et al. 2007). In addition, we have found a number of loci affecting morphology, some of which may be independent regulators of the relation of the size of the skull to the post-cranial body. For these and other traits, the LD distance associated with any SNP is ∼500 kb. This is much smaller than the LD distance associated with markers when mapping within a single breed (Sutter et al. 2004). Nevertheless, as seen in Tables 1, 2, and 5, many genes remain to be explored in searching for alleles that regulate the phenotypes in question (in the data presented, this number ranges from 1 to 19, depending on the QTL).
Most often, across-breed mapping identifies markers that tend to be near or at fixation (homozygous) in breeds with the associated phenotype. Breeds in which the phenotype is still segregating will not contribute to the power of QTL identification. However, they will provide a resource in which the association can be validated using within-breed segregation analysis. Such breeds are readily identified from the across-breed SNP genotyping database. It should be possible now to validate the most significant (P ≤ 0.001) of the other loci in Table 2 using breeds in which the implicated SNPs are segregating [e.g., the locus on CFA 32 for short coat (Table 2) was identified by segregation analysis using dachshunds or corgis (Housley and Venta 2006)].
Limitations to across-breed mapping will always necessitate validation using within-breed segregation analysis. One limitation of the method is the potential for false positives that may arise from population structure, whereby causative regions of the genome displaying significant associative P-values cannot be distinguished from noncausative ones. Our simple association analysis has made the assumption that dog breeds are independent of each other. However, this is rarely the case. Breed structure is the network of haplotype regions shared between breeds. For example, we would expect a high proportion of sharing between the genomes of the standard and the toy poodle, although we expect significant differences in regions with loci related to control of size. The mean FST between the breeds used in our study is 0.25 (SD = 0.11), indicating that they have not diverged greatly. Moreover, principal component (PC) analysis of the allele frequencies (data not presented) shows that allele sharing between breeds is not coherent (e.g., the first PC explains only 4% of the total variation in allele frequency). Thus, different breeds share different parts of the genome.
Reviewing similar techniques applied to inbred mouse strains, Payseur and Place (2007) have summarized the power and pitfalls of the technique (e.g., they showed that unequal relatedness between strains can give rise to false-positive associations, since causative regions of the genome may be co-inherited with noncausative regions). Studies in the mouse suggest extensions to this technique in the dog as more robust SNP and phenotype data become available: (1) use of SNP haplotypes spanning a small physical distance (e.g., 300 kb) instead of single SNP alleles (Karlsson et al. 2007; Salmon Hillbertz et al. 2007; Sutter et al. 2007), (2) correction for relatedness between breeds using mixed-model analysis, (3) balanced representation of breeds, and (4) correction for nonsyntenic LD by testing multiple loci in the same model.
We have used an across-breed averaged correction for breed structure to correct for effects of breed structure on weight and head-to-body ratio and a multi-QTL regression model to rule out nonsyntenic LD among loci that we have detected. Nevertheless, interactions and coselection can result in false positives and, as with mouse inbred strains, it will always be necessary to validate loci.
The current data set has several limitations. In Figure 2, we presented evidence that significance is limited to 250 kb on each side of a SNP. By this criterion, our database analyzes only 26% of the genome with the remainder not considered in the association mapping that identified the loci used in the multiple-regression model in Table 3. Therefore, within-breed validation of segregating loci will be required to completely rule out nonsyntenic LD. Beyond shadow effects, there remain other complex effects, such as interactions between loci and/or coselection of loci during breed formation that may confound results. The data in Table 3 indicate that such effects may be present for most of the traits examined.
In the future, genotyping platforms should offer deeper coverage of the genome (∼50,000 well-placed and informative SNPs), more robust and balanced breed representation, and more dogs per breed (30–50). Finally, improved phenotypic characterization of breed stereotypes is needed.
Phenotypes that have been under stringent selection are best suited to across-breed association mapping, and this is apparent in the data in Table 2 where highly significant values for several stringently selected morphological QTL were observed. Similarly, stringent selection for behavior may be responsible for the behavioral loci identified here. Candidate genes associated with these loci (Table 5) include ones that might be expected to play a major role in regulating behavior: MC2R on CFA 1 (27,381,939 bp) is a melanocortin receptor, and C18orf1 (27,572,327 bp) has been implicated in schizophrenia. DRD1, on CFA 4 (40,743,436 bp), encodes a dopamine subtype receptor. CNIH, on CFA 8 (33,396,000 bp), has been implicated in cranial nerve development. Finally, PCDH9, on CFA 22 (24,273,482 bp), encodes a protein localized to synaptic junctions and believed to be involved in specific neural connections and signal transduction. Although the behaviors involved are poorly defined, the presence of candidate genes appropriate to behavior is encouraging.
Despite the likelihood of false positives, the across-breed mapping technique can focus attention on loci that may regulate genetic differences between breeds when these cannot be investigated using segregation within breeds. In an extensive study of within-breed longevity involving many different breeds, Galis et al. (2007) were unable to find evidence for an inverse correlation between longevity and size. Neither have we seen such an inverse correlation in Portuguese water dogs, which display a range of sizes approaching threefold (our unpublished data). The peculiar inverse correlation between longevity and size seen in Figure 4 is strictly a between-breed phenomenon and provides an excellent example of a trait that can be approached with across-breed mapping. The data in Table 4 suggest that a subset of loci, which control body size, also contribute to longevity, with some playing a greater role in the aging process than others.
Across-breed mapping depends on variants of the genomic architecture that are relatively fixed in a large number of different breeds. Given accurate estimations of breed-disease frequency, this technique can be used to determine the impact of the breed-fixed genome regions on the disease. All of these breeds represent “successful” genome architectures. While some may be more or less prone to a disease, they are still functional productive genomes. It is not likely that a large number of breeds harbor a single deleterious mutation that can be detected in this fashion. Thus, it is likely that one of several functional genome variants will predispose to a disease state as, for example, one might encounter with size loci where particular alleles may predispose toward orthopedic diseases.
Because power in across-breed mapping derives from variation between breeds in the frequency of disease (as in the simulation in Figure 3), this approach functions only if disease reporting is accurate. While databases of disease frequency exist, they are often based on breeder-directed health surveys and inherent biases exist. More useful may be the growing number of veterinary school databases spanning several years.
The quality of genotypic data is paramount as well. Ideally, large public databases providing comprehensive SNP data on ∼50 independent lineages for most AKC-recognized dog breeds should be made available as the genotypic breed standard for future mapping studies. Such an effort, termed CanMap (http://www.sciencemag.org/cgi/content/full/sci;317/5845/1668), is currently underway, initially involving investigators from Cornell, the University of California at Los Angeles, and the National Human Genome Research Institute (Pennisi 2007). The initial end point will be a public repository of dense SNP profiles of about a dozen dogs from each of nearly a hundred breeds, plus a set of wild canids, which together will be an invaluable resource for the genetic dissection of complex polygenic diseases, a large number of which are common to both dogs and humans.
In summary, across-breed mapping is another facet of the canine model that complements within-breed mapping and LD mapping. It implicates new regions of interest and can provide validation of previously identified loci.
We thank the thousands of pet owners who provided samples and data about their dogs for their participation and support of this work and the many dog show organizers who kindly allowed us to have collection stands to gather these samples for dog research. We thank John Fondon, III, and Heidi Parker for helpful comments regarding this manuscript. We gratefully acknowledge funding from the Judith Chiara Family Trust and National Institutes of Health GM063056 (K.G.L. and K.C.), the Intramural Program of the National Human Genome Research Institute (E.A.O.), and Mars (A.M. and P.J.).
Communicating editor: M. Johnston
- Received February 5, 2008.
- Accepted March 12, 2008.
- Copyright © 2008 by the Genetics Society of America