## Abstract

Analyses of effects of mutants on many traits have enabled estimates to be obtained of the magnitude of pleiotropy, and in reviews of such data others have concluded that the degree of pleiotropy is highly restricted, with implications on the evolvability of complex organisms. We show that these conclusions are highly dependent on statistical assumptions, for example significance levels. We analyze models with pleiotropic effects on all traits at all loci but by variable amounts, considering distributions of numbers of traits declared significant, overall pleiotropic effects, and extent of apparent modularity of effects. We demonstrate that these highly pleiotropic models can give results similar to those obtained in analyses of experimental data and that conclusions on limits to evolvability through pleiotropy are not robust.

BIOLOGICAL organisms are complex structures, so it is not surprising that their genes often show pleiotropic effects over two or more traits. Evidence comes both from observations on substitutions at individual loci and from genetic correlations among the traits at a population level. One view is that all genes are fully pleiotropic, at least at the quantitative level, in view of the highly interdependent and interactive nature of biological systems. Another view is that such an argument is irrelevant to genetic understanding, analysis, and application in that most loci are likely to affect no more than a small minority of traits in any biologically significant way (Stearns 2010).

The general degree of pleiotropy is important in several areas, for example in understanding pathways of gene action, in assessing potential side effects of genetic manipulation of particular pathways, in assessing the impacts of selective genetic improvement programs, and in considering the opportunities for evolutionary change through new mutations that may induce both favorable and unfavorable effects on fitness.

In a recent perspectives paper Wagner and Zhang (2011) discuss the magnitude of pleiotropy for complex or quantitative traits. They argue that it is highly restricted, *i.e.*, rather few traits are influenced by each gene, and consequently conclude there is more opportunity for evolutionary change according to Fisher’s (1930) geometric model than suggested by the analyses of Orr (2000) and for finding drugs specific for a particular genetic disease.

Different kinds of data sets were used by Wagner and Zhang, many of them collected by them and their colleagues. One is based on mapping QTL using F_{2} crosses of selected lines of mice (Wagner *et al.* 2008), but pleiotropy and linkage of nonpleiotropic QTL cannot readily be disentangled. The other is based on single mutant lines where this complication does not arise. Wang *et al.* (2010) analyzed four published data sets on mutants, two of which were very large: One included 253 morphological traits, each recorded on 2449 haploid lines of *Saccharomyces cerevisiae*, each mutant for a different gene. The mean number of traits affected by each line was 21.6 and the median, 7, was smaller because the distribution of the number per line was of exponential form. In the other data set, for 4905 genes and 308 traits in mice, the mean was 8.2 and the median 6. Overall a significant degree of pleiotropy was found for no more than 10% of traits observed.

Wagner and Zhang’s second quantitative conclusion was that genes affecting more traits have larger per-trait effects. They computed the Euclidean distance, a measure of the total of individual trait effects detected to be significant for that gene, and found that it increased more rapidly than would be expected if there were no relation between overall gene effects and number of traits affected (Wang *et al.* 2010, Box 4).

Their third conclusion was that there was strong modularity of effects, *i.e.*, that genes differed in the suite of traits they affected. The association between genes and traits was represented by a bipartite network and the presence of a modular structure detected by methods developed in physics (Barber 2007). A series of random networks was generated having the same marginal properties, *i.e.*, numbers of traits detected for each gene and correspondingly genes affecting each trait, and the association of genes and traits was found not to be randomly distributed.

Other data that indicate substantial pleiotropy have, however, been published (Stearns 2010). For example, many studies have been undertaken in *Drosophila melanogaster* by Mackay and colleagues. Of the *P*-element mutations they screened, 22% affected abdominal bristle number, 23% sternopleural bristle number, 41% starvation stress resistance, 5.6% olfactory behavior, 22% wing shape, 37% locomotor startle response, and 35% aggressive behavior (Mackay 2010). Thus many genes affect each trait and each gene affects many traits.

We consider there are a number of important assumptions in the analyses by Wagner, Zhang, and colleagues that lead us to question their general conclusions on the degree and modularity of pleiotropy. Most importantly, a gene or line was declared to influence a trait if its observed effect exceeded a threshold determined by statistical significance. The power of detection of an effect then depends on its real size, the amount of replication, and the statistical significance level, typically set at a high threshold to allow for multiple comparisons. Therefore many real effects were likely to be missed. They recognize what they call “the vexing problem of detection limits,” but argue first that effects missed are likely to be smaller than those detected, which, while true overall, is not necessarily so for any particular mutant and trait.

To analyze pleiotropy, we take a radically different starting point. We assume and simulate an alternative model in which *all* genes affect *all* traits, but not necessarily by the same real amount. We find this model can give the appearance of restricted pleiotropy and modularity if data are displayed and analyzed using published methods (Wang *et al.* 2010).

## Model and Methods

### Model

The observed effects (*z _{i}*) of a randomly sampled mutant on the

*i*th of the

*n*traits recorded are assumed to be the sum

*z*=

_{i}*a*+

_{i}*e*of a gene effect

_{i}*a*and a residual effect

_{i}*e*due to sampling error of the estimate. The latter may be based on many observations and its magnitude depends on the amount of replication and experimental conditions. As significance tests would be undertaken relative to the error SD, the power to detect gene effects is a function of

_{i}*a*/SD(

_{i}*e*), and so we standardize gene effects by the error SD and set SD(

_{i}*e*) = 1. Gene effects on each trait are assumed to vary among loci, with standard deviation σ

_{i}_{a}= SD(

*a*)/SD(

_{i}*e*) being the same for all traits.

_{i}As genes influence different pathways or systems, they may have differing effects on associated traits, which we describe by a correlation *r*_{a} (≥0) across traits. (We undertake two-tail significance tests, so positive correlations are not a limitation and simplify simulation.) Genes in the same pathway may have differential but correlated effects on a limited subset of traits, “modules”, within which we assume there is a correlation *r*_{ma} (≥0) of genetic effects. Similarly errors may be correlated across all traits; for example, if a common control is used, and these are defined by the correlations *r*_{e} and similarly *r*_{me} within modules assuming the error deviations follow the same pathways as the gene effects.

### Simulation

Normally distributed gene effects *a _{i}* on trait

*i*were sampled as the sum of effects common to all traits,

*N*(0,

*r*

_{a}σ

_{a}

^{2}), and as modular effects, independently for each module

*N*(0,

*r*

_{ma}σ

_{a}

^{2}), and independent within module effects

*N*(0, (1 −

*r*

_{a}−

*r*

_{ma})σ

_{a}

^{2}). Individual trait effects

*a*are therefore distributed as

_{i}*N*(0, σ

_{a}

^{2}) with a block covariance structure, such that covariances of effects are (

*r*

_{a}+

*r*

_{ma})σ

_{a}

^{2}in the same module and

*r*

_{a}σ

_{a}

^{2}among different modules, with

*r*

_{a}+

*r*

_{ma}≤ 1. Components of the error effects,

*e*, distributed

_{i}*N*(0,1), were sampled similarly with variances

*r*

_{a},

*r*

_{ma}, and (1 –

*r*

_{e}−

*r*

_{me}), assuming the same modular structure as for the gene effects.

The “phenotypic” correlation of sampled effects within modules is given by *r*_{p} = [(*r*_{a} *+ r*_{am})σ_{a}^{2} + *r*_{e}]/(σ_{a}^{2} + 1) and between modules by *r*_{mp} = (*r*_{a}σ_{a}^{2} + *r*_{e})/(σ_{a}^{2} + 1). Although these correlations and σ_{a} are sufficient to define the model, for clarity results are labeled in terms of the genetic and error correlations.

As mutant gene effects typically have a more leptokurtic distribution than the normal (Keightley and Halligan 2009), Wishart (gamma)-like gene, but not error, effects were also simulated. This was done by squaring the components of the correlated normal variates generated as above, attaching a random sign to each, and scaling such that individual trait effects had a reflected Wishart distribution with mean 0 and variance σ_{a}^{2}. The actual correlations of these variables are similar to, but not the same as, those for the normal: for correlations of normal deviates of 0.00, 0.25, 0.50, 0.75, and 0.90, those constructed for the reflected Wishart are 0.00, 0.21, 0.44, 0.69, and 0.88, respectively.

Information is lost by comparing trait values to a threshold and constructing a binary 0/1 variable denoting significance, particularly when only a small proportion of traits are declared significant. As Wang *et al.* (2010) and Wagner and Zhang (2011) used only these binary results, we adopt their analyses. Sampled phenotypic effects *z _{i}* were “tested” by comparison with thresholds expressed relative to the error SD, set at ±1.96, ±2.575, and ±3.09, corresponding to 5%, 1%, and 0.2% significant under a null hypothesis of no gene effect. As results for the middle threshold fell within the range of the others, most are not presented.

Replicate simulations with the same model were regarded as samples of different genes, each with a different array of effects, some of which were significantly different from zero. This invokes simplifications in that genes are regarded as independent and all are from the same modular distribution. Typically 10,000 replicates were used for each model, except to obtain randomized networks where much more computation was needed and we were interested only in the behavior of limited numbers of loci as in biological data.

### Analysis of gene trait networks—“modularity”

The strength of a modular structure can be revealed by an increase in, for example, variation in number of traits detected per module, but requires the modular structure to be known *a priori* so that traits could be classified accordingly. Detection in the absence of such information can utilize a matrix in which, say, genes are rows and traits are columns and elements are equal to estimated gene effects, but we follow previous work and use binary 0/1 variables denoting significance. The gene–trait bipartite network can be represented by a modularity matrix with elements (*b _{ij}* −

*p*), where

_{ij}*b*= 1 if trait

_{ij}*j*is significantly affected by gene

*i*and 0 otherwise. The

*p*=

_{ij}*k*/

_{i}d_{i}*m*are probabilities in the null model network that an “edge” exists between gene

*i*and trait

*j*, where

*k*is the number of traits significantly affected by gene

_{i}*i*,

*d*is the number of genes that significantly affect trait

_{i}*j*, and

*m*is the total number of significant gene–trait pairs (edges).

Modularity (*M*) is defined as the extent, relative to the null model network, to which edges are formed within modules instead of between modules. The modularity of each gene–trait bipartite network was calculated from the modularity matrix with the BRIM algorithm (Barber 2007). This is an iterative algorithm for maximizing *M*, with the sets of gene and trait vertices each recursively drawing the other into a modular structure. Randomly wired networks that share the same marginal properties were generated with the numerical algorithm of Maslov and Sneppen (2002). First a pair of edges (A, B) and (C, D) were selected (*i.e.*, trait A is significantly affected by locus B and C by D). The two edges were then rewired in such a way that locus B affects trait C and locus D trait A. If one or both of these new edges already existed in the gene–trait network, this step was aborted and a new pair of edges selected. Repeating this process led to a randomized version of the original network, for each of which the modularity was computed. For each network based directly on the simulated data, the average (*E*_{m}) and the standard deviation (*SD*_{m}) were obtained from 100 samples of randomized gene–trait networks. The scaled modularity was then obtained as (*M* – *E*_{m})/*SD*_{m}.

## Results

### Distribution of number of pleiotropic traits

The mean numbers detected of 100 traits analyzed when both the gene effects and the error effects are uncorrelated are shown in Table 1 for three models of trait effect, namely constant (σ_{a} = 0; *i.e.*, *a _{i} = a*, all

*i*, included for reference), normally distributed, and reflected Wishart-distributed gene effects. There is a high probability that real effects are missed even if

*a*or σ

_{i}_{a}= 2,

*i.e.*, 2 SD(

*e*). With a leptokurtic distribution of effects, relatively more are missed than for the normal, particularly if the variance of effects is large and the threshold is low because many more genes have small effect. Absence of pleiotropy cannot be simply inferred solely from the results of tests of significance.

_{i}If genes have correlated trait effects, whether overall or within modules, the probability an individual trait is detected and the mean number of traits detected is unchanged. The distribution of the numbers detected depends on the correlations, however (Figure 1). As the overall correlation, *r*_{a}, of effects increases, so the distribution of numbers detected widens, as it becomes increasingly likely that very few or very many are detected. A correlation within modules, *r*_{ma}, has a qualitatively similar but quantitatively much smaller influence. Correlations among the errors have less influence than among gene effects in these examples because σ_{a} > 1.

The shape of the distribution and consequently its mode are most affected when there is an overall correlation of gene effects (Figure 1). With a modular structure, the degree of skew of the distribution lessens as the number of modules increases (Table A1). Skew as illustrated by the mode becomes most extreme at high thresholds and with the leptokurtic distribution.

### Pleiotropic effects of detected loci

Wang *et al.* (2010) and Wagner and Zhang (2011) note that the Euclidean distance, *T*_{E}* _{z}* = √(Σ

_{i}_{=1,}

_{m}z_{i}^{2}),

*i.e.*, the root sum of squares of the

*m*detected (

*i.e.*, significant) effects, would be expected to be proportional to √

*m*if there is no relation between the magnitude of detected effects and the degree of pleiotropy (

*i.e.*, the average effect of a locus on each trait is constant). Under this assumption the coefficient of

*m*in the power curve

*T*

_{E}

*=*

_{z}*cm*would then be

^{b}*b*= 0.5.

This curve was fitted to our simulated data, using both untransformed data and a weighted linear regression of log *T*_{E}* _{z}* on

*m*. Both yielded very good fits as measured by

*R*

^{2}, typically >0.95, but the actual curve can depart consistently from the simple power curve (Figure 2). At high numbers detected, the best-fitting power curve underpredicts, but these infrequent points get little weight. Similar departures are seen in some analyses of the real data (Wang

*et al.*2010, Figure 3; and Wagner and Zhang 2011, Box 4).

The simulation results (Table 2) show that an exponent (*b*) of *T*_{E}* _{z}* in excess of 0.5 can be obtained if there is a correlation of effects or of the error, whether they are overall or within modules. Indeed, this statistic shows only that pleiotropy is present but tells little about its structure because it is insensitive to the correlations unless they are very small.

#### Pleiotropic effects of detected loci at the genotypic level:

Effects are detected when the phenotype (*z*) exceeds the threshold, but the genotypic effect (*a*) for a significant trait is expected to be lower than *z* because the expected environmental deviation of those above the threshold is also positive (equivalently, the regression *b _{az}* < 1). The Euclidean distance

*T*

_{E}

*for the genotypic effects for each correlated trait declared significant was also computed. The regression of*

_{a}*T*

_{E}

*on*

_{a}*m*rises more rapidly than does

*T*

_{E}

*and can exceed 1.0, whether the correlations are across or within modules (data not shown), because the regression of*

_{z}*a*on

*z*increases as σ

_{a}

^{2}increases.

### Modularity

For the model used previously with 100 traits and 100 loci in five modules of equal size, estimates of modularity and scaled modularity are given in Table 3. These show that correlations among all traits do not generate much modularity; indeed, higher correlations can lead to lower modularity. If correlations within modules are sufficiently high, in these examples >0.5, substantial scaled modularity can be generated. The modularity increases with the threshold and kurtosis of distribution of gene effects. Both the modularity and scaled modularity decrease as the correlation across modules becomes higher, but scaled modularity increases while modularity remains roughly unchanged if the correlation is within modules. As the number of traits or the number of loci is increased (Table 4), modularity falls and scaled modularity increases, with the latter reaching high values.

## Discussion

We are not arguing here that there may not be restricted pleiotropy or modularity of gene action, but that the statistical evidence has to be considered critically. In particular we point out that a simple appearance of restricted numbers of traits shown to be significant can simply be a consequence of sampling error and high threshold on variable effects among traits, and the threshold *P*-values used here are much lower than if corrections were made for multiple testing. Thus Wagner and Zhang (2011) give a theoretical example for a study with 30 traits examined with a power of 80%. If significant effects were found for 3 of them, the probability that 27 effects of the same or larger size are missed is <(1 − 0.8)^{27} ∼ 10^{−19}. So they conclude that, if the power is 80% and few effects are detected, it is unlikely the true number is much higher; *i.e.*, the detection bias is likely to be small in well-designed studies. We think this conclusion is overstated and the potential bias is severe. In their example, if the effects for all traits are actually the same and those missed are due only to sampling error, the probability that no others are missed is 0.8^{27} ∼ 0.0024, and the expected number of traits missed is 27 × 0.2 ∼ 5, *i.e.*, more than the number detected.

Hence the magnitude of actual pleiotropy is inevitably underestimated unless some alternative steps are taken. The use of basic trait data would considerably improve any analysis and we recommend this be done rather than merely judging significance, especially when thresholds are high so the binary information is a small fraction of that in the actual records. Information on sign can also be lost. We consider it would be better to assume a joint distribution of effects of loci on the traits and attempt to estimate parameters of it. This would not be easy, but there is opportunity to set prior distributions and test fit in some kind of iterative structure. Any further analysis is well beyond the scope of this article, however.

We also consider the evidence that the power curve of effects as measured by Euclidean distance that shows the magnitude of effects increases with the degree of pleiotropy is not well founded. A gene that has a large effect correlated across traits is likely to be detected for more traits, so there is a nonrandom association between rate of detection and mean effect. It does *not* imply that ones of larger effect show more real pleiotropy as the degree of pleiotropy is the same for all pairs of loci in the model used here. The relationship between mean and mode of numbers detected is indicative of widespread pleiotropy if the mode is much smaller than the mean. This relationship is more extreme in the *absence* of modularity. Thus the more extreme distributions noted by Wagner and Zhang (2011) (*e.g.*, their Figure 3B) are indicative more of correlations of gene effects across all traits than of correlations within modules.

The strongest evidence for modularity of gene action is undoubtedly provided by the specific analysis of networks. Correlation of effects as in a modular structure does indeed generate such evidence, albeit simulated assuming a model of universal pleiotropy with variable correlated effects. We do not doubt that the results cited by Wagner and Zhang (2011) show modularity of gene action, but we can simulate such results merely by assuming there is a moderate correlation of effects among loci, even though there is universal pleiotropy. To assess the sampling properties of the modularity and scaled modularity, sets of data were sampled using the same parameter values (number of genes, traits, and correlations). The results showed that the variation in the modularity is rather low relative to its mean, whereas that in the scaled modularity is relatively much larger. This indicates that, though scaled modularity has high sampling error when the system is small, the modularity is more robust. Therefore our results indicate that large values of scaled modularity (*e.g.*, >10) as obtained by Wang *et al.* can arise with large data sets with a fully pleiotropic model, even if the correlation within modules is ≤0.5. The simulations show that the high values of modularity they obtained (>0.2 say) are more likely in our model if there are large correlations within modules, high threshold values, high kurtosis of distribution of gene effects, and low variance of gene effects (*i.e.*, low σ_{a}).

Taken together with the lack of evidence for a relationship between the degree of pleiotropy and significance of effects, it implies that we have to be wary of drawing conclusions about lack of evolutionary constraints. A deeper analysis is required.

## Appendix

## Footnotes

*Communicating editor: S. F. Chenoweth*

- Received October 10, 2011.
- Accepted December 24, 2011.

- Copyright © 2012 by the Genetics Society of America