(a) Direct comparison of P-values between ASREML and EMMA, computed from 553 SNPs of the maize panel data and the flowering-time phenotype using a similarity-based kinship matrix. The P-values are almost identical, implying that the two methods are almost identical in terms of accuracy. For one SNP, ASREML failed to converge during variance-component estimation, whereas EMMA succeeded. (b) Cumulative distribution of P-values across different models. Under the assumption that the SNPs are unlinked and there are few true SNP associations, the observed P-values are expected to be uniformly distributed, so the cumulative fraction of SNPs at or below a given P-value should be close to that P-value. A large deviation from this expectation implies that the statistical test may produce spurious associations. Simple, a simple t-test; SA, structured association; MM, an F-test with a mixed model with a specified kinship matrix.
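The expectation described in (b) can be illustrated with a minimal sketch. It assumes nothing about the actual data: the P-values below are simulated, and the function names `empirical_cdf` and `max_deviation_from_uniform` are hypothetical helpers introduced only for this illustration. The sketch compares the cumulative fraction of P-values with the diagonal expected under the null and reports the largest gap.

```python
import random

def empirical_cdf(pvalues, grid):
    """Fraction of P-values <= each grid point (grid must be ascending)."""
    n = len(pvalues)
    sorted_p = sorted(pvalues)
    fractions = []
    i = 0
    for g in grid:
        # Advance past every P-value that does not exceed this grid point.
        while i < n and sorted_p[i] <= g:
            i += 1
        fractions.append(i / n)
    return fractions

def max_deviation_from_uniform(pvalues, steps=100):
    """Largest gap between the observed cumulative fraction and the diagonal."""
    grid = [k / steps for k in range(1, steps + 1)]
    cdf = empirical_cdf(pvalues, grid)
    return max(abs(c - g) for c, g in zip(cdf, grid))

# Simulated data only: uniform P-values mimic a well-calibrated test,
# and cubing them mimics a test that yields too many small P-values.
rng = random.Random(0)
null_p = [rng.random() for _ in range(553)]
inflated_p = [p ** 3 for p in null_p]

print(max_deviation_from_uniform(null_p))
print(max_deviation_from_uniform(inflated_p))
```

A small deviation corresponds to a curve hugging the diagonal; a large one corresponds to the inflation that the legend associates with spurious associations.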
Genomewide cumulative distribution of observed P-values for (a) 13,416 Arabidopsis SNPs and flowering-time phenotypes across 95 strains using various models and (b) 106,040 mouse HapMap SNPs and three phenotypes: body weight (374 measurements over 38 strains), liver weight (304 measurements over 34 strains), and saccharin preference (280 measurements across 24 strains). S or Simple, a simple t-test; SA, structured association; MM, an F-test with a mixed model with a haplotype similarity kinship matrix; SA+MM, the unified mixed model using the output of STRUCTURE as additional fixed effects.
Genomewide scans for association with initial body weight, liver weight, and saccharin preference, using simple t-tests and F-tests with mixed models, on the basis of a kinship inferred from haplotype similarities.
Comparisons of the statistical power of the EMMA method across three different inbred mouse phenotypes and flowering time of Arabidopsis and maize, by randomly selecting causal SNPs across the genomewide SNPs. (a) Pointwise power denotes the power to identify causal SNPs at a nominal P-value of 0.05. (b) Regionwide power assumes 50 hypothetical tagSNPs in a genomic region. With 20 kb between tagSNPs, the genomic region covers up to 1 Mb. (c) Genomewide power is the power to achieve genomewide significance using the P-value threshold 10⁻⁵, which is conservative compared to the permutation-based genomewide significance thresholds using the original phenotypes. The phenotypic variation explained by the SNP effect is computed assuming a minor allele frequency (MAF) of 0.3.
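The quantities in this legend can be worked through with a short sketch. Two points are assumptions not stated in the legend: that the regionwide threshold is a Bonferroni correction of the nominal 0.05 over the 50 tagSNPs, and that genotypes of inbred strains are coded 0/1, so the genotype variance is MAF × (1 − MAF). The function names and the residual variance of 1 are hypothetical and serve only as illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold under a Bonferroni correction (assumed here)."""
    return alpha / n_tests

def variance_explained(beta, maf, residual_var=1.0):
    """Fraction of phenotypic variance due to a SNP with effect size beta.

    Assumes homozygous inbred strains with genotypes coded 0/1, so the
    genotype variance is maf * (1 - maf). residual_var is a placeholder.
    """
    snp_var = maf * (1.0 - maf) * beta ** 2
    return snp_var / (snp_var + residual_var)

# 50 tagSNPs spaced 20 kb apart span up to 1000 kb, i.e. 1 Mb.
region_kb = 50 * 20

print(region_kb)
print(bonferroni_threshold(0.05, 50))
print(variance_explained(beta=1.0, maf=0.3))
```

Under these assumptions, a SNP effect equal to one residual standard deviation at MAF 0.3 accounts for 0.21 / 1.21 of the phenotypic variance.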
Comparisons of the genomewide power of the EMMA method applied to inbred mouse associations for simulated phenotypes with various SNP effects, genetic background effects, and numbers of multiple measurements. The significance threshold is P = 10⁻⁵. t is the number of multiple measurements per strain, and is the fraction of the variance explained by genetic background among overall phenotypic variances when the SNP effect is not added. (a) With varying β and t. (b) The same as (a), using the mean phenotype value per strain instead of individual measurements. (c) With 10 multiple measurements per strain, varying β and (d) With β = σ, varying t and The effect of population structure is varied by changing the ratio of two variance components, and the numbers of multiple measurements are simulated with (a) 10 measurements and (b) a single measurement per strain.
Goodness-of-fit of different models and kinship matrices in explaining phenotypic variation of maize quantitative traits
Comparison of the maximum likelihood (ML) and the Bayesian information criterion (BIC) of each model with different kinship matrices for maize quantitative traits. The model with the smaller BIC is preferred. Simple, the simple linear model without adjustment for population effect; SA, the model using the output from STRUCTURE as covariates; MM, the mixed model with different kinship matrices. The descriptions of kinship matrices are the same as in the Figure 1 legend.
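The model comparison in this table can be sketched as follows, using the standard definition BIC = −2 ln L + k ln n, where L is the maximized likelihood, k the number of free parameters, and n the number of observations. The log-likelihoods, parameter counts, and sample size below are hypothetical placeholders, not values from the table, and `preferred_model` is a helper introduced only for illustration.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 ln L + k ln n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def preferred_model(fits, n_obs):
    """Return the name of the model with the smallest BIC.

    fits maps a model name to (maximized log-likelihood, parameter count).
    """
    def score(name):
        log_likelihood, n_params = fits[name]
        return bic(log_likelihood, n_params, n_obs)
    return min(fits, key=score)

# Hypothetical fits for illustration only.
example_fits = {
    "Simple": (-120.0, 2),
    "MM": (-100.0, 3),
}

print(preferred_model(example_fits, n_obs=100))
```

The extra parameter of the second model is penalized by ln n, but its higher likelihood more than compensates, so it has the smaller BIC and is preferred.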
List of plausible associations in the mouse association mapping