TABLE 1

Fitting of the empirical frequency distributions of the gene expression levels

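| Methods and libraries | Library size, M | No. of distinct tags, N | M/N | p1 | J | J/M | k ± SE | b ± SE | Ψ |
|---|---|---|---|---|---|---|---|---|---|
| SAGE (yeast) | | | | | | | | | |
| Log-phase | 20,096 | 5,324 | 3.78 | 0.66 | 636 | 0.032 | 0.97 ± 0.004 | −0.173 ± 0.004 | 9.3 |
| S-phase | 19,871 | 5,785 | 3.44 | 0.67 | 561 | 0.028 | 0.98 ± 0.004 | −0.197 ± 0.004 | 9.8 |
| G2/M-phase | 19,527 | 5,303 | 3.68 | 0.67 | 519 | 0.027 | 0.96 ± 0.006 | −0.195 ± 0.006 | 8.8 |
| Pooled library | 59,494 | 11,329 | 5.25 | 0.62 | 1,716 | 0.029 | 0.94 ± 0.008 | −0.108 ± 0.008 | 7.7 |
| Total true tags | 47,393 | 5,819 | 8.14 | 0.46 | 1,716 | 0.036 | 0.97 ± 0.001 | 0.494 ± 0.001 | 7.7 |
| SAGE (mouse) | | | | | | | | | |
| 19018 | 43,274 | 17,754 | 2.43 | 0.72 | 630 | 0.015 | 1.19 ± 0.001 | −0.165 ± 0.001 | 8.6 |
| 20427 | 61,240 | 24,796 | 2.46 | 0.73 | 425 | 0.007 | 1.14 ± 0.001 | −0.195 ± 0.001 | 8.0 |
| Human | | | | | | | | | |
| 154 | 81,516 | 19,137 | 4.26 | 0.53 | 1,598 | 0.02 | 1.25 ± 0.012 | 0.57 ± 0.016 | 7.1 |
| 144 | 61,245 | 17,323 | 3.53 | 0.56 | 521 | 0.009 | 1.39 ± 0.005 | 0.62 ± 0.006 | 9.8 |
| 143 | 51,949 | 13,589 | 3.82 | 0.56 | 370 | 0.007 | 1.27 ± 0.006 | 0.49 ± 0.007 | 9.5 |
| 153 | 51,906 | 16,257 | 3.19 | 0.59 | 659 | 0.013 | 1.39 ± 0.008 | 0.48 ± 0.009 | 8.9 |
| 161 | 49,334 | 15,182 | 3.25 | 0.59 | 832 | 0.017 | 1.44 ± 0.010 | 0.57 ± 0.007 | 8.3 |
| 122 | 45,911 | 15,243 | 3.01 | 0.61 | 450 | 0.010 | 1.37 ± 0.023 | 0.40 ± 0.024 | 7.1 |
| 160 | 42,978 | 13,394 | 3.21 | 0.59 | 338 | 0.008 | 1.36 ± 0.014 | 0.44 ± 0.015 | 8.2 |
| 146 | 37,512 | 13,033 | 2.88 | 0.64 | 370 | 0.010 | 1.38 ± 0.028 | 0.30 ± 0.027 | 9.3 |
| 145 | 27,229 | 9,452 | 2.88 | 0.64 | 326 | 0.012 | 1.30 ± 0.009 | 0.30 ± 0.009 | 9.3 |
| 123 | 26,669 | 10,182 | 2.62 | 0.65 | 323 | 0.012 | 1.42 ± 0.010 | 0.24 ± 0.009 | 9.3 |
| 2892.2 | 22,637 | 9,348 | 2.42 | 0.74 | 221 | 0.010 | 1.08 ± 0.030 | −0.28 ± 0.010 | 11 |
| 171 | 20,050 | 7,702 | 2.6 | 0.72 | 440 | 0.022 | 1.40 ± 0.034 | −0.027 ± 0.025 | 8.3 |
| 167 | 16,361 | 5,900 | 2.77 | 0.69 | 561 | 0.034 | 1.27 ± 0.063 | 0 ± 0 | 9.1 |
| 166 | 14,616 | 5,383 | 2.72 | 0.7 | 462 | 0.032 | 1.28 ± 0.010 | 0.015 ± 0.015 | 10 |
| 172 | 8,936 | 4,507 | 1.98 | 0.76 | 210 | 0.024 | 1.36 ± 0.026 | −0.144 ± 0.017 | 8.5 |
| 154.1 | 8,936 | 4,590 | 1.95 | 0.76 | 181 | 0.030 | 1.36 ± 0.023 | −0.123 ± 0.013 | 9.2 |
| 2892.1 | 6,313 | 3,531 | 1.79 | 0.81 | 78 | 0.012 | 1.05 ± 0.015 | −0.480 ± 0.008 | 10 |
| 1698 | 2,861 | 1,961 | 1.46 | 0.83 | 19 | 0.007 | 1.09 ± 0.110 | −0.500 ± 0.058 | 8.3 |
| cDNA (LifeTech) | | | | | | | | | |
| Mouse, Lib. 341 | 36,675 | 8,019 | 4.57 | 0.42 | 1,641 | 0.045 | 1.44 ± 0.010 | 0.90 ± 0.06 | 5 |
| Mouse, Lib. 946 | 12,309 | 4,023 | 3.06 | 0.56 | 427 | 0.035 | 1.49 ± 0.001 | 0.75 ± 0.003 | 7.8 |
| Human, Lib. 2427 | 10,087 | 3,586 | 2.81 | 0.54 | 246 | 0.029 | 1.88 ± 0.040 | 1.34 ± 0.05 | 7.1 |
| Affymetrix arrays (yeast) | | | | | | | | | |
| Log-phase | 16,762 | 3,000 | 5.59 | 0.46 | 144 | 0.009 | 0.86 ± 0.001 | 0.37 ± 0.003 | 7.4 |
| G1-phase | 17,408 | 2,862 | 6.08 | 0.45 | 178 | 0.010 | 0.85 ± 0.001 | 0.36 ± 0.004 | 6.7 |
| S-phase | 16,440 | 2,903 | 5.66 | 0.47 | 151 | 0.009 | 0.85 ± 0.001 | 0.32 ± 0.004 | 7.0 |
| G2/M-phase | 17,036 | 2,900 | 5.87 | 0.45 | 156 | 0.009 | 0.84 ± 0.001 | 0.36 ± 0.004 | 6.9 |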
  • Characterization of the empirical frequency distributions of gene expression levels for yeast, mouse, and human cell-type libraries, with goodness-of-fit analysis using the generalized discrete Pareto (GDP) model. M is the total number of tags (the library size); N is the number of distinct tags; p1 is the fraction of distinct tags represented by a single copy in the library; J is the maximum observed gene expression level in the library; k and b are the parameters of the GDP model. Ψ is the goodness-of-fit criterion (see methods); its values are rated excellent (11–8), very good (8–6), or satisfactory (6–4). Yeast SAGE libraries: cells in G2/M phase, in S phase, and in log phase; a pool of these three libraries; and a true tags library. Mouse SAGE libraries (Unilib IDs): 19018 (brain, medulloblastoma) and 20427 (brain, normal, purified granular cell precursors). Human SAGE libraries (Unilib IDs): 154 (normal brain cells, >95% white matter), 144 [H1110, glioblastoma (GBM)], 143 [H392, GBM cell line (CL)], 153 (pooled GBMs), 161 (pooled normal brain), 122 (HCT116, colon cancer CL), 160 (NHA, normal astrocyte CL), 146 (RKO, colon cancer CL), 145 (SW837, colon cancer CL), 123 (Caco2, colon cancer CL), 2892.2 (LNCaP, the prostate cancer CL library 2892 after 1 year), 171 (primary colon cancer), 167 (normal colon), 166 (normal colon), 172 (primary colon cancer), 154.1 (normal brain tissue, sublibrary taken from library 154), 2892.1 (LNCaP, initial library 2892), and 1698 (ovary carcinoma). cDNA libraries (Life Technologies method): mouse mammary cell library 341, mouse normal kidney library 496, and human choriocarcinoma cell library 2427. Affymetrix microarrays: normal yeast cells in log, G1, S, and G2/M phases (data from database http://www.hsph.harvard/geneexpression).
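The summary columns of Table 1 follow directly from the definitions in the caption, and a minimal Python sketch may make them concrete. The function names `library_summary` and `gdp_pmf`, and the toy tag counts, are hypothetical and introduced only for illustration. The caption does not state the functional form of the GDP model, so the sketch assumes that the probability of expression level m is proportional to (m + b)^−(k + 1) for m = 1, ..., J. Treat that form as an assumption to check against the methods.

```python
def library_summary(tag_counts):
    """Compute M, N, M/N, p1, J and J/M from a mapping tag -> copy number."""
    counts = [c for c in tag_counts.values() if c > 0]
    if not counts:
        raise ValueError("library contains no tags")
    M = sum(counts)                            # library size (total tags)
    N = len(counts)                            # number of distinct tags
    J = max(counts)                            # maximum observed expression level
    p1 = sum(1 for c in counts if c == 1) / N  # fraction of single-copy tags
    return {"M": M, "N": N, "M/N": M / N, "p1": p1, "J": J, "J/M": J / M}


def gdp_pmf(k, b, J):
    """ASSUMED GDP form: f(m) proportional to (m + b) ** -(k + 1), m = 1..J.

    Returns the list [f(1), ..., f(J)], normalized to sum to 1.
    """
    if b <= -1:
        raise ValueError("b must exceed -1 so that m + b > 0 for m >= 1")
    weights = [(m + b) ** -(k + 1) for m in range(1, J + 1)]
    z = sum(weights)                           # normalizing constant
    return [w / z for w in weights]


# Hypothetical toy library: four distinct tags, ten tags in total.
toy = {"AAAA": 5, "CCCC": 1, "GGGG": 1, "TTTT": 3}
summary = library_summary(toy)

# Model distribution using the k, b and J reported for the yeast log-phase row.
model = gdp_pmf(k=0.97, b=-0.173, J=636)
```

Under the assumed form, `model[0]` is the model's predicted fraction of distinct tags with a single copy, so it is the model counterpart of the empirical p1 column.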