Identification of Genetic Networks
Momiao Xiong, Jun Li, and Xiangzhong Fang
Human Genetics Center, University of Texas, Houston Health Science Center, Houston, Texas 77030
Corresponding author: Momiao Xiong, University of Texas, 1200 Herman Pressler, Houston, TX 77225. E-mail: mxiong@sph.uth.tmc.edu
Communicating editor: J. B. WALSH
| ABSTRACT |
|---|
In this report, we propose the use of structural equations as a tool for identifying and modeling genetic networks and genetic algorithms for searching the most likely genetic networks that best fit the data. After genetic networks are identified, it is fundamental to identify those networks influencing cell phenotypes. To accomplish this task we extend the concept of differential expression of the genes, widely used in gene expression data analysis, to genetic networks. We propose a definition for the differential expression of a genetic network and use the generalized T2 statistic to measure the ability of genetic networks to distinguish different phenotypes. However, describing the differential expression of genetic networks is not enough for understanding biological systems because differences in the expression of genetic networks do not directly reflect regulatory strength between gene activities. Therefore, in this report we also introduce the concept of differentially regulated genetic networks, which has the potential to assess changes of gene regulation in response to perturbation in the environment and may provide new insights into the mechanism of diseases and biological processes. We propose five novel statistics to measure the differences in regulation of genetic networks. To illustrate the concepts and methods for reconstruction of genetic networks and identification of association of genetic networks with function, we applied the proposed models and algorithms to three data sets.
RECENT advances in genome sequencing and high-throughput technologies, such as DNA and protein chips, allow us to measure the spatio-temporal expression levels of thousands of genes or proteins.
Although a great advance in both experimental technology and computational methods for reconstructing genetic networks has been made, we still face significant challenges in understanding such networks. To accomplish the goal of identifying genetic networks and to explore their applications to biomedical research, several issues must be addressed. First, the development of dynamic models of genetic networks is severely compromised by the lack of experimental techniques to measure the dynamic quantities of such networks. Therefore, revealing information from steady states of genetic networks using gene expression profiles is of great interest.
Second, to identify physically connected genetic networks using gene expression profiles, which describe how genes directly activate or inhibit others, may be too ambitious to be accomplished at the current stage due to incomplete information on the structure of genetic networks. However, instead of reconstructing physically connected genetic networks, it may be feasible to model quasi-genetic networks, defined as a network that describes most likely functional relations between the genes in the network. The quasi-genetic network may not represent physical connection of the genes in the network, but represents the best fit of the network model to gene expression data.
Third, many current computational methods for the reconstruction of genetic networks have focused on the network structure. However, structure provides only partial information on genetic networks. Measuring the relationships between genes in the network quantitatively is indispensable for studying the regulatory properties of genetic networks.
To model genetic networks quantitatively, we propose the use of structural equations.
Once a genetic network is identified, it is crucial to associate genetic networks with cell phenotypes. Differential expression of genes is a widely used concept for identifying genes that are able to discriminate cell phenotypes. To associate genetic networks with cell phenotypes, we generalize the notion of differentially expressed genetic networks and develop a statistic to test for the differential expression of such networks.
Coefficient parameters in the structural equations measure the regulatory effects of one gene on others or the strength of the gene-gene interactions. Functional mutations in the genes will often cause changes in regulatory effects. Thus, we expect that due to the accumulation of mutations in abnormal cells, the regulation of some genetic networks in abnormal cells will be significantly different from that in normal cells. Uncovering such differences may help us to identify the causes of disease. To accomplish this task, we provide five statistics to measure the differences in regulation between the genetic networks in normal and abnormal cells. We hope that by identifying differentially regulated genetic networks we are likely to discover a set of genes and genetic networks that influence the development of the diseases.
| METHODS |
|---|
Linear structural equation model:
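Linear structural equations can be used for construction of a first-order approximation model of a genetic network using steady-state gene expression measurements. Let Z denote the vector of variables (e.g., gene expression levels) in the network, whose dynamics are described by a system of nonlinear differential equations,

$$\frac{dZ}{dt} = G(Z) - RZ, \tag{1}$$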
where R is a diagonal matrix and G(Z) is a vector of nonlinear functions. The right-hand side of Equation 1 has two terms: the first one is the production of molecules, and the second is the degradation of existing molecules. The system of nonlinear differential Equation 1 can be approximated to the first order by a linear system of equations near a steady state of the system

where Y is a vector of the deviation of variables in Z from their means and A is a Jacobian matrix of G(Z), i.e., A =
G(Z)/
Z, measuring the strength of regulatory interactions between genes in the network. When the system reaches a steady state, which is equivalent to setting the time derivative of Y to zero, we have

The above equations show that the Jacobian matrix involves feedback loops of a dynamic biological system and gene or protein expressions in cells or tissues are jointly or simultaneously determined. Gene expression data that are generated by biological systems must be described as a system of joint relations among the gene expression variables.
The naïve differential equation approach assumes that the genetic network is fully connected, ignoring the structural relations between genes in the network.
We begin to describe structural equations for modeling genetic networks by introducing a path diagram.
Variables in path diagrams can be classified into two basic types of variables, observed variables that can be measured and residual error variables that cannot be measured and represent all other unmodeled causes of the variables. Most observed variables (e.g., gene expression levels) are random. Some observed variables may be nonrandom or control variables (e.g., drug doses) whose values remain the same in repeated random sampling or might be manipulated by the experimenter. The observed variables will be further classified into exogenous variables, which lie outside the model, and endogenous variables, whose values are determined through joint interaction with other variables within the system. All nonrandom variables and some of the gene (or protein) expression data (e.g., initiators of pathway) can be viewed as exogenous variables. Most of the gene (or protein) expression data are viewed as endogenous variables. The terms exogenous and endogenous are model specific. It may be that an exogenous variable in one model is endogenous in another. The observed variables are enclosed in boxes and the error variables are not enclosed at all.
Let Y be a vector of the p endogenous variables and X be a vector of q exogenous variables. Occasionally, one or more of the X's are nonrandom. We denote the errors by e. We assume that E[e] = 0 and that e is uncorrelated with the exogenous variables in X. We also assume that $e_i$ is homoscedastic and nonautocorrelated. The structural equations are given by

$$Y = BY + \Gamma X + e, \tag{2}$$

where B is a p × p matrix and Γ is a p × q matrix. The elements of the coefficient matrices B and Γ describe the regulatory effects of one gene on another or of a nonrandom variable on the gene, which is a direct regulatory influence of one variable on the other. Therefore, throughout the article, the matrices B and Γ are referred to as the regulatory matrices. Since the genetic networks are not fully connected, many elements in the matrices B and Γ will be zero; that is, B and Γ are, in general, sparse. The matrix B can describe feedback relations in the path diagram. The structural equations can model directed cyclic graphs and hence genetic networks with feedback loops.
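As a small numerical illustration of the structural equations, the following sketch solves Y = BY + ΓX + e for Y, that is, (I - B)Y = ΓX + e, for a toy two-gene network with a feedback loop. The coefficient values and the helper names `solve_linear` and `reduced_form` are hypothetical and chosen only for illustration; they are not taken from the article's data.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def reduced_form(B, Gamma, x, e):
    """Return Y satisfying Y = B Y + Gamma x + e, i.e. (I - B) Y = Gamma x + e."""
    p = len(B)
    i_minus_b = [[(1.0 if i == j else 0.0) - B[i][j] for j in range(p)]
                 for i in range(p)]
    rhs = [sum(Gamma[i][k] * x[k] for k in range(len(x))) + e[i]
           for i in range(p)]
    return solve_linear(i_minus_b, rhs)


# Toy network (hypothetical values): x1 -> y1 -> y2, with feedback y2 -> y1.
B = [[0.0, 0.2],
     [0.5, 0.0]]
Gamma = [[1.0],
         [0.0]]
y = reduced_form(B, Gamma, x=[1.8], e=[0.0, 0.0])
print(y)  # approximately [2.0, 1.0]
```

Because B has nonzero entries on both sides of the diagonal, the two endogenous variables are determined jointly, which is the situation the structural equations are designed to describe.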
In Fig 1 we assume that the expression levels of the genes CDC28, CLB1, and CLB3, denoted by x1, x2, and x3, respectively, are exogenous variables and the expression levels of the genes MCM1, MCM2, SWI4, CLN3, CDC47, and CDC6, denoted by y1, y2, y3, y4, y5, and y6, respectively, are endogenous variables. The structural equations for the genetic network are written as

We assume that the influence of the genes in the network is in one direction and that the errors in the equations are independent and uncorrelated with exogenous variables. Under these assumptions, if the genetic networks do not contain feedback loops, the B matrix can be made lower triangular by arranging the order of endogenous variables, and the variance-covariance matrix of the errors is diagonal. Therefore, the structural equations for the genetic networks without feedback loops are recursive models, which ensures that the parameters in the model are identifiable.
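The reordering argument can be sketched in code. The function below looks for an ordering of the endogenous variables under which B becomes lower triangular and reports a feedback loop when no such ordering exists. It assumes the convention that a nonzero `B[i][j]` means gene j regulates gene i; the function names and example matrices are hypothetical.

```python
def recursive_order(B):
    """Return an ordering that makes B lower triangular, or None if the
    network contains a feedback loop. B[i][j] != 0 means y_j regulates y_i."""
    p = len(B)
    remaining = set(range(p))
    order = []
    while remaining:
        # A variable is ready once none of its regulators is still unplaced.
        ready = [i for i in sorted(remaining)
                 if all(B[i][j] == 0 for j in remaining)]
        if not ready:
            return None  # every remaining variable waits on another one
        order.extend(ready)
        remaining.difference_update(ready)
    return order


def reorder(B, order):
    """Permute rows and columns of B according to the given ordering."""
    return [[B[i][j] for j in order] for i in order]


# Hypothetical acyclic network: y3 -> y1 -> y2.
B_acyclic = [[0.0, 0.0, 0.4],
             [0.7, 0.0, 0.0],
             [0.0, 0.0, 0.0]]
order = recursive_order(B_acyclic)
B_lower = reorder(B_acyclic, order)

# Hypothetical network with a feedback loop: y1 <-> y2.
B_cyclic = [[0.0, 0.3],
            [0.5, 0.0]]
loop_result = recursive_order(B_cyclic)
```

For the acyclic example the reordered matrix has nonzero entries only below the diagonal, while the cyclic example returns `None`, so it is not a recursive model.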
Parameter estimation:
To estimate the parameters of the structural equations, we assume that the structure of the network is known. How to identify network structure is discussed in the Model selection section. It is well documented that the ordinary least-squares estimator is biased and inconsistent for parameters in structural equations. Estimation is instead based on the covariance structure

$$\Sigma = \Sigma(\theta),$$

where Σ is the population covariance matrix of the variables Y and X, and Σ(θ) is the covariance matrix written as a function of the free model parameters, which we denote by θ. Let Φ and Ψ denote the covariance matrices of X and e, respectively. The matrix Σ(θ) consists of three parts: (1) the covariance matrix of Y, (2) the covariance matrix of X with Y, and (3) the covariance matrix of X. First we consider $\Sigma_{YY}(\theta)$, the implied covariance matrix of Y. From Equation 2, we have $Y = (I - B)^{-1}(\Gamma X + e)$. Hence, $\Sigma_{YY}(\theta) = (I - B)^{-1}(\Gamma\Phi\Gamma' + \Psi)(I - B)^{-1\prime}$. The implied covariance matrix of Y and X is given by

$$\Sigma_{YX}(\theta) = (I - B)^{-1}\Gamma\Phi.$$

Therefore, we have

$$\Sigma(\theta) = \begin{bmatrix} (I - B)^{-1}(\Gamma\Phi\Gamma' + \Psi)(I - B)^{-1\prime} & (I - B)^{-1}\Gamma\Phi \\ \Phi\Gamma'(I - B)^{-1\prime} & \Phi \end{bmatrix}.$$

The parameters in B, Γ, Φ, and Ψ are estimated so that the implied covariance matrix Σ(θ) is as close to the sample covariance matrix S, the estimator of the matrix Σ, as possible. To know when our estimates are as "close" as possible, we must define close; that is, we require a fitting function that is minimized. The most widely used fitting function is based on the method of maximum likelihood (ML), where maximizing the likelihood function or its log corresponds to minimizing

$$F_{ML} = \log|\Sigma(\theta)| + \mathrm{Tr}\left(S\Sigma^{-1}(\theta)\right) - \log|S| - (p + q),$$
where p and q are the numbers of endogenous and exogenous variables, respectively, and Tr denotes the trace of a matrix. The fitting function $F_{ML}$ measures the discrepancy between the observed and predicted covariance matrices. In general, $F_{ML}$ is a complicated nonlinear function of the structural parameters, and explicit solutions are not always found. Instead, a Newton unconstrained optimization procedure is employed to find solutions.
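To make the estimation target concrete, here is a minimal sketch that builds the implied covariance matrix Σ(θ) from B, Γ, Φ, and Ψ and evaluates the fitting function. It assumes the standard ML discrepancy form log|Σ(θ)| + Tr(SΣ(θ)⁻¹) - log|S| - (p + q) and orders the variables as (Y, X); all helper names and parameter values are hypothetical, and no optimizer is included.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse_and_det(a):
    """Gauss-Jordan elimination with partial pivoting: (inverse, determinant)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        scale = m[col][col]
        det *= scale
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m], det


def implied_covariance(B, Gamma, Phi, Psi):
    """Sigma(theta) for Y = B Y + Gamma X + e, with variables ordered (Y, X)."""
    p = len(B)
    i_minus_b = [[(1.0 if i == j else 0.0) - B[i][j] for j in range(p)]
                 for i in range(p)]
    a, _ = inverse_and_det(i_minus_b)            # (I - B)^{-1}
    g_phi = matmul(Gamma, Phi)                   # Gamma Phi
    inner = matmul(g_phi, transpose(Gamma))      # Gamma Phi Gamma'
    inner = [[inner[i][j] + Psi[i][j] for j in range(p)] for i in range(p)]
    s_yy = matmul(matmul(a, inner), transpose(a))
    s_yx = matmul(a, g_phi)                      # (I - B)^{-1} Gamma Phi
    s_xy = transpose(s_yx)
    top = [s_yy[i] + s_yx[i] for i in range(p)]
    bottom = [s_xy[k] + Phi[k] for k in range(len(Phi))]
    return top + bottom


def f_ml(S, Sigma):
    """ML fitting function; zero when S equals Sigma, positive otherwise."""
    sigma_inv, det_sigma = inverse_and_det(Sigma)
    _, det_s = inverse_and_det(S)
    product = matmul(S, sigma_inv)
    trace = sum(product[i][i] for i in range(len(S)))
    return math.log(det_sigma) + trace - math.log(det_s) - len(S)


# Hypothetical parameters: two endogenous genes, one exogenous gene.
B = [[0.0, 0.0], [0.5, 0.0]]
Gamma = [[1.0], [0.0]]
Phi = [[1.0]]
Psi = [[0.5, 0.0], [0.0, 0.5]]
Sigma = implied_covariance(B, Gamma, Phi, Psi)
perfect_fit = f_ml(Sigma, Sigma)
```

In practice a numerical optimizer would adjust the free parameters to minimize `f_ml(S, implied_covariance(...))` for a sample covariance matrix S; here the function is only evaluated at a known parameter set.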
It is well known that the ML estimators are consistent and asymptotically unbiased. Large sample theory ensures that $(N - 1)F_{ML}$ is asymptotically distributed as a $\chi^2$ distribution with $\frac{1}{2}(p + q)(p + q + 1) - t$ d.f., where t is the number of free parameters, and the distribution of the estimator is asymptotically normal. Hence, the ratio of the estimated parameter to its standard error approximates a Z-distribution for large samples and can be used to test the parameters. The standard errors can be obtained from the following asymptotic covariance matrix for the ML estimators,

$$\mathrm{ACOV}(\hat{\theta}) = \frac{2}{N - 1}\left\{E\left[\frac{\partial^2 F_{ML}}{\partial\theta\,\partial\theta'}\right]\right\}^{-1},$$
where N is the number of samples.
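The parameter test described above can be sketched as follows: the ratio of an estimate to its standard error is treated as approximately standard normal, and a two-sided P value is read from the normal tail. The function name and the numerical inputs are hypothetical.

```python
import math


def wald_z_test(estimate, std_error):
    """Return (z, two-sided P value) under a standard normal approximation."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical regulatory coefficient and standard error.
z, p_value = wald_z_test(0.98, 0.5)
print(round(z, 2), round(p_value, 3))  # 1.96 0.05
```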
Model selection:
Learning about genetic networks consists of two parts: parameter learning and structure learning. For parameter learning, in the previous section we assume that the network structure is known. However, in most cases, the network structure is unknown and needs to be identified. To learn network structure from genome-wide gene expression profiles consists of two steps. The first step is to select the set of genes whose reconstructed network best fits the gene expression data. The second step is to learn the structure of the networks for a set of selected genes, which provides the best fit to the gene expression data.
To identify the structure of the network, an overall model fit measure is needed to assess how well a genetic network fits the data and to compare the merits of alternative network structures. We use the Akaike information criterion (AIC),

$$\mathrm{AIC} = (N - 1)F_{ML} - 2d,$$
where N is the number of samples, FML is the fitting function, d = 1/2(p + q)(p + q + 1) - t is degrees of freedom, and t is the number of free parameters in the model. The AIC value provides a relative ordering of different models fitting the data. The smaller the AIC value, the better the model fits the data.
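A short worked computation may help here. The sketch assumes the form AIC = (N - 1)F_ML - 2d, which is common in the structural equation literature and is consistent with the definitions of N, F_ML, d, and t given above; the sample size, fit value, and parameter count are hypothetical.

```python
def degrees_of_freedom(p, q, t):
    """d = (p + q)(p + q + 1) / 2 - t."""
    return (p + q) * (p + q + 1) // 2 - t


def aic(n_samples, f_ml_value, p, q, t):
    """AIC = (N - 1) * F_ML - 2 * d (assumed form; smaller is better)."""
    d = degrees_of_freedom(p, q, t)
    return (n_samples - 1) * f_ml_value - 2 * d


# Six endogenous and three exogenous genes, with a hypothetical
# 20 free parameters, 50 samples, and a fitted F_ML of 0.4.
d = degrees_of_freedom(6, 3, 20)
value = aic(50, 0.4, 6, 3, 20)
print(d, round(value, 1))  # 25 -30.4
```

Under this form, a sparse network with many degrees of freedom and a small fitting function value receives a strongly negative AIC.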
However, AIC information cannot be employed to test whether the identified genetic network is valid. Fortunately, the statistic $(N - 1)F_{ML}$ is asymptotically distributed as a $\chi^2(d)$ distribution under the null hypothesis $H_0$: $\Sigma = \Sigma(\theta)$. It should be noted that the null hypothesis means that the constraints on Σ imposed by the genetic network model are valid. In contrast to ordinary tests, where the probability of obtaining a $\chi^2$ value larger than a prespecified value is the probability of committing an error in rejecting the null hypothesis, in the model selection test here the probability of obtaining a $\chi^2$ value larger than a prespecified value is the probability of ensuring that the fitted model is correct and is referred to as the fitting probability. Therefore, the higher the probability of the $\chi^2$, the closer the fitted model for the genetic network is to the true genetic network.
Genetic algorithms:
Searching for the genetic network is a very difficult problem because of the large number of possible networks. Exhaustively searching all possible networks is infeasible in practice, even with high-performance computers. Genetic algorithms (GAs) can be used for searching networks.
We use a k x k connective matrix C to represent the structure of a network with k genes. The elements of C are given by

GAs begin with a population that consists of a large number of individuals. In our genetic algorithm, individuals of the population represent selected genes and network structures. This type of individual is denoted by a string,

which is usually referred to as a chromosome in the GA literature (as opposed to a real chromosome). The first part of the chromosome, $g_1 g_2 \cdots g_k$, is a set of integers representing the genes selected in the network. The second part, $c_{11} c_{21} \cdots c_{k1} \cdots c_{1k} c_{2k} \cdots c_{kk}$, is a binary string indicating the network structure. GAs attempt to find individuals from the search space with the best fitness (e.g., smallest AIC value). The searching procedure of GAs can be briefly described as follows. First, the initial population is generated randomly, and the fitness of each individual is calculated. Second, individuals with good fitness are selected as parents. These parents produce children by the operations of crossover and mutation. A crossover operation produces two children by an exchange of chromosome segments between two parents. The mutation operation creates children by changing parents' chromosomes. All newly produced children are added to the population. Some individuals with worse fitness (e.g., higher AIC values) are removed from the extended population (including both parents and children) to generate a new population with its initial size, but with better fitness. Crossover and mutation play different roles in the genetic algorithm. Crossover increases the average fitness of the population. Mutation can help the algorithm to avoid local optima by exploring new states. After many iterations, the most likely or nearly most likely networks fitting the data can be found. When the difference between the AIC values of two successive iterations is less than a prespecified threshold, the iteration is stopped.
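The search procedure can be sketched in a few dozen lines. This is a simplified illustration, not the article's implementation: the fitness function `toy_fitness` is a hypothetical stand-in for the AIC of a fitted structural equation model, crossover exchanges only the tails of the binary structure strings, and the loop runs for a fixed number of generations instead of using the AIC-difference stopping rule. All constants are assumptions.

```python
import random

K = 3                      # genes per network (hypothetical)
N_GENES = 10               # size of the gene pool (hypothetical)
TARGET_GENES = {1, 4, 7}   # toy "true" network, used only by the stand-in fitness
TARGET_BITS = [0, 1, 0, 0, 0, 1, 0, 0, 0]


def toy_fitness(ind):
    """Stand-in for the AIC of a fitted model (lower is better)."""
    genes, bits = ind
    gene_cost = sum(1 for g in genes if g not in TARGET_GENES)
    edge_cost = sum(1 for b, t in zip(bits, TARGET_BITS) if b != t)
    return gene_cost + edge_cost


def random_individual(rng):
    genes = rng.sample(range(N_GENES), K)             # g_1 ... g_k
    bits = [rng.randint(0, 1) for _ in range(K * K)]  # c_11 ... c_kk
    return (genes, bits)


def crossover(a, b, rng):
    """Exchange the tails of the structure strings of two parents."""
    cut = rng.randint(1, K * K - 1)
    child1 = (a[0][:], a[1][:cut] + b[1][cut:])
    child2 = (b[0][:], b[1][:cut] + a[1][cut:])
    return child1, child2


def mutate(ind, rng, rate=0.1):
    """Flip structure bits and occasionally swap in an unused gene."""
    genes, bits = ind[0][:], ind[1][:]
    bits = [1 - b if rng.random() < rate else b for b in bits]
    if rng.random() < rate:
        pos = rng.randrange(K)
        genes[pos] = rng.choice([g for g in range(N_GENES) if g not in genes])
    return (genes, bits)


def run_ga(pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = [min(toy_fitness(i) for i in pop)]
    for _ in range(generations):
        parents = sorted(pop, key=toy_fitness)[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            for child in crossover(a, b, rng):
                children.append(mutate(child, rng))
        # Keep the fittest pop_size individuals among parents and children.
        pop = sorted(pop + children, key=toy_fitness)[:pop_size]
        history.append(toy_fitness(pop[0]))
    return pop[0], history


best, history = run_ga()
```

Because the new population is drawn from the extended population of parents and children, the best fitness in `history` can never get worse from one generation to the next.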
The generalized T2 statistic for testing the differential expression of genetic networks:
Let $\bar{X}_1$ and $\bar{X}_2$ be the mean values of expression of all the genes in the network from normal and abnormal tissues, respectively. Let $S_{pool}$ be the pooled estimate of the common covariance matrix between gene expressions. It can be shown that, under the null hypothesis of equal population means,

$$\frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)p}\,T^2 \sim F_{p,\;n_1 + n_2 - p - 1},$$

where

$$T^2 = \frac{n_1 n_2}{n_1 + n_2}(\bar{X}_1 - \bar{X}_2)^T S_{pool}^{-1}(\bar{X}_1 - \bar{X}_2),$$

$n_1$ and $n_2$ are the sample sizes of normal and abnormal tissues, respectively, and p is the number of genes selected in the test statistic. Consequently, $T^2$ can be used to test whether the population means, µ1 and µ2, differ significantly and to test for the significance of separation of two populations (normal and abnormal tissues). Formally, the null hypothesis $H_0$: µ1 = µ2 vs. the alternative hypothesis $H_a$: µ1 ≠ µ2 is assumed. If $H_0$ is rejected on the basis of a $T^2$ test, we can conclude that the separation between normal and abnormal tissue populations is significant and the genetic network is differentially expressed.
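For illustration, the sketch below computes a two-sample $T^2$ value in its standard Hotelling form, with a pooled covariance matrix and the usual scaling to an F statistic with (p, n1 + n2 - p - 1) degrees of freedom. That standard form is assumed here, and the tiny two-gene data set and function names are hypothetical.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the mean."""
    p = len(mean)
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows)
             for j in range(p)] for i in range(p)]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def hotelling_t2(group1, group2):
    """Return (T2, F statistic, (df1, df2)) for two groups of samples."""
    n1, n2 = len(group1), len(group2)
    p = len(group1[0])
    m1, m2 = mean_vector(group1), mean_vector(group2)
    s1, s2 = scatter_matrix(group1, m1), scatter_matrix(group2, m2)
    s_pool = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    diff = [a - b for a, b in zip(m1, m2)]
    z = solve_linear(s_pool, diff)  # S_pool^{-1} (mean1 - mean2)
    t2 = n1 * n2 / (n1 + n2) * sum(d * v for d, v in zip(diff, z))
    f_stat = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    return t2, f_stat, (p, n1 + n2 - p - 1)


# Hypothetical expression levels of two genes in four normal
# and four abnormal samples (rows are samples).
normal = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [2.0, 2.0]]
abnormal = [[4.0, 5.0], [5.0, 4.0], [6.0, 6.0], [5.0, 5.0]]
t2, f_stat, df = hotelling_t2(normal, abnormal)
print(round(t2, 6), round(f_stat, 6), df)  # 36.0 15.0 (2, 5)
```

The number of genes p must stay below n1 + n2 - 1 for the F scaling to be defined, which is one reason the size of a testable network is bounded by the number of samples.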
Index for measuring difference in regulation of genetic networks:
Let A = [B Γ] be the coefficient matrix of the structural equations for modeling a genetic network. Let $A_1$ and $A_2$ be its corresponding coefficient matrices in the normal and abnormal tissue samples. Let $W = A_1 - A_2$ and let $w_{ij}$ be an element of the matrix W. Since $w_{ij}$ is a parameter in the network, its asymptotic standard deviation can be calculated from the square root of the main diagonal of the asymptotic covariance matrix of the estimated parameters in the network and is denoted by $S_{w_{ij}}$. We define the test statistic $T_G$ as

Although the exact distribution of TG is unknown, its asymptotical distribution can be approximated by a t distribution with N - 2 d.f. This statistic can be used to test the difference of the regulatory effect of one gene on another between normal and abnormal tissues.
The difference of the regulatory effect of one gene on another cannot measure the difference in the global behavior of the genetic networks between normal and abnormal tissues. A simple quantity to measure the difference in global behavior of genetic networks between the normal and abnormal tissues is the largest absolute value of the difference of the regulatory effect of one gene on another in the network between the normal and abnormal tissues, i.e., $\max_{i,j}|w_{ij}|$. The statistic $T_G$ for testing the difference of individual regulatory effect can be used to test the difference in global behavior of genetic networks. Specifically, the statistic for testing the differential regulation of the genetic networks is given by

The P value is calculated by a permutation test. The gene expression profile matrix is randomly permuted, and the structural equation model and genetic algorithms are applied to randomly permutated gene expression data to reconstruct the genetic network hundreds or thousands of times. Then, we calculate TG0 and obtain an empirical distribution of TG0. The P value of the test is then defined as the probability that TG0 exceeds its observed value. The statistic TG0 can be used to measure the difference in regulation of the genetic network.
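The permutation idea can be sketched with a much simpler statistic than the full procedure, which refits the structural equation model and reruns the genetic algorithm on every permuted data set. Here the samples are randomly reassigned to the two groups, and the statistic is the absolute difference between two no-intercept least-squares slopes, a hypothetical stand-in for a difference in regulatory effect. The data and function names are assumptions made for illustration.

```python
import random


def slope_through_origin(pairs):
    """Least-squares slope of y on x without an intercept (a simplification)."""
    return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)


def regulation_difference(normal, abnormal):
    """Toy statistic: absolute difference of the two group slopes."""
    return abs(slope_through_origin(normal) - slope_through_origin(abnormal))


def permutation_p_value(normal, abnormal, statistic, n_perm=1000, seed=0):
    """Empirical P value: share of permutations with statistic >= observed."""
    rng = random.Random(seed)
    observed = statistic(normal, abnormal)
    pooled = list(normal) + list(abnormal)
    n1 = len(normal)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of samples to groups
        if statistic(pooled[:n1], pooled[n1:]) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the P value is never zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical (regulator, target) expression pairs: slope 2 in normal
# samples and slope -1 in abnormal samples.
normal = [(float(x), 2.0 * x) for x in range(1, 7)]
abnormal = [(float(x), -1.0 * x) for x in range(1, 7)]
observed, p_value = permutation_p_value(normal, abnormal, regulation_difference)
```

With these toy data the observed difference of 3 is reached only when the two groups are perfectly separated, so the empirical P value is small.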
The difference in global behavior of genetic networks between the normal and abnormal tissues depends on the whole regulatory coefficient matrix. A norm of the matrix W is a real-valued function of the elements $w_{ij}$ of W that associates a scalar with the matrix. The norm depends on all elements of the matrix and hence can be used to measure the difference in regulation of the whole genetic network. Four metrics borrowed from the norms of the matrix for measuring the difference in regulation of the genetic networks are defined as follows:
- $\|W\|_1$ = maximum of sums of absolute values of column elements of the matrix.
- $\|W\|_\infty$ = maximum of sums of absolute values of row elements of the matrix.
- $\|W\|_2$ = square root of the maximum eigenvalue of the matrix $W^TW$, a spectral norm.
- $\|W\|_E$ = $[\sum_i \sum_j |w_{ij}|^2]^{1/2}$, a Euclidean norm.
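The four norms can be computed directly from their definitions. In this sketch the spectral norm is obtained by power iteration on $W^TW$, which is one simple choice among several; the coefficient matrices `A1` and `A2` are hypothetical.

```python
import math


def norm_1(w):
    """Maximum absolute column sum."""
    return max(sum(abs(row[j]) for row in w) for j in range(len(w[0])))


def norm_inf(w):
    """Maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in w)


def norm_euclidean(w):
    """Square root of the sum of squared elements."""
    return math.sqrt(sum(v * v for row in w for v in row))


def norm_spectral(w, iterations=200):
    """Square root of the largest eigenvalue of W^T W, by power iteration.

    Limitation: a start vector exactly orthogonal to the leading
    eigenvector would not converge to the largest eigenvalue.
    """
    n = len(w[0])
    wtw = [[sum(row[i] * row[j] for row in w) for j in range(n)]
           for i in range(n)]
    v = [1.0 / (i + 1) for i in range(n)]
    eig = 0.0
    for _ in range(iterations):
        u = [sum(wtw[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(x * x for x in u))
        if length == 0.0:
            return 0.0
        v = [x / length for x in u]
        eig = length  # tends to the largest eigenvalue once v has converged
    return math.sqrt(eig)


# Hypothetical regulatory matrices for normal and abnormal samples.
A1 = [[1.0, 0.0], [3.5, 4.0]]
A2 = [[0.0, 2.0], [0.5, 0.0]]
W = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]  # [[1, -2], [3, 4]]

print(norm_1(W), norm_inf(W))  # 6.0 7.0
```

For this W the Euclidean norm is the square root of 30, and the spectral norm is slightly smaller, as expected since the spectral norm never exceeds the Euclidean norm.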
| RESULTS |
|---|
Illustration of structural equations for modeling genetic networks:
To illustrate the use of structural equations for modeling genetic networks, we first analyze the expression profiles of 6220 genes using oligonucleotide arrays in synchronized yeast cells during the cell cycle.
The genetic network shown in Fig 1 was reconstructed by applying the proposed structural equation model to the expression profiles of the genes CDC28, CLB1, CLB3, MCM1, MCM2, SWI4, CLN3, CDC47, and CDC6, which play an important role in the M/G1 phase of the cell cycle. The regulatory relations between the genes in the network can be confirmed by experiments.
A common approach for identifying networks is to use model selection to choose those networks with high-scoring models and we use both AIC values and fitting probability to score the model. AIC values, which have a close relationship with the likelihood function, are widely used model selection criteria. However, AIC values measure only the relative goodness of fit. On the other hand, the fitting probability quantifies how well the model explains the observed data. Therefore, we use AIC values to select the model, but we also report the fitting probability of the selected models to indicate how reliable the selected models are.
The gene SWI4 plays an important role in cell cycle progression.
To investigate the effect of removing a gene from the genetic network, we plotted Fig 3, in which the gene SWI4 was removed from the genetic network shown in Fig 2. It was interesting to note that most of the regulatory effects in the genetic network were not changed except for the regulatory effect of YGL239C on CHA1. This had an important implication: removing a gene will influence only the effects of the genes that were directly connected with the removed gene, but it did not have a significant impact on other parts of the genetic network.
As the number of genes in genome-wide gene expression profiles increases, the total number of all possible genetic networks increases exponentially. This number of possible genetic networks is too large to be exhaustively searched. There are two approaches to treat this problem. One approach is an ensemble method for identifying genetic networks that are consistent with existing gene expression profiling data.
The largest fitting probability that we can reach after a number of iterations is a function of the number of genes in the network. The fitting probability will decrease when the number of genes in the network increases. To demonstrate this, we first fix the number of genes in the network and run 100 genetic algorithms to search for networks with the fixed number of genes. In this way, for each fixed number of genes in the network we can obtain the largest fitting probability. We can see from Fig 6 that when the number of genes in the network was >14 the fitting probability became small, which implied that the genetic network did not fit the data well. The size of the genetic network (i.e., the number of genes in the network) is limited by the number of tissue samples.
To further evaluate the performance of the proposed model for reconstructing genetic networks, we take 85 regulators of yeast listed in ![]()
Differentially expressed genetic networks:
Differentially expressed genetic networks are a property of the network as a whole. The differential expression of the genetic network may be due to the differential expression of some individual genes in the network or other factors such as gene-gene interaction. To show that highly differentially expressed genetic networks may contain highly differentially expressed genes, we analyzed the expression profiles of 5483 genes using oligonucleotide arrays in 74 multiple-myeloma (MM) tissue samples and 31 normal tissue samples.
The differential expression of the genetic network may be largely due to the differential expression of some genes in the network. However, this need not always be the case. For example, it is possible that no gene in a genetic network is highly differentially expressed, but the network as a whole is highly differentially expressed. To show this, we analyzed the expression profiles for 12,531 genes using an Affymetrix oligonucleotide array in 50 normal and 52 tumor prostate tissues.
Differentially regulated genetic networks:
Identification of differentially regulated genetic networks consists of three steps. First, we reconstruct genetic networks using structural equations and gene expression data in all available samples. Second, we fix the structure of the genetic networks and then estimate network parameters by using gene expression data of normal and abnormal samples. Third, we rank the genetic networks according to some statistics, which measure the extent of the difference in regulatory effects of the genetic networks between normal and abnormal tissue samples.
There are three important cases: (i) the genetic network is differentially regulated but not differentially expressed; (ii) the genetic network is differentially expressed but not differentially regulated; or (iii) the genetic network is both differentially regulated and expressed. We first use the largest difference of the gene regulatory effect in the network between normal and abnormal samples as a measure to quantify the difference in regulation of the network. Then we compare all five measures. The most differentially regulated genetic network for the MM data set, which had an AIC value of -65.45 and a fitting probability of 1, is plotted in Fig 9. The network with 10 genes was partitioned into two subnetworks: one subnetwork with 8 genes and one subnetwork with 2 genes. The largest difference of the gene regulatory effect was 2.7953 (P value = 0.00062), which was associated with the regulation of the gene DF (D component of complement, adipsin) on AX1, where the P value was obtained by a permutation test. From Fig 9, we could also observe that other regulatory effects in the network for the tumor and normal samples were not significantly different. It was interesting to note that the gene DF (P value = $6.77 \times 10^{-9}$) and the receptor AX1 (P value = $4.78 \times 10^{-10}$) as well as the network (P value = $3.87 \times 10^{-14}$) were differentially expressed. The expression of the genes AX1 and DF and the fitted structural equation line of the expression of gene DF as a function of the expression of the gene AX1 in tumor and normal samples are shown in Fig 10. The slope of the line represented the regulatory effect of the gene DF on the gene AX1. We could clearly see the different regulatory effects in the tumor and normal samples from Fig 10. It was reported that the gene DF was a novel serine protease.
The second most differentially regulated genetic network for the MM data set, with 10 genes, an AIC value of -63.17, and a fitting probability of 0.9999, is shown in Fig 11. Again, the network was partitioned into two subnetworks: one subnetwork with 4 genes and one subnetwork with 6 genes. The largest difference in the regulatory effect was 2.523 (P value = 0.001), which was associated with the regulation of the gene ABCA2 on the gene GABA-A. It was interesting that the P value for testing the differential expression of this subnetwork (with 6 genes) was 0.2672. We can also see from Fig 11 that neither ABCA2 nor GABA-A was differentially expressed. This demonstrated that differentially regulated genetic networks may not be differentially expressed. Expression of ABCA2 and GABA-A and the fitted structural equation line of GABA-A expression as a function of the expression of ABCA2 in tumor and normal samples are shown in Fig 12. We can see from Fig 12 that tumor and normal samples cannot be separated by the expression of GABA-A and ABCA2; however, the slopes of the structural equation lines of GABA-A on ABCA2 in tumor and normal samples can be significantly different. It was reported that ABCA2 was a regulator of neural transmembrane lipid transport.
The most significantly differentially expressed genetic network for the prostate data set, which is shown in Fig 13, had an AIC value of -51.56, a fitting probability of 0.9827, and a P value for testing the significance of differential expression of $2.47 \times 10^{-12}$. The gene SLC25A6 (P value = $1.96 \times 10^{-11}$) and the gene ANGPT1 (P value = $1.91 \times 10^{-9}$) in the network showed significant evidence of differential expression. However, the rank of this genetic network among the differentially regulated genetic networks for the prostate data was 122. The largest difference in the gene regulatory effects of the network was 0.877 ($T_G$ = 8.5253, P value = 0.15). This demonstrated that although this genetic network was highly differentially expressed, it was not differentially regulated.
To compare the five metrics for characterizing the difference in regulation of the genetic networks under different conditions, we present Table 1, which shows the correlation coefficients between rankings of the genetic networks made by the five metrics. We can see that the correlation coefficients between rankings of the five metrics were very high. This suggested that the five metrics can provide similar evidence showing differential regulation of the genetic networks in normal and abnormal tissues.
| DISCUSSION |
|---|
Genetic networks have two aspects: structure of the networks and strength of the interaction between the genes in the networks. To understand comprehensively genetic networks, in addition to studying the nature of structure, we also need to quantify the strength of the interaction between the genes. Due to the large variation in observed gene expression profiles, quantitative models for genetic networks may not be accurate, but they will still be a useful tool for guiding experiments and understanding complex biological systems, particularly when advances in experimental technologies are made and the precision of experimental data is improved.
Regulation of genetic networks has a cause-effect feature. Causal inference may provide an ideal conceptual framework for reconstruction of genetic networks. In the past decades, several causal inference tools have been developed.
Identification of genetic networks consists of two steps: parameter estimation and structure discovery. In the first step, we assume that the structure of the network is known. A remarkable feature of the regulatory relation among genes in the network is that the expression levels of the genes are determined by the simultaneous interaction of the regulatory relations in the network. Using ordinary regression and the least-squares method for estimation of the parameters will result in inconsistent estimates of the parameters in the network. The proposed structural equation models and estimation procedures based on covariance analysis can avoid this problem and lead to consistent estimates of the parameters in the networks.
The second step is to identify the structure of the networks when it is unknown. The genetic networks that best fit the data may not be truly physically connected, but they can reveal causal relations between variables in the network and predict the behavior of biological systems. We used model selection to accomplish this task. Since searching for optimal models among an extremely large number of potential networks is computationally expensive, we proposed using genetic algorithms to search for the most likely genetic networks fitting the data.
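A genetic-algorithm search over network structures can be sketched as below. This is a toy illustration under several assumptions that are not taken from the article: edges are restricted to a fixed gene ordering (so every candidate network is acyclic and each equation can be fitted by least squares), the model-selection score is a BIC-style criterion, and the population size, mutation rate, and simulated data are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 5, 300
pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]  # candidate edges i -> j

# Simulate data from a hypothetical recursive network: 0 -> 1, 1 -> 2, 0 -> 3.
true_edges = {(0, 1), (1, 2), (0, 3)}
Y = np.zeros((n, p))
for j in range(p):
    Y[:, j] = rng.normal(size=n)
    for i in range(j):
        if (i, j) in true_edges:
            Y[:, j] += 0.8 * Y[:, i]

def score(bits):
    """Negative BIC-style score of a structure (higher is better), summed over genes."""
    total = 0.0
    for j in range(p):
        parents = [i for (i, k), b in zip(pairs, bits) if k == j and b]
        X = np.column_stack([np.ones(n)] + [Y[:, i] for i in parents])
        resid = Y[:, j] - X @ np.linalg.lstsq(X, Y[:, j], rcond=None)[0]
        total -= n * np.log(resid @ resid / n) + X.shape[1] * np.log(n)
    return total

def genetic_search(pop_size=30, generations=40, mutation_rate=0.05):
    pop = rng.integers(0, 2, size=(pop_size, len(pairs)))
    pop[0] = 0                                   # include the empty network
    for _ in range(generations):
        fitness = np.array([score(ind) for ind in pop])
        elite = pop[np.argmax(fitness)].copy()   # elitism: keep the best unchanged
        children = [elite]
        while len(children) < pop_size:
            # Tournament selection: each parent is the fitter of two random individuals.
            a, b = (pop[max(rng.choice(pop_size, 2), key=lambda k: fitness[k])]
                    for _ in range(2))
            mask = rng.integers(0, 2, size=len(pairs)).astype(bool)  # uniform crossover
            child = np.where(mask, a, b)
            flip = rng.random(len(pairs)) < mutation_rate            # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        pop = np.array(children)
    fitness = np.array([score(ind) for ind in pop])
    best = pop[np.argmax(fitness)]
    return {pairs[k] for k in np.flatnonzero(best)}, fitness.max()

best_edges, best_score = genetic_search()
print(sorted(best_edges), round(best_score, 1))
```

Each individual encodes one candidate network as a bit string over the possible edges, so crossover and mutation explore structures without enumerating all of them. Because the best individual is carried over unchanged in every generation, the best score never decreases during the search.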
Structure discovery is, essentially, the identification of causal relations between variables in the model. The definition of cause has three crucial components: isolation, association, and direction of influence.
When genetic networks are reconstructed, either from experiments or from computational modeling, it is essential to link genetic networks with cell function. It has been noted that the function of complex systems is accomplished through networks.
Knowing which genetic networks are differentially expressed is not enough for understanding complex biological systems. Differences in gene expression do not directly reflect the regulatory strength between gene activities, and differentially expressed genes may not be the cause of disease. To investigate how the activities of genetic networks influence the phenotypes of cells, we need not only to discover the structure of the networks, which specifies the interactions among the genes, but also to quantify the strength of those interactions and to measure the effect of induction and repression of target genes on other genes. To this end, we proposed the concept of differentially regulated genetic networks. Identification of differentially regulated genetic networks may help us to discover the cause of diseases. We proposed five statistics to measure the differences in regulation of the genetic networks, some of which quantify the differences in regulatory effects for individual pairs of regulator and regulated genes, while others take the differences in regulatory effects of all genes in the network into account. We showed that in one important application these five measures identify a common set of differentially regulated genetic networks.
The expression level of each gene is a nonlinear function of its regulatory input. Linear tools for reconstruction of genetic networks can only approximately model the response of the gene to the regulatory input. Therefore, in some cases, the identified genetic networks based on the linear structural equations may not represent the true biological systems. To overcome this problem, we need to develop nonlinear structural equation models for genetic networks in the future.
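The limitation of a linear approximation can be illustrated numerically. The sketch below assumes, purely for illustration, that a gene's response to its regulator follows a sigmoidal Hill-type curve; the function `hill`, its parameters `K` and `n`, and the input ranges are hypothetical choices and do not come from the article.

```python
import numpy as np

def hill(x, K=1.0, n=4):
    """Sigmoidal (Hill-type) response; K and n are illustrative values."""
    return x**n / (K**n + x**n)

def linear_fit_rmse(x):
    """RMS error of the best straight-line fit to the Hill response over x."""
    y = hill(x)
    slope, intercept = np.polyfit(x, y, 1)
    return np.sqrt(np.mean((y - (slope * x + intercept)) ** 2))

narrow = linear_fit_rmse(np.linspace(0.9, 1.1, 200))   # small perturbation around K
wide = linear_fit_rmse(np.linspace(0.0, 3.0, 200))     # full dynamic range

print(f"RMS error, narrow range: {narrow:.4f}; wide range: {wide:.4f}")
```

Under these assumptions a straight line tracks the response well over a small range of regulator levels but poorly over the full dynamic range, which is one way to see why nonlinear structural equation models may be needed.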
| ACKNOWLEDGMENTS |
|---|
The authors thank Joshua M. Akey for his helpful comments on this article, which helped to improve its presentation. We also thank the associate editor, Bruce Walsh, and two anonymous reviewers for helpful comments on the manuscript, which led to much improvement of the article. M. M. Xiong and L. Jun are supported by National Institutes of Health-National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIH-NIAMS) grant IP50AR44888 and NIH grant ES09912.
Manuscript received February 12, 2003; Accepted for publication October 15, 2003.