- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Supplemental data
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Email this article to a friend
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Bhandarkar, S. M.
- Articles by Kota, R. N.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Bhandarkar, S. M.
- Articles by Kota, R. N.
Parallel Computation of a Maximum-Likelihood Estimator of a Physical Map
Suchendra M. Bhandarkara, Salem A. Machakaa, Sanjay S. Sheteb, and Raghuram N. Kotaaa Department of Computer Science, The University of Georgia, Athens, Georgia 30602-7404
b Department of Statistics, The University of Georgia, Athens, Georgia 30602-7404
Corresponding author: Suchendra M. Bhandarkar, Department of Computer Science, The University of Georgia, 415 Boyd Graduate Studies Research Ctr., Athens, GA 30602-7404., suchi{at}cs.uga.edu (E-mail)
Communicating editor: J. ARNOLD
| ABSTRACT |
|---|
Reconstructing a physical map of a chromosome from a genomic library presents a central computational problem in genetics. Physical map reconstruction in the presence of errors is a problem of high computational complexity that provides the motivation for parallel computing. Parallelization strategies for a maximum-likelihood estimation-based approach to physical map reconstruction are presented. The estimation procedure entails a gradient descent search for determining the optimal spacings between probes for a given probe ordering. The optimal probe ordering is determined using a stochastic optimization algorithm such as simulated annealing or microcanonical annealing. A two-level parallelization strategy is proposed wherein the gradient descent search is parallelized at the lower level and the stochastic optimization algorithm is simultaneously parallelized at the higher level. Implementation and experimental results on a distributed-memory multiprocessor cluster running the parallel virtual machine (PVM) environment are presented using simulated and real hybridization data.
GENERATION of entire chromosomal maps has been a central problem in genetics right from its early years (![]()
Chromosomal maps fall into two broad categoriesgenetic maps and physical maps. Genetic maps represent an ordering of genetic markers along a chromosome where the distance between two genetic markers is related to their recombination frequency. Genetic maps are typically of low resolution, i.e., 110 Mb (![]()
![]()
![]()
Several techniques exist for generation of physical maps from contig libraries. These techniques are specific to an experimental protocol and the type of data collected, for example, mapping by nonunique probes (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
| The physical mapping protocol |
|---|
The physical mapping protocol essentially determines the nature of clonal data and the probe selection procedure. The physical mapping protocol adhered to in this project is the one based on sampling without replacement (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
The protocol that generates the probe set
and the clone set
is an iterative procedure that can be described as follows. Let
i and
i be the clone set and the probe set, respectively, at the ith iteration. The initial clone set
0 consists of all the clones in the library whereas the initial probe set
0 =
. The clones in
0 are designed to be of the same length and to be overlapping so that each clone samples a fragment of the chromosome and the coverage of the entire chromosome is made possible. At the ith iteration a clone c is chosen at random from
i and added to
i. Clone c is hybridized to all the clones in
i. The subset of clones
c that hybridize to clone c are removed from
i so that
i+1 =
i -
c. Note that c
c since a clone hybridizes to itself. The hybridization experiment entails extracting complementary DNA from both ends of a probe, washing the DNA over the arrayed plate, and recording all clones in the library to which the DNA sticks (i.e., hybridizes). The above procedure is halted at the kth iteration when
k =
. The final probe set is given by
=
k and the clone set by
=
0 -
k. In the absence of errors, the probe set
represents a maximal nonoverlapping subset of
0 since any clone that overlaps with a given probe would hybridize to one end of that probe and be effectively removed from consideration in future iterations of the aforementioned iterative procedure. Fig 1 depicts the probes and clones along the length of a chromosome where the clones and probes are numbered from left to right.
|
The clone-probe overlap pattern is represented in the form of a binary hybridization matrix H where the matrix element Hij denotes the hybridization of the ith clone
to the jth probe
. The matrix element Hij = 1 if the ith clone
hybridizes to the jth probe
and Hij = 0 otherwise. Table 1 shows the clone-probe hybridization data in the absence of errors resulting from the example depicted in Fig 1. If the probes in
were ordered with respect to their position along a chromosome, then by selecting from H a common overlapping clone for each pair of adjacent probes in
a minimal set of clones and probes that covers the entire chromosome (i.e., a minimal tiling) could be obtained. Note that a common overlapping clone between two adjacent probes would hybridize to both probes. The minimal tiling in conjunction with the sequencing of each individual clone/probe in the tiling and a sequence assembly procedure that determines the overlaps between successive sequenced clones/probes in the tiling (![]()
|
In reality, the hybridization experiments are rarely error free. The hybridization matrix H could be expected to contain false positives and false negatives. The matrix element Hij would be a false positive if Hij = 1 (denoting hybridization of the ith clone with the jth probe) when in fact Hij = 0. Conversely, Hij would be a false negative if Hij = 0 when in fact Hij = 1. Other sources of error include chimerism wherein a single clone samples two or more distinct regions of a chromosome, deletions wherein certain regions of the chromosome are not sampled during the cloning process, and repeats wherein a clone samples a region of the chromosome with repetitive DNA structure. In this article, we confine ourselves to errors in the form of false positives and false negatives. Since the clones (and probes) in the mapping projects that use the aforementioned protocol are generated using cosmids, which makes them sufficiently small (
40 kb), chimerism and deletions do not pose a serious problem. However, repeats do pose a problem but are not explicitly addressed here; rather they are treated as multiple isolated incidences of false positives.
In this article we present a maximum-likelihood (ML) estimator (![]()
![]()
and the interprobe spacings under a probabilistic model of hybridization errors consisting of false positives and false negatives. The estimation procedure involves a combination of discrete and continuous optimization, where determining the probe ordering entails discrete (i.e., combinatorial) optimization, whereas determining the interprobe spacings for a particular probe ordering entails continuous optimization. We propose a two-level parallelization strategy for efficient implementation of the above estimator. The upper level consists of parallel discrete optimization using simulated annealing or microcanonical annealing whereas the lower level consists of the parallel conjugate gradient descent procedure. The resulting parallel algorithms are implemented on a distributed-memory multiprocessor cluster using the parallel virtual machine (PVM) environment (![]()
![]()
| MATERIALS AND METHODS |
|---|
Cosmid libraries used to construct the physical map of N. crassa discussed in this article are described in ![]()
![]()
![]()
| MATHEMATICAL FORMULATION OF THE ML ESTIMATOR |
|---|
In this section we present a brief synopsis of the ML estimator proposed in ![]()
![]()
and the interprobe spacings under a probabilistic model of hybridization errors consisting of false positives and false negatives.
The probe ordering problem can be formally stated as follows. Given a set
= {P1, P2, ... , Pn} of n probes and a set
= {C1, C2, ... , Ck} of k clones generated using the sampling-without-replacement protocol described earlier and the k x n clone-probe hybridization matrix H containing both false positives and false negatives with predefined probabilities, reconstruct the correct ordering
= (
1,
2, ... ,
n) of the probes and also the correct spacing Y = (Y1, Y2, ... , Yn) between the probes. The ordering
is a permutation of (1, ... , n) that gives the labels (indices) of the probes in left-to-right order across the chromosome. In the interprobe spacing vector Y, Y1 denotes the space between the left end of the first probe P
1 and the left end of the chromosome, and Yi the spacing between the right end of probe P
i-1 and the left end of probe P
i (where 2
i
n). The spacing between the right end of probe P
n and the right end of the chromosome is given by Yn+1 = N - nM -
ni=1 Yi, where N is length of the chromosome and M is the length of each clone/probe. Note that our protocol requires that all probes and clones be of the same length.
The problem as stated above is ill posed in the precise sense as defined by ![]()
and any interprobe spacing vector Y that satisfies the requirements that Yi
0; 1
i
n and N - nM -
ni=1 Yi
0, constitute a valid solution. Hence the problem is formulated as one of determining a probe ordering and the interprobe spacings that maximize the likelihood of the observed hybridization matrix H given predefined probabilities for false positives and false negatives.
Mathematical notation:
The mathematical notation used in the formulation of the maximum-likelihood estimator is given below:
- N, length of the chromosome;
- M, length of a clone/probe;
- n, number of probes;
- k, number of clones;
, probability of false positive;
, probability of false negative;
H = ((hi,j))1
i
k, 1
j
n; clone-probe hybridization matrix, where

Hi, ith row of the hybridization matrix;
= (
1, ... ,
n), permutation of {1, 2, ... , n}, which denotes the probe labels in the ordering when scanned from left to right along the chromosome;
pi =
nj=1 hij, number of 1's in Hi;
P =
ki=1 pi, total number of 1's in H, Y = (Y1, Y2, ... , Yn), vector of interclone spacings, where Yi is the spacing between the right end of P
i-1 and the left end of P
i (2
i
n), and Y1 is the spacing between the left end of P
1 and the left end of the chromosome; and
n, set of feasible interprobe spacings Y = {Y1, ... , Yn} such that Yi
0, 1
i
n, and N - nM -
ni=1 Yi
0.
The model:
Given a vector of interprobe spacings Y = (Y1, ... , Yn), there are 2n+1 possible cases to consider depending on whether 0
Yi
M or Yi > M, where 0
i
n + 1. Without loss of generality, we present the maximum-likelihood model for n = 3 and illustrate 3 of the 24 = 16 possible cases.
Case 1:
0
Y1
M, 0
Y2
M, 0
Y3
M, and 0
Y4
M as depicted in Fig 2.
|
Case 2:
0
Y1
M, 0
Y2
M, 0
Y3
M, and Y4 > M as depicted in Fig 3.
|
Case 3:
Y1 > M, 0
Y2
M, 0
Y3
M, and 0
Y4
M as depicted in Fig 4.
|
In Case 1 above, if the left end of a clone falls in regions A, C, or E, then the clone will hybridize only with P
1, P
2, or P
3, respectively (Fig 2). Similarly, if the left end of a clone falls in region B or D, then the clone will hybridize with both P
1 and P
2 or P
2 and P
3, respectively (Fig 2). In Case 2 above, if the left end of a clone falls in region E, then the clone will hybridize with only P
3 and if it falls in region F, then the clone will not hybridize with any of the probes (Fig 3). Similarly, in Case 3 above, if the left end of a clone falls in region A, the clone will not hybridize with any of the probes (Fig 4). Therefore, from the above three cases it is easy to see that, in general, there will be three different types of regions, namely,
- Type 1: Both region between probe P
j and P
j+1, for j = 1, ... , n - 1. An intervening clone hybridizes to both probes if its left end falls in this region. - Type 2: Only region of probe P
j, for j = 1, ... , n. A clone will hybridize to P
j only if its left end falls in this region. - Type 3: None region after probe P
j, for j = 0, ... , n. A clone will hybridize to no probe if its left end falls in this region. Here probe P
0 is referred to as the beginning of the chromosome.
It can be shown that for j =1, ... , n - 1, Length of the Both region between probes P
j and P
j+1
![]() |
(1) |
and for j = 1, ... , n, Length of the Only region of probe P
j
![]() |
(2) |
and for j = 0, ... , n, Length of the None region after probe P
j
![]() |
(3) |
We assume that the left ends of the clones are uniformly distributed over the interval [0, N - M]; i.e., the probes are uniformly distributed across the length of the chromosome. Therefore it can be shown that for j = 1, ... , n - 1, the probability PBoth that a randomly chosen clone will fall in the Both region of probes P
j and P
j+1 is given by
![]() |
(4) |
for j = 1, ... , n the probability POnly that a randomly chosen clone will fall in the Only region of probe P
j is given by
![]() |
(5) |
and for j = 0, ... , n the probability PNone that a randomly chosen clone will fall in the None region after probe P
j is given by
![]() |
(6) |
The conditional probability of observing a clonal signature Hi (i.e., the ith row in the hybridization matrix H), given a probe ordering
and an interprobe spacing vector Y, is given by
![]() |
(7) |
where Oi,j is the event that the clone i will fall in the Only region of probe P
j, Bi,j is the event that the clone i will fall in the Both region of probes P
j and probe P
j+1, and Ni,j is the event that the clone i will fall in the None region after probe P
j.
For a given probe ordering
and interprobe spacing vector Y, event Oi,j implies that only hi,
j = 1, and all the remaining entries in row Hi should be equal to 0. In other words, hi,
j
1 implies a false negative and a 1 in any other column position in the row Hi implies a false positive. That is,
![]() |
(8) |
and for k = 1, ... , n, where k
j,
![]() |
(9) |
Assuming that the false positive and false negative errors at different positions along the clonal signature Hi are independent of each other,
![]() |
(10) |
Similarly, for a given probe ordering
and interprobe spacing vector Y, event Bi,j implies that only hi,
j and hi,
j+1 should be equal to 1 and all the remaining entries of row Hi should be equal to 0. That is,
![]() |
(11) |
![]() |
(12) |
and for k = 1, ... , n, where k
j, j + 1,
![]() |
(13) |
Hence,
![]() |
(14) |
Finally, for a given probe ordering
and interprobe spacing vector Y, event Ni,j implies that all the elements of row Hi should be equal to 0. That is, for k = 1, ... , n,
![]() |
(15) |
Hence,
![]() |
(16) |
From Equation 4Equation 5Equation 6Equation 7Equation 8Equation 9 HREF="#FD10">Equation 10Equation 11Equation 12Equation 13Equation 14Equation 15 HREF="#FD16">Equation 16,
![]() |
(17) |
We assume that the clones
are independently distributed along the chromosome; i.e., each row of H is independent of the other rows. Hence
![]() |
(18) |
From Equation 17 and Equation 18,
![]() |
(19) |
where
![]() |
(20) |
![]() |
(21) |
and
![]() |
(22) |
The goal, therefore, is to determine
and Y that maximize P(H |
, Y) as given in Equation 19, that is, determine (
,
), where
![]() |
(23) |
Alternatively, consider the negative log-likelihood function f(
, Y) given by
![]() |
(24) |
where C is a constant given by
![]() |
(25) |
and
0 =
n+1 = 0. Since ln x is a monotonically increasing function of x for all x > 0, it follows that
![]() |
(26) |
Computation of the ML estimate:
Computing the values of
and
(Equation 26) involves a two-stage procedure.
Stage 1:
First determine the optimal spacing 
for a given probe ordering
; i.e., determine 
= (
1, ... ,
n) such that for a given
,
![]() |
(27) |
Here the minimum is taken over all feasible solutions Y that satisfy the constraints Yi
0; i = 1, ... , n and
ni=1 Yi
N - nM.
Stage 2:
Second determine
for which
![]() |
(28) |
Here the minimum is taken over all
, where
is a permutation of {1, ... , n}. The resulting values of
and 
are termed the ML estimates (MLEs) of the true probe ordering and the interprobe spacings, respectively.
Computation of 
:
A region
n is deemed to be convex if for any pair of points p, q
, all points along the line segment
p + (1 -
) q
, where 0
1. A function h:
defined on a convex set
is deemed convex if for all points
p + (1 -
) q
, where 0
1, h (
p + (1 -
)q)
h(p) + (1 -
)h(q). Alternatively, if (d2/d
2)h
0 along the line segment
p + (1 -
)q, then function h can be shown to satisfy the above condition for convexity (![]()
is considered good if for all Y
, Yi
M, 1
i
n + 1. The significance of a good region is that f
(Y) is differentiable within it.
The objective function f
(Y) (Equation 27) can be expressed as
![]() |
(29) |
where fi(Y) = Ri -
n+1j=1 (ai,
j - 1)(ai,
j-1 - 1)Yj. Consider a good convex region
, where Yj
M for 1
j
n. Consider all points Y = P + sV for s > 0, which lie on a ray originating at a given point P
in the direction V. In
the derivative of f along the ray is given by
![]() |
(30) |
where
![]() |
(31) |
and I(x) is a unit step function defined as
![]() |
(32) |
Using the fact that (
)fi(Y) = 0 along the ray, it can be shown that
![]() |
(33) |
This implies that f
(Y) is convex in every good convex region
and therefore possesses a unique local minimum that is also a global minimum. Consequently this minimum can be reached using continuous local search-based techniques such as gradient descent (i.e., steepest descent) or conjugate gradient descent (![]()
![]()
Consider the four disjoint subregions
+1,+1,
+1,-1,
-1,+1, and
-1,-1 within
, where
![]() |
(34) |
Each of these regions is convex since they result from the intersection of half spaces. Also, since the derivative of f
(Y) is defined in the interior of each subregion, each subregion is good. Note that we can define the derivative on the boundary of each subregion
a,b, a, b
{-1, +1}, on the basis of the direction in which the boundary point is approached. Thus by selecting a starting point in each of the subregions (or as many subregions as possible without violating any feasibility constraints), one can compute a local minimum for f
(Y) in each of the subregions and select the minimum of these local minima to be the global minimum of f
(Y) (![]()
The local minimum of f
(Y) in each of the aforementioned four disjoint subregions within
can be reached using continuous local search-based methods such as the steepest descent technique or the conjugate gradient descent technique (![]()
![]()
The initial value of Y = (Y1, ... , Yn) can be determined in one of several ways; the simplest solution is to assign (N - nM)/(n + 1) to each of Yi's, i.e., distribute the value of N - nM equally among the (n + 1)Yi's. Having obtained an initial value for
, the gradient of the negative log-likelihood function is computed. A function decreases most rapidly in the direction of the local negative (i.e., downhill) gradient. The local downhill gradient is given by
![]() |
(35) |
where
![]() |
(36) |
and
![]() |
(37) |
The current value of
=
old is updated by moving along the downhill gradient direction U = -
f(
,
)|Y=
old. The new value of
=
new is given by
![]() |
(38) |
The problem, therefore, is to find an optimal value of s, say s*, such that
![]() |
(39) |
Here,
![]() |
(40) |
where
n+1 = N - nM -
ni=1
i.
Having obtained the value of s*, then the new interprobe spacings are given by
![]() |
(41) |
To determine an optimal value of s = s*, consider
![]() |
(42) |
where Un+1 = -
ni=1 Ui. The convexity of f
(Y) implies that the local optimum for s is also a global optimum. Also note that we have the following boundary conditions: (i)
j + sUj
0, for j = 1, ... , n; i.e., all the spacings are nonnegative; and (ii)
nj=1(
j + sUj)
N - nM, which is a constraint imposed by the length of the chromosome and the length of each clone.
The above constraints impose bounds on s given by
![]() |
(43) |
Given the upper and lower bounds on the values of s and the convexity of f(
,
+ sU) with respect to s, the bisection method (![]()
is given by n = log2[
].
The gradient computation and the solution update steps of the steepest descent method are continued until the gradient vector attains a magnitude less than a predefined threshold. During this iterative process, it may happen that the current value of Y is on one or more of the boundaries of the feasible region in which the solution is located. These boundaries are defined by the constraints on the Yi's for i = 1, ... , n as discussed earlier. The fact that the current value of Y lies on one or more of the boundaries of the feasible region and downhill gradient vector is determined to point outside the feasible region would normally force the value of s to be equal to zero and hence stop the iterative procedure even though the gradient vector has not vanished.
The above difficulty can be overcome by projecting the downhill gradient vector U onto the feasible region. Note that all the boundary conditions on the Yi's for i = 1, ... , n are hyperplanes. Suppose that U is directed outside the feasible region, and we have k hyperplanes (corresponding to k boundary conditions) that force s to be equal to zero. In this situation one needs to project U onto the intersection of these hyperplanes. Let
1, ... ,
k be the normal vectors to these hyperplanes. If the boundary conditions are not redundant, then {
1, ... ,
k} constitutes a linearly independent set of vectors. The set {
1, ... ,
k} can be transformed into a mutually orthonormal set of vectors given by {
1, ... ,
'k} using the Gram-Schmidt orthonormalization procedure (![]()
![]() |
(44) |
The minimization procedure then proceeds along Uproj instead of U. In the limiting case when k = n, the minimization procedure has reached an extremal vertex of the admissible region and Uproj = 0. In this case, the extremal vertex is the desired minimum within the admissible region. Thus, the minimization procedure is halted when U vanishes or when an extremal vertex is reached (i.e., Uproj vanishes) depending on which situation is encountered first.
Computation of
:
Determining the optimal clone ordering
entails a combinatorial search through the discrete space of all possible permutations of {1, ... , n}. The problem of coming up with such an optimal ordering is isomorphic to the classical nondeterministic polynomial-complete optimal linear arrangement (OLA) problem for which no polynomial-time algorithm for determining the optimal solution is known (![]()
![]()
![]()
![]()
A single iteration of a stochastic hill-climbing search algorithm consists of three phases: (i) perturb, (ii) evaluate, and (iii) decide. In the perturb phase, the current solution xi to a multivariate objective function E(x), which is to be minimized, is systematically perturbed to yield another candidate solution xj. Here, the probe ordering is systematically perturbed by reversing the ordering within a block of probes where the endpoints of the block are chosen at random. In the evaluate phase, E(xj) is computed. Here, the function f(
,
) (Equation 27) is computed. In the decide phase, xj is accepted and replaces xi probabilistically, using a stochastic decision function. The stochastic decision function is annealed in a manner such that the search process resembles a random search in the earlier stages and a greedy local search or a deterministic hill-climbing search in the latter stages. The major difference between simulated annealing and microcanonical annealing arises from the difference in the stochastic decision function used in the decision phase. Their common feature is that, starting from an initial solution, they generate, in the limit, an ergodic Markov chain of solution states that asymptotically converges to a stationary Boltzmann distribution (![]()
![]()
Simulated annealing:
In the decide phase of SA, the new candidate solution xj is accepted with probability p, which is computed using the Metropolis function (![]()
![]() |
(45) |
or, using the Boltzmann function B(T),
![]() |
(46) |
(![]()
The Metropolis function and the Boltzmann function give SA the capability of climbing out of local minima. Several iterations of SA are carried out for a given value of T and are referred to as an annealing step. The value of T is systematically reduced using an annealing function. As can be seen from Equation 45 and Equation 46, at sufficiently high temperatures SA resembles a completely random search, whereas at lower temperatures it acquires the characteristics of a deterministic hill-climbing (i.e., local or greedy) search.
SA generates an asymptotically ergodic (and hence stationary) Markov chain of solution states at a given temperature using Equation 45 and Equation 46. Logarithmic annealing schedules of the form Tk =
k for some value of R > 0 have been shown to be asymptotically good (![]()
(![]()
Tk was used where
= 0.9. Although the geometric annealing schedule does not strictly ensure asymptotic convergence to a global optimum as does the logarithmic annealing schedule, it is much faster and has been found to yield very good solutions in practice (![]()
Microcanonical annealing: MCA models a physical system whose total energy, i.e., the sum of kinetic energy and potential energy, is always conserved. The potential energy of the system is the multivariate objective function E(x) to be minimized, whereas the kinetic energy Ek > 0 is represented by a demon or a collection of demons. In the latter case, the total kinetic energy is the sum of all the demon energies. The demon energy (or energies) serve(s) to provide the system with an extra degree (or degrees) of freedom thus enabling MCA to escape from local minima.
In the decide phase of MCA, if E(xj) < E(xi), then xj is accepted as the new solution. If E(xj)
E(xi), then xj is accepted as the new solution only if Ek
E(xj) - E(xi). If E(xj)
E(xi) and Ek < E(xj) - E(xi) then the current solution xi is retained. In the event that xj is accepted as the new solution, the kinetic energy demon is updated to En+1k = Enk + [E(xi) - E(xj)] to ensure the conservation of the total energy. The kinetic energy parameter Ek is annealed in a manner similar to the temperature parameter T in SA. MCA can also be shown to converge to a global minimum with unit probability given a logarithmic annealing schedule (![]()
In the context of probe ordering, a kinetic energy demon is assigned to each distinct pair of probes. A square symmetric matrix Ek of size n x n, where n is the number of probes, is used to store the demon energy for each probe pair. Thus, an entry Ek(i, j) refers to the kinetic energy of the demon associated with the ith and jth probe. As in the case of SA, the convergence criterion used was the fact that the value of the objective function had not changed for the past k annealing steps. A geometric annealing schedule of the form Em+1k(i, j) =
Emk(i, j) was used where
= 0.9.
In spite of the robustness of SA and MCA to the presence of local minima in the solution landscape, the annealing schedule needed for asymptotic convergence is computationally intensive. This provides the motivation for the parallel computation of the maximum-likelihood estimator.
| PARALLEL COMPUTATION OF THE MAXIMUM-LIKELIHOOD ESTIMATOR |
|---|
Computation of the maximum-likelihood estimator entails two levels of parallelism corresponding to the two stages of optimization discussed in the previous section:
- Level 1: Parallel computation of the optimal interprobe spacing

for a given probe ordering
(Equation 27). This entails parallelization of the gradient descent search procedure for constrained optimization in the continuous domain. - Level 2: Parallel computation of the optimal probe ordering (Equation 28). This entails parallelization of the stochastic hill-climbing search procedure (SA or MCA) for optimization in the discrete domain.
Both levels of parallel computation were implemented on PVM (![]()
![]()
Parallel stochastic hill-climbing search:
Parallelization of annealing algorithms has been attempted by several researchers especially in the area of computer-aided design, image processing, and operations research (![]()
![]()
- Functional parallelism within a move where the task of evaluating each move is decomposed into subtasks that are performed in parallel by multiple processors (
WONG and FIEBRICH 1987 ).
- Control parallelism with multiple active iterations with processors engaged in speculative computation (
WITTE et al. 1991 ).
- Control parallelism with parallel Markov chain generation using a systolic algorithm (
AARTS et al. 1986 ;
GREENING 1990 ;
KIM and KIM 1990 ;
AZENCOTT 1992 ).
- Control parallelism with multiple searches of the solution space where the searches could be interacting, in which case the processors exchange data periodically, or noninteracting in which case the processors proceed independently of each other (
AZENCOTT 1992 ;
LEE 1995 ).
- Data parallelism with the state variables in a multivariate solution vector distributed among the individual processors in the multiprocessor architecture accompanied by either (1) parallel evaluation of multiple moves with acceptance of a single move (
CASOTTO et al. 1987 ), or (2) parallel evaluation of multiple moves with acceptance of multiple noninteracting moves (
JAYARAMAN and RUTENBAR 1987 ), or parallel evaluation and acceptance of multiple moves (
BANERJEE et al. 1990 ).
The authors' earlier work in chromosome reconstruction involved ordering clones, i.e., rows of the hybridization matrix H as shown in Table 1 for physical map generation (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
- The noninteracting local Markov chain (NILM) PSA and PMCA algorithms.
- The periodically interacting local Markov chain (PILM) PSA and PMCA algorithms.
In the NILM PSA and NILM PMCA algorithms, each processor within the PVM system runs an independent version of the serial SA or MCA algorithm. In essence, there are as many Markov chains of solution states as there are physical processors within the PVM system. Each Markov chain is local to a given processor and at any instant each processor maintains a candidate solution that is an element of its local Markov chain of solution states. The serial SA and MCA algorithms run asynchronously on each processor; i.e., at each temperature value or kinetic energy value each processor iterates through the perturb-evaluate-accept cycle concurrently (but asynchronously) with all the other processors.
The perturbation function uses a parallel random number generator to generate the Markov chains of solution states. By assigning a distinct seed to each processor at the start of execution, it is ensured that each processor contains a Markov chain of solution states that is independent from those in most or all of the other processors. The evaluation function and the decision function are executed concurrently on the solution state within each processor. On termination of the annealing processes on all the processors, the best solution is selected from among all the solutions available on the individual processors. The NILM model is essentially that of multiple independent (i.e., noninteracting) searches as described in item (D) above.
The PILM PSA and PILM PMCA algorithms are similar to their NILM counterparts except for one major difference. Just before the parameter T or Ek is updated using the annealing function, the best candidate solution from among those in all the processors is selected and duplicated on all the other processors. The goal of this synchronization procedure is to focus the search in the more promising regions of the solution space. This suggests that the PILM PSA or PILM PMCA algorithm should be potentially superior to its NILM counterpart. It should be noted that in the case of all four algorithms, PILM PSA, PILM PMCA, NILM PSA, and NILM PMCA, a single Markov chain of solution states is generated entirely within a single processor. The PILM model is essentially that of multiple periodically interacting searches as described in item (D) above.
In the case of all the four algorithms, NILM PSA, NILM PMCA, PILM PSA, and PILM PMCA, a master process is used as the overall controlling process. The master process runs on one of the processors within the PVM system. The master process spawns child processes on each of the other processors within the PVM system, broadcasts the data subsets needed by each child process, collects the final results from each of the child processes, and terminates the child processes. The master process, in addition to the above-mentioned functions for task initiation, task coordination, and task termination, also runs its own version of the SA or MCA algorithm just as does any of its child processes.
In the case of the PILM PSA and the PILM PMCA algorithms, before the parameter T or Ek is updated, the master process collects the results from each child process along with its own result, broadcasts the best result to all the child processes, and also replaces its own result with the best result. The master process updates its temperature or kinetic energy value using the annealing schedule and proceeds with its local version of the SA or MCA algorithm. On convergence, the master process collects the final results from each of the child processes along with its own, selects the best result as the final solution, and terminates the child processes.
Each of the child processes in the PILM PSA and the PILM PMCA algorithms receives the initial parameters from the master process and runs its local version of the SA or MCA algorithm. At the end of each annealing step, each child process conveys its result to the master process, receives the best result thus far from the master process, and replaces its result with the best result thus far before proceeding with the next annealing step. On convergence, each child process conveys its result to the master process. The master process and child process for the PILM PSA and the PILM PMCA algorithms are depicted in Fig 5 and Fig 6, respectively.
|
|
The master and child processes for the NILM PSA and NILM PMCA algorithms are similar to those of their PILM counterparts except for the absence of the process coordination phase (Fig 5 and Fig 6) in the former. Note that it is during the process coordination phase at the end of each annealing step that the master process and the child processes interact in the PILM PSA and PILM PMCA algorithms.
Parallel gradient descent:
Deterministic optimization techniques such as steepest descent and conjugate gradient descent are generally used for unconstrained optimization in the continuous domain (![]()
Yi
M for i = 1, ... , n. One of the well-known problems with the steepest descent method is that it takes several small steps while descending a valley in the solution landscape, which usually causes it to be much slower compared to other techniques in its class (![]()
![]()
![]()
The conjugate gradient descent procedure is very similar to the steepest descent procedure with the only difference that different directions are followed while minimizing the objective function. Instead of consistently following the local downhill gradient (i.e., the direction of steepest descent), a set of n mutually orthonormal (i.e., conjugate) direction vectors are generated from the downhill gradient vector where n is the dimensionality of the solution space. The orthonormality condition ensures that minimization along any given direction vector does not jeopardize the minimization along another direction vector within this set. Unlike the steepest descent algorithm, the conjugate gradient descent algorithm guarantees convergence to a local minimum within n steps. For this reason, the conjugate gradient descent algorithm was chosen instead of the steepest descent algorithm in the current implementation. The serial conjugate gradient descent algorithm is depicted in Fig 7. The Hessian matrix is usually required to generate a set of conjugate vectors to minimize a particular objective function (![]()
![]()
![]()
![]()
|
Due to its inherently sequential nature, data parallelism was deemed to be the appropriate parallelization scheme for the conjugate gradient descent algorithm. The Y and G vectors are distributed among the different processors constituting the virtual machine. Each processor performs the required operations on its local Yloc and Gloc subvectors concurrently with the other processors. Here, |Yloc| =
and |Gloc| =
, where Nproc is the number of processors in the virtual machine. This entails some interprocessor communication and synchronization overhead since the individual subvectors have to be periodically scattered (i.e., distributed) among the processors and also periodically gathered (i.e., combined) to compute a global value. For example, in the bisection procedure (i.e., procedure for minimizing along the direction G) one needs to evaluate the objective function and compute its one-dimensional derivative with respect to s during each iteration. Both of these operations require gathering of data (i.e., subvectors) within the different processors.
PVM permits one to define a group of processes such that reduction, scattering, and gathering operations can be performed selectively on the processes within the group. The pvm_scatter and pvm_gather primitives were used to scatter and gather the Y and G vectors among the different processors, respectively. It is to be noted, however, that scatter and gather operations in PVM are collective calls and need to be performed by all the processors concurrently. Synchronizing the processors, therefore, is necessary before any such call is initiated, thereby increasing the parallelization overhead. The parallelization scheme for the conjugate gradient descent algorithm follows the master-child model used for the PSA and PMCA algorithms. The master and child processes are depicted in Fig 8 and Fig 9, respectively.
|
|
A two-level parallelism approach for computation of the maximum-likelihood estimator:
To ensure a scalable implementation, two levels of parallelism were incorporated in the computation of the maximum-likelihood estimator. The finer or lower level of parallelism pertains to the computation of
for a given probe ordering
using the parallel conjugate gradient descent algorithm for continuous optimization as discussed previously. The coarser or upper level of parallelization pertains to the computation of
using the NILM or PILM, PSA or PMCA algorithm for discrete stochastic optimization as discussed previously. The two-level parallelization scheme is depicted in Fig 10.
|
At the coarser level, the user has a choice of any of the four parallel stochastic hill-climbing algorithms: PILM PSA, NILM PSA, PILM PMCA, or NILM PMCA. The parallelization of the conjugate gradient descent algorithm at the finer level is transparent to the parallel stochastic hill-climbing algorithms at the coarser level. In other words, the communication and control scheme for the parallel stochastic hill-climbing algorithms is independent of that of the parallel conjugate gradient descent algorithm. This enhances the modularity and flexibility of the system. For example, one could use the serial or parallel version of the conjugate gradient descent algorithm or, for that matter, any other serial or parallel algorithm for continuous deterministic optimization at the finer level without having to make any changes to the parallel stochastic hill-climbing algorithms at the coarser level.
The interaction between the master and child stochastic hill-climbing (i.e., annealing) processes (NILM/PILM PSA/PMCA) is shown with the double-headed arrows between the larger components in Fig 10. A parallel conjugate gradient descent algorithm is embedded within each of the stochastic hill-climbing processes. When the parallel conjugate gradient descent procedure is invoked from within the master or child stochastic hill-climbing process, a new set of child conjugate gradient descent processes is spawned on the available processors, whereas the master conjugate gradient descent process runs on the same processor as the stochastic hill-climbing process (master or child). The master and child conjugate gradient descent processes cooperate to evaluate and minimize the value of the objective function for a specific probe ordering
. Once the objective function is evaluated, the child conjugate gradient descent processes terminate, and the corresponding processors are available for future computation. The interaction between the master and child conjugate gradient descent processes is depicted by the double-headed arrows within each of the major components in Fig 10.
The two-level parallelism approach can be seen to induce a logical tree-shaped interconnection network on the processors in the PVM system. The first level is the group of processors that run the (master and child) stochastic hill-climbing processes, and the second level is the group of processors that run the child conjugate gradient descent processes. The processors that run the child conjugate gradient descent processes are connected to the processor running the parent stochastic hill-climbing process but are independent of the processors running other stochastic hill-climbing processes.
| EXPERIMENTAL RESULTS |
|---|
Experimental results on simulated data:
The parallel algorithms were implemented on a dedicated PVM cluster of 200-MHz PentiumPro processors running Solaris-x86 and tested with artificially generated clone-probe hybridization data (![]()
![]()
|
The serial stochastic hill-climbing algorithms (SA and MCA) were implemented with the following parameters: the initial value for the temperature or demon energy was chosen to be 0.5, and the maximum number of iterations D for each annealing step was chosen to be 100 · n. The current annealing step was terminated when the maximum number of iterations was reached or when the number of successful perturbations equaled 10 · n (i.e., 10% of the maximum number of iterations), whichever was encountered first. The temperature or demon energy values were systematically reduced at the end of each annealing step, using a geometric annealing schedule with the annealing factor
= 0.95. The algorithm was terminated (and deemed to have reached a global optimum) when the number of successful perturbations in any annealing step equaled 0.
In the case of the parallel stochastic hill-climbing algorithms (NILM PSA, PILM PSA, NILM MCA, and PILM MCA) the product of Nproc and the maximum number of iterations D performed by a processor in a single annealing step was kept constant, i.e., D =
. This ensured that the overall workload remained constant as the number of processors was varied, thus enabling one to examine the scalability of the speed-up and efficiency of the algorithms for a given problem size with increasing number of processors. The other parameters for the parallel stochastic hill-climbing algorithms were identical to those of their serial counterparts. In the NILM PSA and NILM PMCA algorithms, each process was independently terminated when the number of successful perturbations in any annealing step for that process equaled 0. In the PILM PSA and PILM PMCA algorithms, each process was terminated when the number of successful perturbations in an annealing step equalled 0 for all the processes i.e., the master process and all the child processes. This condition was checked during the synchronization phase at the end of each annealing step.
The parallel conjugate gradient descent algorithm was tested on artificial data sets with a varying number of probes (n = 100, 200, and 300). Table 3 shows the run-time results for the parallel conjugate gradient descent algorithm with varying number of processors Nproc and varying number of probes n. Table 3 also shows the run-time results for the serial steepest descent algorithm on the same data sets. As can be seen from Table 3, the serial version of the conjugate gradient descent algorithm is significantly faster than the serial version of the steepest descent algorithm. Moreover, Table 3 also shows how the performance of the parallel conjugate gradient descent algorithm scales with an increasing number of processors and increasing problem size.
|
The speed-up results for Table 3 are shown in Table 4. For n = 100, the best speed-up is attained for Nproc = 2 with a degradation in speed-up for Nproc > 2. For n = 200, the results show the best speed-up for Nproc = 4 with a degradation in speed-up for Nproc > 4. As for n = 300, the best speed-up is seen for Nproc = 8. These results are in conformity with expectations, since the interprocessor communication overhead and synchronization overhead tend to increasingly dominate the overall execution time with increasing number of processors Nproc for a problem of given size (i.e., for a given value of n). The interprocessor communication overhead and synchronization overhead as a percentage of the computational load are higher for smaller problem sizes, which explains why the maximum speed-up obtained is for values of Nproc < 8. In summary, the payoff in the parallelization of the conjugate gradient descent algorithm is realized only for large values of n (i.e., large problem sizes). This is a natural consequence of the network latency inherent in PVM systems that are comprised of a network of workstations.
|
The total number of iterations of the serial and parallel versions (with different Nproc values) of the conjugate gradient descent algorithm prior to convergence was also examined (Table 5). As can be seen in Table 5, the number of iterations prior to convergence varies significantly between the serial and the parallel versions of the conjugate gradient descent algorithm. On closer examination, we were able to conclude that this was due to numerical errors introduced by the parallelization process. Since the objective function is mathematically involved, and since the order in which numbers are added in the serial implementation and each of the parallel versions (with different Nproc values) is different, a numerical error is introduced. This error is the difference in the objective function values computed by the serial and parallel versions of the conjugate gradient descent algorithm in a given iteration having started from the same initial point. It was noted that the numerical error is initially insignificant. However, due to the iterative nature of the conjugate gradient descent algorithm, the results of the ith iteration depend entirely on the results of the (i - 1)th iteration, thereby causing the error to build up throughout the computation. Also, to ensure convergence to a global minimum very high-precision thresholds have been used to test for convergence. For instance, the bisection method terminates once the value of the objective function varies by <10-10. The conjugate gradient descent algorithm itself is deemed to have converged to a global minimum when the value of the objective function varies by <10-5 (if the gradient vector has not vanished by then). Consequently, all three factors, i.e., the different order of execution of arithmetic operations, the data dependency between successive iterations, and the high-precision thresholds, contribute to the discrepancy in the number of iterations prior to convergence between the serial and parallel versions (with different Nproc values) of the conjugate gradient descent algorithm. The final results of the serial and the parallel versions of the conjugate gradient descent algorithm are equivalent up to a precision of 10-4.
|
Table 6 shows the average number of iterations per minute for both the serial and parallel versions of the conjugate gradient descent algorithm. This information is derived from the overall execution time (Table 3) and the number of iterations prior to convergence (Table 5). Table 7 shows the average time (in seconds) per iteration for the serial and parallel versions (with different Nproc values) of the conjugate gradient descent algorithm. Table 8 shows the speed-up values for the parallel conjugate gradient descent algorithm for varying values of Nproc based on the results in Table 7. Although the speed-up results in Table 8 compare well with those in Table 4, one can observe greater uniformity and consistency in the results presented in Table 8. The speed-up curves corresponding to the speed-up results in Table 4 and Table 8 are shown in Fig 11 and Fig 12, respectively.
|
|
|
|
|
In the case of the parallel stochastic hill-climbing algorithms, experiments were conducted on problem sizes of n = 10 and n = 30 probes. All four algorithmsNILM PSA, PILM PSA, NILM PMCA, and PILM PMCAwere tested using the MLE objective function. Since the value of n (i.e., problem size) is small, the serial version of the conjugate gradient descent algorithm was used. The run-time results for these algorithms are shown in Table 9. It can be observed that the serial and parallel versions of the MCA algorithm took longer to converge when compared to their SA counterparts. As Table 10 shows, the average run time per annealing step of the serial or any of the parallel versions (with different Nproc values) of the SA algorithm is higher than that of its MCA counterpart. The higher run time of the PMCA algorithms is attributable to the fact that they took a larger number of iterations to converge to a globally optimal solution.
|
|
The speed-up results for the PSA and PMCA algorithms are shown in Table 11. The corresponding speed-up curves for n = 10 and n = 30 are shown in Fig 13 and Fig 14, respectively. As can be observed, the PSA and PMCA algorithms exhibit consistent and scalable speed-up with increasing Nproc values. As expected, the speed-up scales better with increasing Nproc values for larger problem sizes (i.e., larger values of n). The PSA and PMCA algorithms arrived at the correct probe ordering in all cases but for one exception. In the case n = 30 and Nproc = 8, the PMCA algorithm came up with the reverse probe ordering (i.e.,
R instead of
), which is expected since the likelihood function is unique only up to reversal in the probe ordering. Consequently, the MLE procedure is capable of recovering the correct probe ordering only up to reversal.
|
|
|
The absolute root mean squared error (RMSE)
between the true interprobe spacings Y and the estimated interprobe spacings
is defined as
![]() |
(47) |
The RMSE value can also be expressed as a percentage of N, the total length of the chromosome. The absolute and percentage RMSE values for the PSA and PMCA algorithms for n = 10 and n = 30 are shown in Table 12. From the statistical theory underlying the maximum-likelihood estimation procedure it can be shown that the percentage RMSE value asymptotically approaches 0 in the limit n
(![]()
![]()
|
Experimental results on real data:
The parallel implementation of the MLE-based physical mapping algorithm was also tested on real probe-clone hybridization data from linkage group VII of the fungal genome N. crassa. The data were comprised of 105 probes and 1678 clones. The physical map was generated by first ordering the probes and estimating the interprobe distances using the MLE procedure. The intervening clones were then inserted between pairs of successive probes to generate a physical map of the chromosome. The resulting physical map exhibiting a total of 39 contig breaks can be viewed on the GENETICS web site at http://www.genetics.org/supplemental/. The 1's in the physical map that do not conform to the consecutive 1's property (![]()
![]()
| CONCLUSIONS AND DISCUSSION |
|---|
In this article, a MLE-based approach to physical map reconstruction under a probabilistic model of hybridization errors consisting of false positives and false negatives was presented. The maximum-likelihood estimator optimizes a likelihood function defined by the ordering of probes and interprobe spacings under an experimental protocol wherein clones of equal length are hybridized to a maximal subset of nonoverlapping equal-length clones termed probes. The estimation procedure was shown to involve a combination of continuous and discrete optimization, the former to determine a set of optimal interprobe spacings for a given probe ordering and the latter to determine the optimal probe ordering. The conjugate gradient descent procedure was used to determine the optimal spacings between probes for a given probe ordering. The optimal probe ordering was determined using stochastic combinatorial optimization procedures such as SA and MCA.
The problem of MLE-based physical map reconstruction in the presence of errors is a problem of high computational complexity, thus providing the motivation for parallel computing. A two-level parallelization strategy was proposed wherein the conjugate gradient descent procedure was parallelized at the lower level, and the stochastic hill-climbing algorithm was simultaneously parallelized at the higher level. The parallel algorithms were implemented on a distributed-memory multiprocessor cluster running the PVM environment. A data parallel approach, where the components of the gradient vector are distributed among the individual processors in the PVM system, was deemed more suitable for the parallelization of the conjugate gradient descent procedure. A control parallel scheme where individual processors perform noninteracting or periodically interacting searches was deemed more suitable for the parallelization of the combinatorial stochastic hill-climbing procedures.
Experimental results on artificial clone-probe data showed that the payoff in data parallelization of the conjugate gradient descent procedure was realized only for large problem sizes (i.e., large values of n). A similar trend was observed in the case of the parallel stochastic hill-climbing algorithms. In all cases, the parallel implementation of the maximum-likelihood estimator resulted in the correct probe ordering, except in one case wherein the probe ordering was reversed. The RMSE between the resulting interprobe distances and the true interprobe distances was computed. The percentage RMSE was seen to exhibit a decreasing trend with increasing problem values of n (i.e., problem size), which is in conformity with the statistical theory underlying the MLE procedure. Experimental results on real hybridization data from linkage group VII of the fungal genome N. crassa yielded a physical map with 39 contig breaks. The parallel MLE-based physical mapping procedure that used a combination of NILM PSA and the serial CGD algorithm exhibited a speed-up of 1.5 on a three-workstation cluster interconnected via 100 Mbs fast Ethernet.
The formulation of the ML model entailed certain key assumptions. The assumption of a constant probe/clone size is reasonable (![]()
![]()
![]()
![]()
![]()
![]()
Future research will attempt to improve the performance of the parallel MLE-based physical mapping procedure on real hybridization data. Future research will also investigate extensions and enhancements to the ML objective function to account for errors due to repeat DNA sequences, deletions, and chimerism. The consistency of the ML estimator needs to be rigorously analyzed. This would entail a proof of the asymptotic convergence of the inferred physical map to the true physical map with probability one as the number of probes (and clones) grows. Statistical methods need to be developed to assess the statistical reliability of the ML estimator. The rate of convergence of the inferred physical map to the true physical map could be used to establish asymptotic confidence levels for links between pairs of probes (or clones) on the physical map (![]()
![]()
![]()
| ACKNOWLEDGMENTS |
|---|
The authors thank Dr. David Lowenthal for access to his PVM workstation cluster. This research was supported in part by a National Research Initiative Competitive Grants Program (NRICGP) grant by the U.S. Department of Agriculture to Dr. Suchendra M. Bhandarkar.
Manuscript received September 1, 2000; Accepted for publication December 19, 2000.
| LITERATURE CITED |
|---|
AARTS, E. H. L., and K. KORST, 1989 Simulated Annealing and Boltzman Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing. Wiley, New York.
AARTS, E. H. L., F. M. J. DE BONT, J. H. A. HABERS and P. J. M. VAN LAARHOVEN, 1986 A parallel statistical cooling algorithm, pp. 8797 in Lecture Notes in Computer Science: Proceedings of the 3rd Annual Symposium on Theoretical Aspects of Computer Science, Vol. 210. Springer-Verlag, Berlin.
AIGN, V., U. SCHULTE, and J. D. HOHEISEL, 2001 Hybridization mapping of Neurospora crassa linkage groups II and V. Genetics 157:1015-1020
ALIZADEH, F., R. M. KARP, D. K. WEISSER and G. ZWEIG, 1994 Physical mapping of chromosomes using unique probes, pp. 489500 in Proceedings of the ACM-SIAM Conference on Discrete Algorithms. ACM Press, New York.
ALIZADEH, F., R. M. KARP, L. A. NEWBERG, and D. K. WEISSER, 1995 Physical mapping of chromosomes: a combinatorial problem in molecular biology. Algorithmica 13(1/2):52-76.
ARNOLD, J., 1997 Editorial. Fungal Genet. Biol. 21(3):254-257[Medline].
ARNOLD, J. and M. T. CUSHION, 1997 Constructing a physical map of the Pneumocystis genome. J. Eukaryot. Microbiol. 44:8S[Medline].
ARRATIA, R., E. S. LANDER, S. TAVARE, and M. S. WATERMAN, 1991 Genomic mapping by anchoring random probes: a mathematical analysis. Genomics 11:806-827[Medline].
AZENCOTT, R. (Editor), 1992 Simulated Annealing: Parallelization Techniques. Wiley, New York.
BALDING, D. J., 1994 Design and analysis of chromosome physical mapping experiments. Philos. Trans. R. Soc. Lond. Ser. B 334:329-335.
BANERJEE, P., M. H. JONES, and J. S. SARGENT, 1990 Parallel simulated annealing algorithms for cell placement on the hypercube multiprocessor. IEEE Trans. Parallel Distributed Syst. 1:91-106.
BEN-DOR, A., and B. CHOR, 1997 On constructing radiation hybrid maps, pp. 1726 in Proceedings of the ACM Conference on Computational Molecular Biology. ACM Press, New York.
BENNETT, J. W., 1997 White paper: genomics for filamentous fungi. Fungal Genet. Biol. 21(1):3-7[Medline].
BHANDARKAR, S. M., 1997 Parallel processing for chromosome reconstruction from physical mapsa case study of MIMD parallelism on the hypercube. Parallel Algorithms and Applications 12:231-252.
BHANDARKAR, S. M. and S. MACHAKA, 1997 Chromosome reconstruction from physical maps using a cluster of workstations. J. Supercomput. 11(1):61-86.
BHANDARKAR, S. M., S. CHIRRAVURI, and J. ARNOLD, 1996a Parallel computing of physical mapsa comparative study in SIMD and MIMD parallelism. J. Comput. Biol. 3(4):503-528[Medline].
BHANDARKAR, S. M., S. CHIRRAVURI, and J. ARNOLD, 1996b PARODSA study of parallel algorithms for ordering DNA sequences. Int. J. Comput. Appl. Biosci. 12(4):269-280.
BHANDARKAR, S. M., S. CHIRRAVURI, S. MACHAKA, and J. ARNOLD, 1998 Parallel computing for chromosome reconstruction via ordering of DNA sequences. Parallel Comput. 24(8):1177-1204.
BHANOT, G., M. CREUTZ, and H. NEUBERGER, 1984 Microcanonical simulation of Ising systems. Nuclear Phy. B 235(FS11):417-434.
BOOTH, K. S. and G. S. LUEKER, 1976 Testing for the consecutive one's property, interval graphs and graph planarity using pq-tree algorithms. J. Comput. Systems Sci. 13:335-379.
BRODY, H., J. GRIFFITH, A. J. CUTICCHIA, J. ARNOLD, and W. TIMBERLAKE, 1991 Chromosome-specific recombinant libraries from the fungus Aspergillus nidulans.. Nucleic Acids Res. 19:3105-3109
CASOTTO, A., F. ROMEO, and A. SANGIOVANNI-VINCENTELLI, 1987 A parallel simulated annealing algorithm for the placement of macro cells. IEEE Trans. Computer-Aided Design 1:838-847.
CHOR, B., and M. SUDAN, 1995 A geometric approach to betweenness, pp. 227237 in Proceedings of the European Symposium on Algorithms: Springer-Verlag Lecture Notes in Computer Science, Vol. 979. Springer-Verlag, Berlin.
CHRISTOF, T., and J. D. KECECIOGLU, 1999 Computing physical maps of chromosomes with non-overlapping probes by branch-and-cut. Proceedings of the ACM Conference on Computational Molecular Biology, Lyon, France, pp. 115123.
CHRISTOF, T., M. JÜNGER, J. D. KECECIOGLU, P. MUTZEL, and G. REINELT, 1997 A branch-and-cut to physical mapping of chromosomes by unique end probes. J. Comput. Biol. 4(4):433-447[Medline].
CREUTZ, M., 1983 Microcanonical Monte Carlo simulation. Phys. Rev. Lett. 50(19):1411-1414.
CUTICCHIA, A. J., J. ARNOLD, and W. E. TIMBERLAKE, 1992 The use of simulated annealing in chromosome reconstruction experiments based on binary scoring. Genetics 132:591-601[Abstract].
CUTICCHIA, A. J., J. ARNOLD, and W. E. TIMBERLAKE, 1993 ODS: ordering DNA sequencesa physical mapping algorithm based on simulated annealing. Comput. Appl. Biosci. 9(2):215-219
DORNY, C. N., 1980 A Vector Space Approach to Models and Optimization. R. E. Krieger, Huntington, NY.
FASULO, D. P., T. JIANG, R. M. KARP, R. SETTERGREN and E. C. THAYER, 1997 An algorithmic approach to multiple complete digest mapping, pp. 118127 in Proceedings of the ACM Conference on Computational Molecular Biology. ACM Press, New York.
FU, Y. X., W. E. TIMBERLAKE, and J. ARNOLD, 1992 On the design of genome mapping experiments using short synthetic oligo nucleotides. Biometrics 48:337-359[Medline].
GAREY, M. S., and D. S. JOHNSON, 1979 Computers and Intractability: A Guide to the Theory of NPCompleteness. W. H. Freeman, New York.
GEIST, A., A. BEGUELIN, J. DONGARRA, W. JIANG, R. MANCHECK et al., 1994 PVM Parallel Virtual MachineA User's Guide and Tutorial for Networked Parallel Computing. MIT Press, Cambridge, MA.
GEMAN, S. and D. GEMAN, 1984 Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6:721-741.
GREENING, D. R., 1990 Parallel simulated annealing techniques. Physica D 42:293-306.
GREENBERG, D. S. and S. ISTRAIL, 1995 Physical mapping by STS hybridization: algorithmic strategies and the challenge of software evaluation. J. Comput. Biol. 2(2):219-273[Medline].
HADAMARD, H., 1923 Lectures on the Cauchy Problem in Linear Partial Differential Equations. Yale University Press, New Haven, CT.
HALL, R. D., S. M. BHANDARKAR, and J. WANG, 2001 ODS2: a multiplatform software application for creating integrated physical and genetic maps. Genetics 157:1045-1056
HESTENES, M. R., 1980 Conjugate Direction Methods in Optimization. Springer-Verlag, New York.
HESTENES, M. R. and E. STIEFEL, 1952 Methods of conjugate gradient for solving linear systems. J. Res. Natl. Bureau Standards 49:409-436.
HOGG, R. V., and A. T. CRAIG, 1995 Introduction to Mathematical Statistics, Ed. 5. Prentice-Hall, Englewood Cliffs, NJ.
JAIN, M. and E. W. MYERS, 1997 Algorithms for computing and integrating physical maps using unique probes. J. Comput. Biol. 4(4):449-466[Medline].
JAYARAMAN, R., and R. RUTENBAR, 1987 Floor planning by annealing ona hypercube multiprocessor, pp. 346349 in Proceedings of the IEEE International Conference on Computer-Aided Design. IEEE Press, Washington, DC.
JIANG, T., and R. M. KARP, 1997 Mapping clones with a given ordering or interleaving, pp. 400409 in Proceedings of the ACM-SIAM Conference on Discrete Algorithms. ACM Press, New York.
KARP, R. M., and R. SHAMIR, 1998 Algorithms for optical mapping, pp. 117124 in Proceedings of the ACM Conference on Computational Molecular Biology. ACM Press, New York.
KECECIOGLU, J. D. and E. W. MYERS, 1995 Combinatorial algorithms for DNA sequence assembly. Algorithmica 13:7-51.
KECECIOGLU, J. D., S. S. SHETE and J. ARNOLD, 2000 Reconstructing distances in physical maps of chromosomes with nonoverlapping probes. Proceedings of the ACM Conference on Computational Molecular Biology, Tokyo, Japan, pp. 183192.
KELKAR, H. S., J. GRIFFITH, M. E. CASE, S. F. COVERT, and R. D. HALL et al., 2001 The Neurospora crassa genome: cosmid libraries sorted by chromosome. Genetics 157:979-990
KIM, Y. and M. KIM, 1990 A stepwise overlapped parallel annealing algorithm on a message passing multiprocessor system. Concurrency: Practice Experience 2(2):123-148.
KINCAID, D., and W. CHENEY, 1991 Numerical Analysis Mathematics of Scientific Computing. Brooks/Cole, Pacific Grove, CA.
KIRKPATRICK, S., C. GELATT, JR., and M. VECCHI, 1983 Optimization by simulated annealing. Science 220:498-516
LANDER, E. S. and P. GREEN, 1987 Construction of multi-locus genetic linkage maps in humans. Proc. Natl. Acad. Sci. USA 84:2363-2367
LANDER, E. S. and M. S. WATERMAN, 1988 Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2:231-239[Medline].
LEE, F. H., 1995 Parallel simulated annealing on a message-passing multicomputer. Ph.D. dissertation, Department of Electrical Engineering, Utah State University, Logan, UT.
LEE, J. K., V. DANCIK and M. S. WATERMAN, 1998 Estimation for restriction sites observed by optical mapping using reversible-jump Markov chain Monte Carlo, pp. 147152 in Proceedings of the ACM Conference on Computational Molecular Biology. ACM Press, New York.
LEHMAN, E. L., 1983 Theory of Point Estimation. Wiley, New York.
METROPOLIS, N., A. ROSENBLUTH, M. ROSENBLUTH, A. TELLER, and E. TELLER, 1953 Equation of state calculations by fast computing machines. J. Chem. Phys. 21:1087-1092.
MIZUKAMI, T., W. I. CHANG, I. GARKATSEVE, N. KAPLAN, and D. LOMBARDI et al., 1993 A 13Kb resolution cosmid map of the 14 Mb fission yeast genome by nonrandom sequence-tagged site mapping. Cell 73:121-132[Medline].
MOTT, R., A. V. GRIGORIEV, E. MAIER, J. D. HOHEISEL, and H. LEHRACH, 1993 Algorithms and software tools for ordering clone libraries: application to the mapping of the genome S. pombe.. Nucleic Acids Res. 21(8):1965-1974
MUTHUKRISHNAN, S., and L. PARIDA, 1997 Towards constructing physical maps by optical mapping: an effective, simple, combinatorial approach, pp. 209219 in Proceedings of the ACM Conference on Computational Molecular Biology. ACM Press, New York.
NELSON, D. O. and T. P. SPEED, 1994 Statistical issues in constructing high resolution physical maps. Stat. Sci. 9:334-354.
POLAK, E., 1997 Optimization: algorithms and consistent approximations. Appl. Math. Sci. 124:320-345.
PRADE, R. A., J. GRIFFITH, K. KOCHUT, J. ARNOLD, and W. E. TIMBERLAKE, 1997 In vitro reconstruction of the Aspergillus nidulans genome. Proc. Natl. Acad. Sci. USA 94:14564-14569
PRESS, W. H., B. P. FLANNERY, S. A. TEUKOLSKY and W. T. VETTERLING, 1988 Numerical Recipes in C. Cambridge University Press, New York.
ROMEO, F. and A. SANGIOVANNI-VINCENTELLI, 1991 A theoretical framework for simulated annealing. Algorithmica 6:302-345.
SHETE, S. S., 1998 Estimation problems in physical mapping of a chromosome and in a branching process with immigration. Ph.D. dissertation, Department of Statistics, The University of Georgia, Athens, GA.
SLONIM, D., L. KRUGLYAK, L. STEIN and E. LANDER, 1997 Building human genome maps with radiation hybrids, pp. 277286 in Proceedings of the ACM Conference on Computational Molecular Biology. ACM Press, New York.
STURTEVANT, A. H., 1913 The linear arrangement of six sex-linked factors in Drosophila as shown by their mode of association. J. Exp. Zool. 14:43-49.
SUNDERAM, V., 1990 PVM: a framework for parallel distributed computing. Concurrency: Practice Experience 2(2):315-339.
WANG, Y., R. A. PRADE, J. GRIFFITH, W. E. TIMBERLAKE, and J. ARNOLD, 1994a A fast random cost algorithm for physical mapping. Proc. Natl. Acad. Sci. USA 91:11094-11098
WANG, Y., R. A. PRADE, J. GRIFFITH, W. E. TIMBERLAKE, and J. ARNOLD, 1994b ODS-BOOTSTRAP: assessing the statistical reliability of physical maps by bootstrap resampling. Comput. Appl. Biosci. 10:625-635
WILSON, D. B., D. S. GREENBERG and C. A. PHILLIPS, 1997 Beyond islands: runs in clone-probe matrices, pp. 320329 in Proceedings of the ACM Conference on Computational Molecular Biology. ACM Press, New York.
WITTE, E. E., R. D. CHAMBERLAIN, and M. A. FRANKLIN, 1991 Parallel simulated annealing using speculative computation. IEEE Trans. Parallel Distributed Syst. 2(4):483-494.
WONG, C. P., and R. D. FIEBRICH, 1987 Simulated annealing-based circuit placement on the connection machine system, pp. 7882 in Proceedings of the International Conference on Computer Design. IEEE Press, Washington, DC.
XIONG, M., H. J. CHEN, R. A. PRADE, Y. WANG, and J. GRIFFITH et al., 1996 On the consistency of a physical mapping method to reconstruct a chromosome in vitro.. Genetics 142(1):267-284[Abstract].
ZHANG, M. Q. and T. G. MARR, 1993 Genome mapping by nonrandom anchoring: a discrete theoretical analysis. Proc. Natl. Acad. Sci. USA 90:600-604
This article has been cited by other articles:
![]() |
Z. Xu, B. Lance, C. Vargas, B. Arpinar, S. Bhandarkar, E. Kraemer, K. J. Kochut, J. A. Miller, J. R. Wagner, M. J. Weise, et al. Mapping by Sequencing the Pneumocystis Genome Using the Ordering DNA Sequences V3 Tool Genetics, April 1, 2003; 163(4): 1299 - 1313. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. S. Kelkar, J. Griffith, M. E. Case, S. F. Covert, R. D. Hall, C. H. Keith, J. S. Oliver, M. J. Orbach, M. S. Sachs, J. R. Wagner, et al. The Neurospora crassa Genome: Cosmid Libraries Sorted by Chromosome Genetics, March 1, 2001; 157(3): 979 - 990. [Abstract] [Full Text] |
||||
![]() |
V. Aign, U. Schulte, and J. D. Hoheisel Hybridization-Based Mapping of Neurospora crassa Linkage Groups II and V Genetics, March 1, 2001; 157(3): 1015 - 1020. [Abstract] [Full Text] |
||||
![]() |
D. Hall, S. M. Bhandarkar, and J. Wang ODS2: A Multiplatform Software Application for Creating Integrated Physical and Genetic Maps Genetics, March 1, 2001; 157(3): 1045 - 1056. [Abstract] [Full Text] |
||||
- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Supplemental data
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Email this article to a friend
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Bhandarkar, S. M.
- Articles by Kota, R. N.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Bhandarkar, S. M.
- Articles by Kota, R. N.






















































