Parallel MIP Optimizer

The ILOG CPLEX Parallel MIP Optimizer delivers significant increases in speed on a wide variety of models, particularly on difficult ones that solve a large number of nodes in the branch & cut search tree.

There are several different opportunities for applying multiple processors to the solution of a MIP problem.

Increase the Global Thread Parameter

The most straightforward way to invoke parallelism is to set the global thread parameter, Threads, to a value greater than 1, indicating the desired degree of parallelism. If the other parameters remain at their default values, nodes in the branch & cut tree are then processed in parallel; that is, each node is solved in its entirety on one of the available processors, independently of the work being performed on the other processors. In typical cases, the number of nodes waiting for solution quickly exceeds the number of threads, creating enough work that can be done in parallel to make speed increases possible. ILOG CPLEX automatically manages the pool of nodes, so that each time a thread finishes one node, it is assigned another.
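For example, in the Callable Library this can be sketched as follows. This is a minimal sketch, not a complete application: error checking is omitted, the thread count of 4 is illustrative, and the model file name is taken from the sample run shown later in this section.

    #include <ilcplex/cplex.h>

    int main(void)
    {
        int status;
        CPXENVptr env = CPXopenCPLEX(&status);
        CPXLPptr  lp  = CPXcreateprob(env, &status, "fixnet6");

        /* Read the model; the file type is inferred from the extension. */
        CPXreadcopyprob(env, lp, "fixnet6.mps.gz", NULL);

        /* Request up to 4 parallel threads; nodes of the branch & cut
           tree will then be solved in parallel. */
        CPXsetintparam(env, CPX_PARAM_THREADS, 4);

        CPXmipopt(env, lp);

        CPXfreeprob(env, &lp);
        CPXcloseCPLEX(&env);
        return 0;
    }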

Branch & Cut Parallel Processing

A contrasting and specialized approach to obtaining speed increases by parallel processing within the branch & cut tree is to:

  1. choose strong branching (VarSel parameter setting 3) as the variable selection strategy;
  2. apply multiple processors to the strong branching variable selection computation by setting the strong branching thread limit, StrongThreadLim, to a value greater than 1; and
  3. leave the global thread limit at 1 to inhibit parallel processing of the branching and solution of the nodes.

On models where strong branching is already a beneficial technique, this approach to parallelism can be especially attractive.
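In the Callable Library, this combination of settings might be sketched as follows, assuming an environment env opened as in the previous example; the macro names shown are the Callable Library counterparts of the parameters named above, and the thread count of 4 is illustrative.

    #include <ilcplex/cplex.h>

    void set_parallel_strong_branching(CPXENVptr env)
    {
        /* Strong branching as the variable selection strategy (setting 3). */
        CPXsetintparam(env, CPX_PARAM_VARSEL, CPX_VARSEL_STRONG);

        /* Parallelize the strong branching computation itself. */
        CPXsetintparam(env, CPX_PARAM_STRONGTHREADLIM, 4);

        /* Keep the branching and node solution serial. */
        CPXsetintparam(env, CPX_PARAM_THREADS, 1);
    }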

Root Relaxation Parallel Processing

In some models, the LP or QP root relaxation takes significant solution time before parallel branching begins. These models have potential for additional speed increases by running the root relaxation step in parallel. If the root problem is an LP and the Threads parameter is set to a value greater than 1, the concurrent optimizer is invoked by default. This provides a form of parallelism that applies the available threads to multiple LP optimizers. If the root problem is a QP, the dual simplex optimizer alone is used.

You can try a different form of parallelism at the root by selecting a specific optimizer with the starting algorithm parameter (RootAlg in Concert, CPX_PARAM_STARTALG in the Callable Library, "set mip strategy startalgorithm" in the Interactive Optimizer). The parallel threads will all be applied to the selected starting algorithm, to the extent that the algorithm supports parallelism. Parallel Barrier in particular is a frequent choice for this purpose. Later sections contain information on the Parallel Barrier Optimizer and the Parallel Simplex Optimizer. Note that if you use the barrier optimizer at the root of an MIQP problem, you must also use it for the nodes. This restriction does not apply in the case of MILP.
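For instance, to apply the available threads to the barrier optimizer at the root, a sketch along these lines can be used, again assuming an open environment env and an illustrative thread count of 4:

    #include <ilcplex/cplex.h>

    void set_parallel_barrier_root(CPXENVptr env)
    {
        CPXsetintparam(env, CPX_PARAM_THREADS, 4);

        /* Select barrier as the starting algorithm for the root relaxation. */
        CPXsetintparam(env, CPX_PARAM_STARTALG, CPX_ALG_BARRIER);
    }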

Individual Optimizer Parallel Processing

Parallelism in either barrier or simplex is ordinarily controlled by the global thread count parameter, but this default behavior can be overridden by the individual optimizer thread limit parameters, BarThreads and SimThreads respectively. The degree of parallelism within the branch & cut tree can likewise be controlled by the MIP thread limit, MipThreads, which overrides the global thread limit. This capability to set any or all of MipThreads, BarThreads, and SimThreads permits great flexibility in the use of parallel resources during the solution of a MIP model.

For example, on a model where only a small number of nodes is active at any one time, the benefit of parallel solution of these nodes is limited. If the individual nodes are computationally challenging, then it may be possible to achieve speed increases by leaving the global thread limit at its default of 1 and setting the continuous optimizer thread limit (SimThreads or BarThreads) to a value greater than 1. The global thread limit of 1 will inhibit the parallelism of the branching process, while the explicit thread limit of more than 1 will permit the optimization of each node in parallel.
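A sketch of that configuration in the Callable Library, assuming an open environment env and a version that provides CPX_PARAM_BARTHREADS as the Callable Library counterpart of BarThreads (the thread count of 4 is illustrative):

    #include <ilcplex/cplex.h>

    void set_parallel_nodes_serial_branching(CPXENVptr env)
    {
        /* A global limit of 1 inhibits parallel branching. */
        CPXsetintparam(env, CPX_PARAM_THREADS, 1);

        /* Solve the relaxation at each node with 4 barrier threads. */
        CPXsetintparam(env, CPX_PARAM_BARTHREADS, 4);
    }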

Nested Parallel Processing

Nested parallelism represents a further way to exploit the flexibility of independent thread parameters. For example, it might be determined from experience, on a given family of models, that only a modest degree of parallelism is beneficial at the nodes and that additional processors do not help speed up the branching. In such a case, better speed increases might be obtained by combining parallel processing of the nodes with parallelization of the work that the LP or QP optimizer does at each node. On an 8-processor computer, you might opt to solve a model by setting the MipThreads limit to 4 instead of its maximum of 8, and the BarThreads limit to 2, thus keeping all 8 processors busy as much of the time as possible, with each of the four MIP threads invoking two threads for the barrier optimizer.
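The corresponding Callable Library sketch, again assuming an open environment env and a version that provides the per-optimizer thread parameters discussed in this section:

    #include <ilcplex/cplex.h>

    void set_nested_parallelism(CPXENVptr env)
    {
        /* Four threads for the branch & cut tree ... */
        CPXsetintparam(env, CPX_PARAM_MIPTHREADS, 4);

        /* ... each of which may invoke two barrier threads per node. */
        CPXsetintparam(env, CPX_PARAM_BARTHREADS, 2);
    }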

If you do decide to try a nested parallel approach, keep in mind the rule of thumb that it is usually better to keep a higher degree of parallelism for the nodes themselves (MipThreads) than for the LP or QP optimizer (SimThreads or BarThreads). This is in keeping with the general observation that MIP speed increases are, on average, closer to linear in the number of threads than the speed increases of the continuous optimizers.

Nested parallelism is not supported by most OpenMP compilers and runtime libraries, so it is not available in the OpenMP version of ILOG CPLEX. In the OpenMP version, a single call to the MIP optimizer may set MipThreads to a value greater than 1, or set SimThreads or BarThreads to a value greater than 1, but not both at once.

Memory Considerations and the Parallel MIP Optimizer

Before the parallel MIP optimizer invokes parallel processing, it makes separate, internal copies of the initial problem. The individual processors use these copies during computation, so each of them requires an amount of memory roughly equal to the size of the original model after it is presolved.

Output from the Parallel MIP Optimizer

The parallel MIP optimizer generates slightly different output in the node log (see Progress Reports: Interpreting the Node Log) from the serial MIP optimizer. The following paragraphs explain those differences.

If the MIP optimizer is running in parallel, it will display elapsed time for the MIP optimizer in wall-clock time, independent of the setting of the clock-type parameter (assuming MIP logging has not been turned off).

ILOG CPLEX prints a summary of timing statistics specific to the parallel MIP optimizer at the end of optimization. You can see typical timing statistics in the following sample run.

Problem 'fixnet6.mps.gz' read.
Read time =    0.08 sec.
CPLEX> o
Tried aggregator 1 time.
MIP Presolve modified 308 coefficients.
Aggregator did 1 substitutions.
Reduced MIP has 477 rows, 877 columns, and 1754 nonzeros.
Presolve time =    0.02 sec.
Clique table members: 2
MIP emphasis: balance optimality and feasibility
Root relaxation solution time =    0.03 sec.

        Nodes                                         Cuts/
   Node  Left     Objective  IInf  Best Integer     Best Node    ItCnt     Gap

      0     0     3192.0420    12                   3192.0420      308
*     0+    0                   0     4505.0000     3192.0420      308   29.14%
      0     0     3263.9220    19     4505.0000      Cuts: 36      344   27.55%
      0     0     3393.0917    17     4505.0000      Cuts: 24      404   24.68%
*     0+    0                   0     4419.0000     3393.0917      404   23.22%
      0     0     3439.4758    19     4419.0000   Flowcuts: 7      446   22.17%
      0     0     3472.6583    20     4419.0000   Flowcuts: 6      462   21.42%
      0     0     3489.9293    19     4419.0000   Flowcuts: 5      488   21.02%
      0     0     3498.4754    26     4419.0000   Flowcuts: 5      506   20.83%
      0     0     3499.7388    26     4419.0000   Flowcuts: 5      521   20.80%
      0     0     3500.8733    26     4419.0000   Flowcuts: 3      531   20.78%
      0     0     3526.2044    19     4419.0000   Flowcuts: 4      547   20.20%
      0     0     3526.3698    21     4419.0000   Flowcuts: 1      548   20.20%
*     0+    0                   0     4310.0000     3526.3698      548   18.18%
*     0+    0                   0     3985.0000     3526.3698      548   11.51%
*    33     2                   0     3983.0000     3791.7448     1130    4.80%

Sequential (before b&c):
  Real time             =    0.55
Parallel b&c, 2 threads:
  Real time             =    0.16
  Critical time (total) =    0.00
  Spin time (average)   =    0.02
                             -------
Total (sequential+parallel) =    0.71 sec.

Cover cuts applied:  1
Flow cuts applied:  45
Gomory fractional cuts applied:  6

Integer optimal solution:  Objective =  3.9830000000e+03
Solution time =    0.72 sec.  Iterations = 1187  Nodes = 41

The summary at the end of the sample tells us that 0.55 seconds were spent in the sequential phase, that is, in all the combined steps (preprocessing, root relaxation solution, cutting planes, heuristics) that occur at the root before the first branch. The parallel part of this sample run took 0.16 seconds of real time (that is, elapsed time for that phase).

Other parts of the sample report indicate that the processors spent an average of 0.02 seconds of real time spinning (that is, waiting for work while too few active nodes were available). The critical time, the time spent by individual processors updating global, shared information, totaled 0.00 seconds of real time. Since only one processor can access the critical region at any instant, the amount of time spent in this region really is crucial: any other processor that tries to access it must wait, sitting idle, and this idle time is counted separately from the spin time.

There is another difference in the way logging occurs in the parallel MIP optimizer. When this optimizer is called, it makes a number of copies of the problem. These copies are known as clones. The parallel MIP optimizer creates as many clones as there are threads available to it. When the optimizer exits, these clones and the memory they used are discarded.

If a log file is active when the clones are created, then ILOG CPLEX creates a clone log file for each clone. The clone log files are named cloneK.log, where K is the index of the clone, ranging from 0 (zero) to the number of threads minus one. Since the clones are created at each call to the parallel MIP optimizer and discarded when it exits, the clone logs are opened at each call and closed at each exit. (The clone log files are not removed when the clones themselves are discarded.)

The clone logs contain information normally recorded in the ordinary log file (by default, cplex.log) but inconvenient to send through the normal log channel. The information likely to be of most interest to you is the special messages, such as error messages, that result from calls to the LP optimizers that solve the subproblems.
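If you are using the Callable Library, a log file can be activated before the call to the parallel MIP optimizer, for example along these lines; this is a sketch assuming the environment and problem objects from the first example in this section.

    #include <stdio.h>
    #include <ilcplex/cplex.h>

    void solve_with_clone_logs(CPXENVptr env, CPXLPptr lp)
    {
        /* Open cplex.log; the clone logs (clone0.log, clone1.log, ...)
           are created alongside it while the optimizer runs. */
        CPXFILEptr logfile = CPXfopen("cplex.log", "a");
        CPXsetlogfile(env, logfile);

        CPXmipopt(env, lp);   /* clone logs are written during this call */

        CPXsetlogfile(env, NULL);
        CPXfclose(logfile);
    }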

