Big M’s in integer programming formulations have been a topic of discussion on Twitter and the blogosphere in the past week. For example, see Marc-Andre Carle‘s posts (primal and dual) and followup tweets and also some older posts from Paul Rubin and Bob Fourer.

My initial reaction was *don’t let the software handle it*. If possible, try to reformulate the problem.

In this post, I explain myself and give supporting computational experiments.

### Example: The uncapacitated facility location problem

Consider the uncapacitated facility location (UFL) problem. The task is to open a subset of available facilities to service a set of customers. The goal is to minimize the sum of facility opening costs and customer-delivery costs.

The parameters for this problem include the cost $f_i$ to open facility $i$ and the cost $c_{ij}$ associated with fulfilling customer $j$'s demand via facility $i$. Throughout, $n$ denotes the number of facilities and $m$ the number of customers. We allow for a facility to fulfill a fractional amount of a customer's demand, in which case the demand-fulfillment costs are proportional.

### 1. The disaggregated formulation

One natural formulation is as follows, where $x_{ij}$ is the proportion of customer $j$'s demand that is fulfilled by facility $i$, and $y_i$ is a binary variable representing the decision to open facility $i$.

The objective is to minimize the total cost:

$$\min \sum_{i=1}^{n} f_i y_i + \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} x_{ij}$$

Each customer's demand should be fulfilled:

$$\sum_{i=1}^{n} x_{ij} = 1 \quad \text{for all } j$$

If a facility is closed, then it can fulfill none of a customer's demand:

$$x_{ij} \le y_i \quad \text{for all } i, j$$

Finally, the $x$ variables should be nonnegative, and the $y$ variables should be binary.
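
To make this concrete, here is a minimal sketch of how formulation 1 could be built with the Gurobi C++ API. This is my illustration, not the author's code; the function name and the cost arrays `f` and `c` are assumptions.

```cpp
// A minimal sketch of formulation 1 (disaggregated) in the Gurobi C++ API.
// Assumes opening costs f[i] and assignment costs c[i][j] are already given.
#include "gurobi_c++.h"
#include <string>
#include <vector>

void buildDisaggregatedUFL(GRBModel& model,
                           const std::vector<double>& f,               // f[i]: opening cost
                           const std::vector<std::vector<double>>& c,  // c[i][j]: assignment cost
                           std::vector<GRBVar>& y,
                           std::vector<std::vector<GRBVar>>& x)
{
    const int n = static_cast<int>(f.size());     // number of facilities
    const int m = static_cast<int>(c[0].size());  // number of customers

    // y_i = 1 if facility i is opened; objective coefficient f[i]
    y.resize(n);
    for (int i = 0; i < n; ++i)
        y[i] = model.addVar(0.0, 1.0, f[i], GRB_BINARY, "y_" + std::to_string(i));

    // x_ij = fraction of customer j's demand served by facility i; coefficient c[i][j]
    x.assign(n, std::vector<GRBVar>(m));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j)
            x[i][j] = model.addVar(0.0, 1.0, c[i][j], GRB_CONTINUOUS,
                                   "x_" + std::to_string(i) + "_" + std::to_string(j));

    // each customer's demand must be fully served
    for (int j = 0; j < m; ++j) {
        GRBLinExpr served = 0;
        for (int i = 0; i < n; ++i)
            served += x[i][j];
        model.addConstr(served == 1.0, "demand_" + std::to_string(j));
    }

    // disaggregated linking constraints: x_ij <= y_i
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j)
            model.addConstr(x[i][j] <= y[i]);

    // Gurobi minimizes by default, so no explicit objective call is needed.
}
```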

### 2. The big-M formulation

In the big-M formulation, everything is the same, except that the constraints $x_{ij} \le y_i$ are replaced by the constraints:

$$\sum_{j=1}^{m} x_{ij} \le M y_i \quad \text{for all } i$$

To keep the formulation as strong as possible, the value used for $M$ should be as small as possible. In this case, we should set $M = m$, which is also what you get when you aggregate the constraints $x_{ij} \le y_i$ over all customers $j$. (Carle's post from above explores what happens when you use unnecessarily large M's in situations like this.)
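
Continuing the sketch above, the big-M variant would simply swap the $nm$ disaggregated linking constraints for one aggregated constraint per facility. Again, this is just an illustration under the same assumed setup.

```cpp
// Big-M variant of the linking constraints: instead of the nm disaggregated
// constraints x_ij <= y_i, add one aggregated constraint per facility,
// with M = m (the number of customers).
for (int i = 0; i < n; ++i) {
    GRBLinExpr assigned = 0;
    for (int j = 0; j < m; ++j)
        assigned += x[i][j];
    model.addConstr(assigned <= m * y[i], "bigM_" + std::to_string(i));
}
```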

### Which formulation is “better”?

Depending on who you ask, either of these formulations could be better. Many textbooks on integer programming prefer the first, since its LP relaxation is much stronger, meaning that fewer branch-and-bound nodes will typically need to be explored. Results for 5 benchmark instances show that these bounds can indeed be much stronger:

| Instance | LP 1 objective | LP 2 objective |
| --- | --- | --- |
| 123unifS | 67,441 | 12,271 |
| 623unifS | 70,006 | 12,674 |
| 1823unifS | 69,649 | 12,066 |
| 2623unifS | 68,776 | 11,958 |
| 3023unifS | 71,639 | 12,170 |

On the other hand, the big-M formulation has only $n + m$ constraints (if we exclude nonnegativity bounds), whereas the first formulation has $nm + m$ constraints. So, the *time* to solve the LP relaxation for the big-M formulation is probably less. Indeed, the recent textbook by Conforti, Cornuéjols, and Zambelli states:

> When the aggregated formulation is given to state-of-the-art solvers, they are able to detect and generate disaggregated constraints on the fly, whenever these constraints are violated by the current feasible solution. So, in fact, it is preferable to use the aggregated formulation because the size of the linear relaxation is much smaller and faster to solve.

However, MIP solvers rely on sparse matrix representations, meaning that the number of nonzeros in the formulation is more indicative of its size than the product of the number of variables and the number of constraints. Indeed, the $nm$ constraints $x_{ij} \le y_i$ have two nonzeros apiece, about $2nm$ in total, which is only about twice the roughly $nm$ nonzeros needed for the $n$ big-M constraints (each of which has $m + 1$ nonzeros).

So, I was not exactly convinced that the time to solve the LP relaxations would be much different and decided to run some experiments. Here are the times (in seconds) to solve the LP relaxations for the same 5 benchmark instances in Gurobi 7.0.2.

| Instance | Solve time for LP 1 (s) | Solve time for LP 2 (s) |
| --- | --- | --- |
| 123unifS | 0.115 | 0.022 |
| 623unifS | 0.125 | 0.023 |
| 1823unifS | 0.107 | 0.022 |
| 2623unifS | 0.128 | 0.022 |
| 3023unifS | 0.150 | 0.024 |

While the LP times are indeed longer for the disaggregated formulation, it is only by a factor of 5 or 6, even though the number of constraints has increased by a factor of 50!
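
For reference, timings like the ones above could be collected along these lines with the Gurobi C++ API. This is a sketch continuing the earlier one, not the author's actual timing script.

```cpp
// Sketch: time the LP relaxation and the IP of a model that has already been
// built (e.g., by buildDisaggregatedUFL above).  Gurobi's Runtime attribute
// reports the seconds spent in the most recent optimize() call.
GRBModel relaxation = model.relax();       // continuous relaxation of the MIP
relaxation.optimize();
double lpTime = relaxation.get(GRB_DoubleAttr_Runtime);
double lpObj  = relaxation.get(GRB_DoubleAttr_ObjVal);

model.optimize();                          // solve the integer program itself
double ipTime = model.get(GRB_DoubleAttr_Runtime);
double ipObj  = model.get(GRB_DoubleAttr_ObjVal);
```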

However, what we are actually interested in is the time to solve the IP. By this measure, the disaggregated formulation wins, though both times are reasonable for instances of this size. Experiments for larger instances are in the appendix.

| Instance | Solve time for IP 1 (s) | Solve time for IP 2 (s) |
| --- | --- | --- |
| 123unifS | 3.496 | 8.151 |
| 623unifS | 1.607 | 7.176 |
| 1823unifS | 1.528 | 4.468 |
| 2623unifS | 1.263 | 3.415 |
| 3023unifS | 4.932 | 14.153 |

### Conclusion

I have only provided test results for 5 UFL instances, so I cannot draw any general conclusions about the (dis)advantages of big M's from this post. However, I can say that *one should at least look for alternative formulations that do not rely on big M's and then empirically compare them*.

I leave you with a rhetorical question. Where would the field of integer programming be if everyone had concluded that the Miller-Tucker-Zemlin formulation for the TSP was good enough (even though it has big M's)? What if they had said "just let the solver take care of it"?

### Appendix

I constructed larger instances following the same procedure that was used to create the benchmark instances above. Namely, each facility has an opening cost of 3000, and the assignment costs are uniformly distributed integers between 0 and 1000. (Actually, I used rand() % 1001 in C++, which is slightly biased toward smaller numbers.)
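
A sketch of such a generator is below. The function name and signature are my own; the post only specifies the costs and the use of rand() % 1001.

```cpp
// Sketch of the instance generation described above: opening cost 3000 for
// every facility, assignment costs drawn with rand() % 1001 (slightly biased
// toward smaller values because RAND_MAX + 1 is not a multiple of 1001).
#include <cstdlib>
#include <vector>

void generateInstance(int n, int m,                        // n facilities, m customers
                      std::vector<double>& f,
                      std::vector<std::vector<double>>& c)
{
    f.assign(n, 3000.0);                                   // facility opening costs
    c.assign(n, std::vector<double>(m));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j)
            c[i][j] = static_cast<double>(rand() % 1001);  // costs in {0, ..., 1000}
}
```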

The LP times are much like before.

| Instance | Solve time for LP 1 (s) | Solve time for LP 2 (s) |
| --- | --- | --- |
| n=100, m=150 | 0.437 | 0.057 |
| n=100, m=200 | 0.506 | 0.076 |
| n=150, m=100 | 0.422 | 0.070 |
| n=200, m=100 | 0.641 | 0.083 |

So are the IP times.

| Instance | Solve time for IP 1 (s) | Solve time for IP 2 (s) |
| --- | --- | --- |
| n=100, m=150 | 178.049 | 225.599 |
| n=100, m=200 | 420.491 | 501.478 |
| n=150, m=100 | 51.808 | 153.065 |
| n=200, m=100 | 110.433 | 143.192 |

Somewhat surprisingly, the IP time for formulation 2 decreased when increasing the number of facilities from 150 to 200. Probably the solver got lucky.

I have designated the $x$ variables as continuous in the formulation, but there will always be an optimal solution in which they are binary. Given that there are more presolve procedures that apply to binary variables than to continuous variables, it might help to declare them as binary. I tested the idea on a randomly constructed instance, but the continuous setting was better.

| | binary $x$ | continuous $x$ |
| --- | --- | --- |
| Solve time for IP 1 (s) | 25.577 | 7.405 |
| Solve time for IP 2 (s) | 46.358 | 12.897 |
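
For completeness, here is how the $x$ variables could be switched to binary after the model is built, continuing the earlier sketch; again, just an illustration of the idea tested above.

```cpp
// Declare the x variables binary instead of continuous (the test above found
// that leaving them continuous was faster on this instance).
for (int i = 0; i < n; ++i)
    for (int j = 0; j < m; ++j)
        x[i][j].set(GRB_CharAttr_VType, GRB_BINARY);
```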

### Comments

**Pierre Bonami:** I agree with your conclusion that one should always look for alternative formulations (and actually always in integer programming), but it also always depends on the underlying algorithm used.

It would be interesting to know what the comparison would look like if you use a Benders algorithm (like the one in the latest CPLEX). I don’t know the answer, but it might be different.

**Paul A. Rubin:** At a conference workshop shortly after the introduction of indicator constraints in CPLEX, an IBM presenter (responding to a question, I think) said that if you knew a tight value for M, you would probably be better off using it than using indicators (and essentially relying on CPLEX to derive a value for M). I don't know if that advice still holds, but I'm inclined to think so.
