Big M’s in integer programming formulations have been a topic of discussion on Twitter and the blogosphere in the past week. For example, see MarcAndre Carle‘s posts (primal and dual) and followup tweets and also some older posts from Paul Rubin and Bob Fourer.
My initial reaction was don’t let the software handle it. If possible, try to reformulate the problem.
In this post, I explain myself and give supporting computational experiments.
Example: The uncapacitated facility location problem
Consider the uncapacitated facility location (UFL) problem. The task is to open a subset of available facilities to service a set of customers. The goal is to minimize the sum of facility opening costs and customerdelivery costs.
The parameters for this problem include the cost to open facility and the cost associated with fulfilling customer ‘s demand via facility . We allow for a facility to fulfill a fractional amount of a customer’s demand, in which case the demandfulfillment costs are proportional.
1. The disaggregated formulation
One natural formulation is as follows, where is the proportion of customer ‘s demand that is fulfilled by facility , and is a binary variable representing the decision to open facility .
The objective is to minimize the total cost:
Each customer’s demand should be fulfilled:
If a facility is closed, then it can fulfill none of a customer’s demand:
Finally, the variables should be nonnegative, and the variables should be binary.
2. The bigM formulation
In the bigM formulation, everything is the same, except that the constraints are replaced by the constraints:
To keep the formulation as strong as possible, the value used for M should be as small as possible. In this case, we should set , which is also what you get when you aggregate the constraints. (Carle’s post from above explores what happens when you use unnecessarily large M’s in situations like this.)
Which formulation is “better”?
Depending on who you ask, either of these formulations could be better. Many textbooks on integer programming prefer the first, since its LP relaxation is much stronger, meaning that the number of branchandbound nodes that will be explored is fewer. Results for 5 different benchmark instances with show that these bounds can indeed be much stronger.

LP 1 objective 
LP 2 objective 
123unifS 
67,441 
12,271 
623unifS 
70,006 
12,674 
1823unifS 
69,649 
12,066 
2623unifS 
68,776 
11,958 
3023unifS 
71,639 
12,170 
On the other hand, the bigM formulation has only constraints (if we exclude nonnegativity bounds), whereas the first formulation has constraints. So, the time to solve the LP relaxation for the bigM formulation is probably less. Indeed, the recent textbook by Conforti, Cornuéjols, and Zambelli states:
When the aggregated formulation is given to stateoftheart solvers, they are able to detect and generate disaggregated constraints on the fly, whenever these constraints are violated by the current feasible solution. So, in fact, it is preferable to use the aggregated formulation because the size of the linear relaxation is much smaller and faster to solve.
However, MIP solvers rely on sparse matrix representation, meaning that the number of nonzeros in the formulation is more indicative of its size than the quantity obtained by multiplying the number of variables by the number of constraints. Indeed, the number of nonzeros to impose the constraints is only about twice the number of nonzeros to impose the bigM constraints.
So, I was not exactly convinced that the time to solve the LP relaxations would be much different and decided to run some experiments. Here are the times (in seconds) to solve the LP relaxations for the same 5 benchmark instances in Gurobi 7.0.2.

Solve time for LP 1 
Solve time for LP 2 
123unifS 
0.115 
0.022 
623unifS 
0.125 
0.023 
1823unifS 
0.107 
0.022 
2623unifS 
0.128 
0.022 
3023unifS 
0.150 
0.024 
While the LP times are indeed longer for the disaggregated formulation, it is only by a factor of 5 or 6, even though the number of constraints has increased by a factor of 50!
However, what we are actually interested in is the time to solve IP. By this measure, the disaggregated formulation wins though both times are reasonable for these sized instances. Experiments for larger instances are in the appendix.

Solve time for IP 1 
Solve time for IP 2 
123unifS 
3.496 
8.151 
623unifS 
1.607 
7.176 
1823unifS 
1.528 
4.468 
2623unifS 
1.263 
3.415 
3023unifS 
4.932 
14.153 
Conclusion
I have only provided test results for 5 UFL instances, so I cannot make any general conclusions about the (dis)advantages of big M’s from this post. However, I can say that one should at least look for alternative formulations that do not rely on big M’s and then empirically compare them.
I leave with a rhetorical question. Where would the field of integer programming be if everyone concluded that the MillerTuckerZemlin formulation for TSP was good enough (even though it has big M’s)? What if they said “just let the solver take care of it”?
Appendix
I constructed larger instances following the same procedures that were used when making the instances. Namely, each facility has an opening cost of 3000, and the assignment costs are uniformly distributed integers between 0 and 1000. (Actually, I used rand() % 1001 in C++, which is slightly biased to smaller numbers.)
The LP times are much like before.

Solve time for LP 1 
Solve time for LP 2 
n=100,m=150 
0.437 
0.057 
n=100,m=200 
0.506 
0.076 
n=150,m=100 
0.422 
0.070 
n=200,m=100 
0.641 
0.083 
So are the IP times.

Solve time for IP 1 
Solve time for IP 2 
n=100,m=150 
178.049 
225.599 
n=100,m=200 
420.491 
501.478 
n=150,m=100 
51.808 
153.065 
n=200,m=100 
110.433 
143.192 
Somewhat surprisingly, the IP time for formulation 2 decreased when increasing the number of facilities from 150 to 200. Probably the solver got lucky.
I have designated the variables as continuous in the formulation, but there will always be an optimal solution in which they are binary. Given that there are more presolve procedures that apply to binary variables than continuous variables, it might help to declare them as binary. I tested the idea on a randomly constructed instance with , but the continuous setting was better.

binary 
continuous 
Solve time for IP 1 
25.577 
7.405 
Solve time for IP 2 
46.358 
12.897 