A brief tutorial on PORTA

What does PORTA do?

PORTA is a software package for analyzing polyhedra. One problem it can solve is:

Problem: Find H-representation of the convex hull of some given points
Input: a set S of points in n-dimensional space
Output: a minimal inequality system defining conv(S).

Minor note: If conv(S) is not full dimensional, PORTA will report some equations too.

For a longer tutorial on PORTA and Polymake, see the slides by Marc Pfetsch.

Example: Independent Sets

Consider the 5-vertex cycle graph C_5 on vertex set {1,2,3,4,5}. We can list all of its stable/independent sets:

ALL_STABLE_SETS = { {}, {1}, {2}, {3}, {4}, {5}, {1,3}, {1,4}, {2,4}, {2,5}, {3,5} }.

We can write these stable sets in terms of their characteristic vectors:

0 0 0 0 0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
1 0 1 0 0
1 0 0 1 0
0 1 0 1 0
0 1 0 0 1
0 0 1 0 1

Here, each ROW represents a stable set. We can give PORTA this collection ALL_STABLE_SETS, and it will return the following inequality description of conv(ALL_STABLE_SETS):

( 1) -x1 <= 0
( 2) -x2 <= 0
( 3) -x3 <= 0
( 4) -x4 <= 0
( 5) -x5 <= 0
( 6) +x4+x5 <= 1
( 7) +x3+x4 <= 1
( 8) +x2+x3 <= 1
( 9) +x1 +x5 <= 1
( 10) +x1+x2 <= 1
( 11) +x1+x2+x3+x4+x5 <= 2

Inequalities (1)–(5) are the nonnegativity bounds; (6)–(10) are the edge inequalities; (11) is an odd-hole inequality.
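
As a quick sanity check, a few lines of Python (a sketch I wrote for this post, with the points and inequalities transcribed from above) confirm that every characteristic vector satisfies all eleven inequalities, and that the fractional point (0.5, 0.5, 0.5, 0.5, 0.5) satisfies the nonnegativity and edge inequalities but violates the odd-hole inequality (11), so (11) is not implied by the others.

    # Sanity check of the PORTA output for conv(ALL_STABLE_SETS) of C_5.
    points = [
        (0,0,0,0,0),
        (1,0,0,0,0), (0,1,0,0,0), (0,0,1,0,0), (0,0,0,1,0), (0,0,0,0,1),
        (1,0,1,0,0), (1,0,0,1,0), (0,1,0,1,0), (0,1,0,0,1), (0,0,1,0,1),
    ]

    # Each inequality a*x <= b is stored as (a, b), transcribed from the output above.
    inequalities = [
        ((-1,0,0,0,0), 0), ((0,-1,0,0,0), 0), ((0,0,-1,0,0), 0),
        ((0,0,0,-1,0), 0), ((0,0,0,0,-1), 0),
        ((0,0,0,1,1), 1), ((0,0,1,1,0), 1), ((0,1,1,0,0), 1),
        ((1,0,0,0,1), 1), ((1,1,0,0,0), 1),
        ((1,1,1,1,1), 2),
    ]

    def satisfies(x, a, b):
        return sum(ai * xi for ai, xi in zip(a, x)) <= b + 1e-9

    # Every characteristic vector satisfies every inequality.
    assert all(satisfies(x, a, b) for x in points for (a, b) in inequalities)

    # The all-halves point satisfies (1)-(10) but violates the odd-hole inequality (11).
    frac = (0.5,) * 5
    print([satisfies(frac, a, b) for (a, b) in inequalities])
    # expected output: True for the first ten entries, False for the last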

How do I use PORTA?

  1. Download version 1.4.1
  2. Create the .poi file* which contains all the points in S. See the formatting in the example file StableSet_5cycle.poi
  3. Open command prompt and navigate to the folder porta-1.4.1\win32\bin\
  4. Run PORTA on the .poi file. For example, put the .poi file in the folder porta-1.4.1\win32\bin\ and use the command traf.bat StableSet_5cycle.poi
  5. PORTA will show its progress in the command prompt window and then create an output file. In our case, the output file is StableSet_5cycle.poi.ieq. [screenshot of the command prompt window: cmd.png]

(*) The .poi file can be time consuming to create by hand, and there are a few ways to create them automatically. In my case, I usually write some simple code that enumerates all 0-1 vectors and writes the feasible vectors to a file. In my experience, PORTA can only handle instances with roughly 20 dimensions, so this enumeration procedure is not the bottleneck.
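
For the 5-cycle example, that enumeration can look like the following Python sketch. The file layout (DIM, CONV_SECTION, END) reflects my understanding of the .poi format; it is worth double-checking against the example file StableSet_5cycle.poi before relying on it.

    # Sketch: enumerate all 0-1 vectors, keep the stable sets of C_5,
    # and write them to a .poi file for PORTA.
    from itertools import product

    n = 5
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # the 5-cycle

    def is_stable(x):
        # no edge may have both endpoints chosen
        return all(x[u] + x[v] <= 1 for u, v in edges)

    with open("StableSet_5cycle.poi", "w") as f:
        f.write(f"DIM = {n}\n\nCONV_SECTION\n")
        for x in product((0, 1), repeat=n):
            if is_stable(x):
                f.write(" ".join(map(str, x)) + "\n")
        f.write("END\n")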

I dislike big M’s and I cannot lie

Big M’s in integer programming formulations have been a topic of discussion on Twitter and the blogosphere in the past week. For example, see Marc-Andre Carle’s posts (primal and dual) and followup tweets and also some older posts from Paul Rubin and Bob Fourer.

My initial reaction was don’t let the software handle it. If possible, try to reformulate the problem.

In this post, I explain myself and give supporting computational experiments.

Example: The uncapacitated facility location problem

Consider the uncapacitated facility location (UFL) problem. The task is to open a subset of n available facilities to service a set of m customers. The goal is to minimize the sum of facility opening costs and customer-delivery costs.

The parameters for this problem include the cost f_i to open facility i and the cost c_{ij} associated with fulfilling customer j’s demand via facility i. We allow for a facility to fulfill a fractional amount of a customer’s demand, in which case the demand-fulfillment costs are proportional.

1. The disaggregated formulation

One natural formulation is as follows, where x_{ij} is the proportion of customer j’s demand that is fulfilled by facility i, and y_i is a binary variable representing the decision to open facility i.

The objective is to minimize the total cost:

\min \sum_{i=1}^n \sum_{j=1}^m c_{ij} x_{ij} + \sum_{i=1}^n f_i y_i.

Each customer’s demand should be fulfilled:

\sum_{i=1}^n x_{ij}=1,~~j =1,\dots, m.

If a facility is closed, then it can fulfill none of a customer’s demand:

x_{ij} \le y_i,~~i=1,\dots, n,~j=1,\dots, m.

Finally, the x variables should be nonnegative, and the y variables should be binary.

2. The big-M formulation

In the big-M formulation, everything is the same, except that the x_{ij} \le y_i constraints are replaced by the constraints:

\sum_{j=1}^m x_{ij} \le M y_i.

To keep the formulation as strong as possible, the value used for M should be as small as possible. In this case, we should set M=m, which is also what you get when you aggregate the x_{ij} \le y_i constraints. (Carle’s post from above explores what happens when you use unnecessarily large M’s in situations like this.)
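
To make the two models concrete, here is a minimal gurobipy (Python) sketch that builds either formulation, with random data generated roughly as described in the appendix (opening cost 3000, assignment costs uniform in {0, ..., 1000}). It is an illustrative sketch of the formulations above (the helper name build_ufl is mine), not the code used for the experiments reported below.

    # Sketch of the two UFL formulations in gurobipy.
    import random
    import gurobipy as gp
    from gurobipy import GRB

    def build_ufl(n, m, disaggregate=True, seed=0):
        random.seed(seed)
        f = [3000] * n                                    # facility opening costs
        c = [[random.randint(0, 1000) for _ in range(m)]  # assignment costs
             for _ in range(n)]

        model = gp.Model("UFL")
        x = model.addVars(n, m, lb=0.0, ub=1.0, name="x")  # fraction of demand j served by i
        y = model.addVars(n, vtype=GRB.BINARY, name="y")   # open facility i?

        # each customer's demand must be fully served
        model.addConstrs(gp.quicksum(x[i, j] for i in range(n)) == 1 for j in range(m))

        if disaggregate:
            # formulation 1: x_ij <= y_i for every pair (i, j)
            model.addConstrs(x[i, j] <= y[i] for i in range(n) for j in range(m))
        else:
            # formulation 2 (big-M with M = m): sum_j x_ij <= m * y_i
            model.addConstrs(gp.quicksum(x[i, j] for j in range(m)) <= m * y[i]
                             for i in range(n))

        model.setObjective(
            gp.quicksum(c[i][j] * x[i, j] for i in range(n) for j in range(m))
            + gp.quicksum(f[i] * y[i] for i in range(n)),
            GRB.MINIMIZE)
        return model

    # e.g., build_ufl(100, 100, disaggregate=False).optimize()

Calling model.relax() on either model gives the corresponding LP relaxation, which is what the LP timing comparisons below refer to.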

Which formulation is “better”?

Depending on who you ask, either of these formulations could be better. Many textbooks on integer programming prefer the first, since its LP relaxation is much stronger, meaning that fewer branch-and-bound nodes will typically be explored. Results for 5 different benchmark instances with m=n=100 show that these bounds can indeed be much stronger.

Instance     LP 1 objective    LP 2 objective
123unifS     67,441            12,271
623unifS     70,006            12,674
1823unifS    69,649            12,066
2623unifS    68,776            11,958
3023unifS    71,639            12,170

On the other hand, the big-M formulation has only O(n+m) constraints (if we exclude nonnegativity bounds), whereas the first formulation has \Omega(nm) constraints. So, the time to solve the LP relaxation for the big-M formulation is probably less. Indeed, the recent textbook by Conforti, Cornuéjols, and Zambelli states:

When the aggregated formulation is given to state-of-the-art solvers, they are able to detect and generate disaggregated constraints on the fly, whenever these constraints are violated by the current feasible solution. So, in fact, it is preferable to use the aggregated formulation because the size of the linear relaxation is much smaller and faster to solve.

However, MIP solvers rely on sparse matrix representations, meaning that the number of nonzeros in the formulation is more indicative of its size than the product of the number of variables and the number of constraints. Indeed, the number of nonzeros needed to impose the x_{ij} \le y_i constraints is only about twice the number needed for the big-M constraints: with n=m=100, the disaggregated constraints have 2nm = 20,000 nonzeros, while the big-M constraints have n(m+1) = 10,100.

So, I was not exactly convinced that the time to solve the LP relaxations would be much different and decided to run some experiments. Here are the times (in seconds) to solve the LP relaxations for the same 5 benchmark instances in Gurobi 7.0.2.

Instance     Solve time for LP 1 (s)    Solve time for LP 2 (s)
123unifS     0.115                      0.022
623unifS     0.125                      0.023
1823unifS    0.107                      0.022
2623unifS    0.128                      0.022
3023unifS    0.150                      0.024

While the LP times are indeed longer for the disaggregated formulation, it is only by a factor of 5 or 6, even though the number of constraints has increased by a factor of 50!

However, what we are actually interested in is the time to solve the IP. By this measure, the disaggregated formulation wins, although both formulations solve these instances in reasonable time. Experiments for larger instances are in the appendix.

Instance     Solve time for IP 1 (s)    Solve time for IP 2 (s)
123unifS     3.496                      8.151
623unifS     1.607                      7.176
1823unifS    1.528                      4.468
2623unifS    1.263                      3.415
3023unifS    4.932                      14.153

Conclusion

I have only provided test results for 5 UFL instances, so I cannot make any general conclusions about the (dis)advantages of big M’s from this post. However, I can say that one should at least look for alternative formulations that do not rely on big M’s and then empirically compare them.

I leave you with a rhetorical question. Where would the field of integer programming be if everyone had concluded that the Miller-Tucker-Zemlin formulation for the TSP was good enough (even though it has big M’s)? What if they had said “just let the solver take care of it”?

Appendix

I constructed larger instances following the same procedure used to generate the m=n=100 instances. Namely, each facility has an opening cost of 3000, and the assignment costs are uniformly distributed integers between 0 and 1000. (Actually, I used rand() % 1001 in C++, which is slightly biased toward smaller numbers.)

The LP times are much like before.

Instance         Solve time for LP 1 (s)    Solve time for LP 2 (s)
n=100, m=150     0.437                      0.057
n=100, m=200     0.506                      0.076
n=150, m=100     0.422                      0.070
n=200, m=100     0.641                      0.083

So are the IP times.

Instance         Solve time for IP 1 (s)    Solve time for IP 2 (s)
n=100, m=150     178.049                    225.599
n=100, m=200     420.491                    501.478
n=150, m=100     51.808                     153.065
n=200, m=100     110.433                    143.192

Somewhat surprisingly, the IP time for formulation 2 decreased when the number of facilities increased from 150 to 200. Probably the solver got lucky.

I have designated the x variables as continuous in the formulation, but there will always be an optimal solution in which they are binary. Given that there are more presolve procedures that apply to binary variables than continuous variables, it might help to declare them as binary. I tested the idea on a randomly constructed instance with m=n=100, but the continuous setting was better.

                       binary x    continuous x
Solve time for IP 1    25.577      7.405
Solve time for IP 2    46.358      12.897

A brief tutorial on Gomory fractional cuts

Integer programs (IPs) are solved in practice by branch-and-cut algorithms. These algorithms rely heavily on the effective use of cutting planes, which are inequalities that are added to linear programming (LP) relaxations to cut off (bad) fractional points, but not the (good) integer feasible points. The classical cutting planes for solving arbitrary IPs were developed by Ralph Gomory in the late 1950s.

Prior to working on IP, Gomory earned his PhD in mathematics from Princeton in 1954, after which he spent a few years with the US Navy. In 1957, he went back to Princeton, but was retained as a consultant for the Navy. Gomory later recounted that his involvement with the Navy led him to IP:

As the Navy had kept me on as a consultant I continued to work on Navy problems through monthly trips to Washington. On one of these trips a group presented a linear programming model of a Navy Task Force. One of the presenters remarked that it would be nice to have whole number answers as 1.3 aircraft carriers, for example, was not directly usable.

Within the next few weeks, Gomory had developed his technique for generating cutting planes based on the simplex tableau. Soon after, he proved that his algorithm was finite and programmed it on the E101, allowing him to solve 5-variable problems reliably.

Example (from here).

Consider the following IP.

[figure f1: the example integer program]

Solving the LP relaxation gives the following system (with objective z).

[figure f2: the resulting system; its second row is x_2 + 0.8x_3 + 0.1x_4 = 3.3 and its last row is x_1 - 0.2x_3 + 0.1x_4 = 1.3]

This corresponds to the fractional point (x_1,x_2,x_3,x_4)=(1.3,3.3,0,0).

A first cut

In any feasible solution to the IP, x_2 will take an integer value, so, from the second row, we can write 0.8x_3+0.1x_4=0.3+k for some integer k. Notice that the left side of this equation can only take nonnegative values, so the right side must as well, i.e.,  0.3 +k \ge 0 or k \ge  - 0.3. Since k is an integer, we know that k \ge 0. So, 0.8x_3 + 0.1x_4 = 0.3+k \ge 0.3. We have thus argued for the validity of the Gomory fractional cut:

0.8x_3 + 0.1x_4 \ge 0.3.

Note that this inequality cuts off our fractional point (1.3,3.3,0,0).

A second cut

Now consider the last row x_1 - 0.2x_3 + 0.1x_4 = 1.3. As before, we can write -0.2x_3 + 0.1x_4 = 0.3 + k for some integer k. However, the left side of this equation may not always take nonnegative values (due to the negative coefficient for x_3), so our previous argument will not work. Instead, if we add x_3 to both sides, we can write 0.8x_3 + 0.1x_4 = 0.3 + p for some (other) integer p. Using the same argument as before, we can generate the (same) Gomory fractional cut:

0.8x_3 + 0.1x_4 \ge 0.3.

Gomory fractional cuts in general

We can write the Gomory fractional cut in more general terms as follows. Suppose that nonnegative integers x_1,\dots, x_n satisfy the equation:

\sum_{i=1}^n a_i x_i = a_0,

where a_0 is fractional, i.e., a_0 \notin \mathbb{Z}. Think of this equation as a row of the simplex tableau/dictionary. The associated Gomory fractional cut is:

\sum_{i=1}^n (a_i -\lfloor a_i \rfloor) x_i \ge a_0 - \lfloor a_0\rfloor.

Proposition. The Gomory fractional cut is valid for the set X=\{ x \in \mathbb{Z}^n_+ ~|~\sum_{i=1}^n a_i x_i = a_0\}.

Proof. Let x^* \in X. We are to show that \sum_{i=1}^n (a_i -\lfloor a_i \rfloor) x^*_i \ge a_0 - \lfloor a_0\rfloor. By x^* \in X, we know that \sum_{i=1}^n a_i x^*_i = a_0. Since each \lfloor a_i\rfloor is integer, and since each x^*_i is integer, we can write:

\sum_{i=1}^n (a_i -\lfloor a_i \rfloor) x^*_i = a_0 - \lfloor a_0\rfloor +k,

for some integer k. Observe that the left side is nonnegative, so a_0 - \lfloor a_0\rfloor +k \ge 0 and thus k \ge \lfloor a_0\rfloor-a_0 > -1. By k \in \mathbb{Z}, this implies k\ge 0. In conclusion,

\sum_{i=1}^n (a_i -\lfloor a_i \rfloor) x^*_i = a_0 - \lfloor a_0\rfloor +k \ge a_0 - \lfloor a_0\rfloor. Q.E.D.

 

Two observations:

  1. Subtracting \lfloor a_i \rfloor from a_i in the Gomory fractional cut ensures that the left side is nonnegative, which was key to our arguments. We could have instead subtracted some integer q < \lfloor a_i \rfloor from a_i, and the resulting inequality would remain valid; however, it would be weaker.
  2. The Gomory fractional cut remains valid when you add other constraints to the set X. This means you can still use it when your IP is more complicated.
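
In code, generating the cut from a row of the simplex dictionary amounts to taking fractional parts. Here is a small Python sketch (the function name is mine); applying it to the row x_2 + 0.8x_3 + 0.1x_4 = 3.3 from the example recovers the cut 0.8x_3 + 0.1x_4 \ge 0.3, up to floating-point rounding.

    import math

    def gomory_fractional_cut(a, a0):
        # Given a row sum_i a_i x_i = a0 (with the x_i integer and nonnegative),
        # return (coefficients, rhs) of the Gomory fractional cut
        # sum_i frac(a_i) x_i >= frac(a0).
        frac = lambda t: t - math.floor(t)
        return [frac(ai) for ai in a], frac(a0)

    # Row x_2 + 0.8 x_3 + 0.1 x_4 = 3.3 in variables (x_1, x_2, x_3, x_4):
    coeffs, rhs = gomory_fractional_cut([0, 1, 0.8, 0.1], 3.3)
    print(coeffs, rhs)   # approximately [0, 0, 0.8, 0.1] and 0.3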

What makes an optimization problem difficult?

…there is always an easy solution to every human problem — neat, plausible, and wrong.
-H. L. Mencken, The Divine Afflatus, in A Mencken Chrestomathy

What makes an optimization problem difficult? I don’t really have the right answer, but here are some answers that are neat, plausible, and wrong.

 

Wrong answer #1: optimization problems are difficult when there are many feasible solutions.

Why not? This may be the most common misconception. The traveling salesman problem is (thought to be) a difficult problem, and it also has exponentially many feasible solutions. Sometimes this leads people to think that it is hard BECAUSE there are so many possible choices. Of course brute-force enumeration will not work well in this case, but that is not the only option. If many feasible solutions implied hardness, then the minimum spanning tree problem would also be hard (there are n^{n-2} spanning trees in a complete graph on n vertices). Linear programs provide a more extreme example; they can have infinitely many feasible solutions. (Mike Trick also has a blog post on this.)

 

Wrong answer #2: optimization problems are difficult when there are many constraints.

Why not? There are many problems that have exponentially many constraints (or more!) but can be solved in polynomial time. This is the case for linear programs that admit efficient separation algorithms, where the ellipsoid method provides a (theoretically, but perhaps not practically) efficient algorithm. For example, consider optimizing over the subtour elimination relaxation for the traveling salesman problem. There are \Theta (2^n) subtour elimination constraints, but they can be separated in polynomial time using a min-cut algorithm. (Moreover, there are polynomial-size extended formulations for the subtour elimination polytopes.)

 

Wrong answer #3: optimization problems are difficult when there is no small linear program that represents their feasible solutions.

Why not? A counterexample to the claim is the matching problem. This problem is polynomial-time solvable, but its usual polyhedral representation admits no small linear programming formulation. Perhaps this belief is caused by a mistaken interpretation of the fact that linear programming is P-complete. Note that P-hardness is defined with respect to a more restrictive type of reduction than NP-hardness. The former is usually defined with respect to log-space reductions.

 

Wrong answer #4: optimization problems are difficult when there are many variables.

Why not? Consider a linear program that has exponentially many constraints, but that can be solved using the ellipsoid method and an efficient separation procedure. Its dual LP has exponentially many variables, but is polynomial-time solvable.

 

Wrong answer #5: optimization problems are difficult when the constraints/objective are nonlinear.

Why not? In the words of Tyrrell Rockafellar, “the great watershed in optimization isn’t between linearity and nonlinearity, but convexity and nonconvexity.” Indeed, the general class of convex optimization problems has nice properties that, in many cases, lead to efficient algorithms, even when the problems are nonlinear.

 

Wrong answer #6: optimization problems are difficult when they are nonconvex.

Why not? One counterexample to the claim is the problem of optimizing a quadratic function subject to an ellipsoid constraint. This problem is nonconvex, but is polynomial-time solvable.

 

 

Some of these wrong answers are appealing, partially because their converses are (essentially) true. For example, consider the converses of wrong answers #2 and #4 for integer programs. While integer programming is (thought to be) hard in general, it becomes polynomial-time solvable when we restrict ourselves to instances having a fixed number of variables (Lenstra’s algorithm), and pseudo-polynomial-time solvable for instances with a fixed number of constraints.

(I might add that the converse of wrong answer #1 does not seem to hold. For example, it is thought to be hard to find a second Hamiltonian cycle in a particular class of graphs even when it is guaranteed to exist! Moreover, we can quickly check whether a number is prime, but actually factoring a composite number is (thought to be) hard for classical computers.)

The best answer I can give is that optimization problems are hard when they are sufficiently expressive*. This is essentially what we are showing when we prove that a problem is NP-hard—we are showing that we can express any problem from NP as an equivalent instance of our problem.

Of course, using NP-hardness as a justification for the hardness of a problem depends on NP actually having difficult problems in it. (I think it does.) At the very least, we can say that NP-hard optimization problems are “hard-ass”—Al Meyer’s creative abbreviation of “hard as satisfiability.”

___________

*Admittedly, this is sort of a non-answer. I think linear programming is pretty expressive, but it is polynomial-time solvable. So we should set the bar higher for being “sufficiently expressive.” If “sufficiently expressive” means “is able to express hard problems,” then this is a tautology. Any hard problem would be “sufficiently expressive” since it can be used to express instances of itself.

Short paths on one-way roads and Robbins’ theorem

Suppose our city is connected by a road network, and traffic is allowed to travel in both directions across a road. Our city is considering making all of the roads one-way. Is there a way to pick a direction for each road and still be able to go between any two parts of the city? In other words, is there a way to orient the edges of the road network such that it is strongly connected? (Such an orientation is called a strong orientation.)

This question was answered back in 1939 by Robbins. Assuming our road network is connected, we only need to check that there is no bridge—not a civil engineer’s bridge, but a road whose removal disconnects the road network. Having no bridge is an obvious necessary condition for admitting a strong orientation; if there is a bridge, then no matter which way we orient it, we’ll get stuck on one side of it. The other direction of Robbins’ theorem can be shown through an ear decomposition.

However, Robbins’ theorem does not ensure that you can quickly go from one part of the city to another in this one-way road network. For example, consider a cycle graph on n vertices. There is essentially only one strong orientation of a cycle graph. In it, some origin-destination pairs will have distance n-1, whereas in the two-way road network the travel time is at most n/2.

This leads to the question—When can the roads be made one-way so that we can quickly travel between every two points? More formally, when does a graph admit an orientation in which the pairwise distances are all at most k?

A necessary condition is that there be no “distance-k bridge”—an edge that belongs to every path of length at most k between two vertices. In other words, the subgraph obtained by removing any single edge should also have diameter at most k.

If this “no distance-k bridge” condition were also sufficient, then this would provide an elegant generalization of Robbins’ theorem. Alas, it is not sufficient.

For example, consider the complete graph on 4 vertices. If a single edge is removed, the remaining graph has diameter 2. However, no orientation of the 4-vertex complete graph has diameter at most 2. (Try it out!)
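
If you would rather not check all orientations of K_4 by hand, here is a small brute-force Python sketch that confirms the claim by trying all 2^6 orientations and computing each diameter with Floyd-Warshall.

    # Brute-force check: no orientation of K_4 has diameter <= 2.
    from itertools import combinations, product

    n = 4
    edges = list(combinations(range(n), 2))  # the 6 edges of K_4
    INF = float("inf")

    def diameter(arcs):
        # Floyd-Warshall shortest paths on the chosen orientation
        d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
        for u, v in arcs:
            d[u][v] = 1
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    d[i][j] = min(d[i][j], d[i][k] + d[k][j])
        return max(d[i][j] for i in range(n) for j in range(n))

    best = min(
        diameter([(u, v) if flip == 0 else (v, u)
                  for (u, v), flip in zip(edges, flips)])
        for flips in product((0, 1), repeat=len(edges))
    )
    print(best)  # prints 3: no orientation of K_4 achieves diameter 2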

It turns out that the problem: “given an undirected graph, does it admit an orientation that has diameter at most k?” is NP-complete. The special case k=2 was shown to be NP-complete by Chvátal and Thomassen in 1978. This suggests that there is no “nice” analogue of Robbins’ theorem for diameter-constrained orientations.

I was NOT the guy behind that regrettable banner

If you google my name and my employer (also my undergrad alma mater), you’ll see something like this:

[screenshot of the Google search results]

Those first two links are about me. The last two are NOT.

The news articles tell the story of an undergraduate student attending Oklahoma State University (OSU) who created a banner with the insensitive hashtag #Trail_of_Tears. “Trail of Tears” refers to the forced relocations of various Native American nations in which thousands died. This student brought his regrettable banner to ESPN’s College GameDay show for the Aug 30, 2014 football game against Florida State University (FSU). FSU’s athletic teams are known as the Florida State Seminoles, and the Seminole Tribe was one of the tribes that were forcibly relocated. (Surprisingly, the Seminole Tribe has actually endorsed this nickname.)

Since OSU is my undergraduate alma mater, this has led several people to mistake this other Austin Buchanan for me (and even confront me about it!). I attended OSU from 2007 to 2011 and this incident occurred in 2014, so it should be clear that the guy behind the banner is not me. But not everyone reads the news stories in detail or knows when I attended OSU. Hence, the motivation for this post.

**Update** I wasn’t behind this one either 😀

A fallacious argument in integer programming

A commonly used argument in integer programming goes as follows. Suppose we want to show that inequality ax \le b does not induce a facet of a polyhedron. If we know valid inequalities (1) and (2) that add up to ax \le b, then we have shown that ax\le b cannot induce a facet. Right?

Not quite.

One problem occurs when inequalities (1) and (2) are multiples of ax\le b. For example, suppose that each inequality is (ax)/2 \le b/2.

What if the inequalities are not multiples of ax \le b?

Still no cigar.

Another problem occurs when the polyhedron is not full-dimensional. Consider the set of all (x,y)\in [0,1]^2 such that x\le 0 (meaning that we have an implicit equality constraint x=0). We want to argue that the inequality x+y\le 1 does NOT induce a facet. This seems to follow from the valid inequalities x\le 0 and y \le 1 which sum to x+y\le 1. However, the inequality x+y\le 1 DOES induce a facet; the dimension of our feasible region is one, the inequality x+y\le 1 is valid, and the face where x+y=1 has dimension zero.