Can my state redistrict at the county level?

Gerrymandering is a problem in the USA and has led to notable Supreme Court cases in recent years.

To deal with these issues, reformers have sought to change who controls the redistricting process, or what rules redistricting plans should follow. Constraints on traditional redistricting principles (e.g., population balance, contiguity, compactness, preservation of counties or other political subdivisions) are thought to prevent the most egregious of gerrymanders.

But, at what point do these constraints make redistricting impossible?

Here, we consider one of the most basic questions in redistricting: Can my state draw its congressional districts so that:

  1. all districts have roughly the “same” population;
  2. each district is contiguous on the map; and
  3. no county is split between districts?

We will use mixed integer programming (MIP) to answer this question. MIPs are mathematical optimization problems in which some or all of the decision variables are required to take integer values.

The Input Data

To define the problem, we need some input data:

  • k : the number of districts to draw
  • L : the minimum allowable population of a district
  • U : the maximum allowable population of a district
  • p(i) : the number of people in county i
  • V : the set of counties
  • E : the pairs of counties that are adjacent
  • G=(V,E) : the contiguity graph (or “network”)

For example, Oklahoma has 77 counties that should be partitioned into k=5 congressional districts. After the 2010 census, Oklahoma’s population was 3,751,351, so each district should have roughly 750,270.2 people in it. We will allow a little flexibility and require each district to contain between L=746,519 and U=754,021. This ensures that the most and least populated districts will differ by at most 1%. The total number of county adjacencies is 195. An example is Oklahoma County and Cleveland County.
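These bounds can be computed from the state's total population. A minimal sketch (the ±0.5% deviation parameter is an assumption, chosen to match the Oklahoma numbers quoted above):

```python
import math

def population_bounds(total_population, k, deviation=0.005):
    """Compute district population bounds [L, U] so that the most and
    least populated districts differ by at most roughly 1%."""
    ideal = total_population / k             # ideal district population
    L = math.ceil((1 - deviation) * ideal)   # fewest people allowed
    U = math.floor((1 + deviation) * ideal)  # most people allowed
    return L, U

# Oklahoma after the 2010 census: 3,751,351 people, k=5 districts.
L, U = population_bounds(3751351, 5)
print(L, U)  # 746519 754021
```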

The data for all 50 states can be found here.

An Initial MIP

We start with a classical optimization model that goes back to the 1960s. It will enforce the most basic constraints in redistricting. It uses decision variables of the form:

x(i,j) = should county i be assigned to the district centered at county j?

If so, then x(i,j) should equal one. Otherwise, it will be zero. The MIP is given by:

Constraints (1b) ensure that each county i is assigned to exactly one district. Constraint (1c) ensures that exactly k districts are created. Constraints (1d) ensure that each district has population between L and U. Constraints (1e) ensure that counties can only be assigned to a district that exists. Constraints (1f) ensure that the x variables are binary: county i is either assigned to county j or it is not. The objective function (1a), which is being minimized, seeks “compact” districting plans. Ultimately, this will be unimportant for us, as we’re only interested in determining whether there is a feasible districting plan, not whether it is “compact”.
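The model itself is not shown above. A sketch consistent with the description, in the spirit of the classical Hess et al. model from the 1960s, is below; here w_{ij} is an assumed distance-based compactness cost (e.g., p(i) times the squared distance between counties i and j), so treat the exact objective weights as an assumption:

```latex
\begin{align}
\min \quad & \sum_{i \in V} \sum_{j \in V} w_{ij}\, x_{ij}  && (1a)\\
\text{s.t.} \quad & \sum_{j \in V} x_{ij} = 1 && \forall i \in V \quad (1b)\\
& \sum_{j \in V} x_{jj} = k && (1c)\\
& L\, x_{jj} \le \sum_{i \in V} p(i)\, x_{ij} \le U\, x_{jj} && \forall j \in V \quad (1d)\\
& x_{ij} \le x_{jj} && \forall i \in V,~j \in V \quad (1e)\\
& x_{ij} \in \{0,1\} && \forall i \in V,~j \in V \quad (1f)
\end{align}
```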

For computational niceties, we require each district to be “centered” at its most populous county. This can be achieved by fixing x(i,j)=0 when county i is more populous than county j. This changes the objective value of model (1), but not its feasibility status.

Initial Results

We see that most states cannot redistrict at the county level. A primary reason is large cities. For example, Dallas County in Texas had a population of 2,368,139 in 2010, which far exceeds the bound U that we imposed. We call such instances “overtly infeasible”.

There are also seven states that are “trivial” in the sense that only one congressional district should be created and it must cover the whole state. There is nothing to solve.

There are 16 states left. When running the code, we see some problems. For example, here is a solution for Oklahoma:

Obviously, this districting plan lacks contiguity. This should not be a surprise; we never imposed contiguity constraints in our MIP!
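One way to catch such a failure automatically is to test whether each district induces a connected subgraph of the contiguity graph G, e.g., with a breadth-first search. A minimal sketch (the toy "counties" and adjacencies here are made up for illustration):

```python
from collections import defaultdict, deque

def district_is_contiguous(counties, edges):
    """Check via BFS whether the counties assigned to one district
    induce a connected subgraph of the contiguity graph."""
    if not counties:
        return True
    adj = defaultdict(set)
    for u, v in edges:
        if u in counties and v in counties:  # keep only internal edges
            adj[u].add(v)
            adj[v].add(u)
    seen = {next(iter(counties))}
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == set(counties)

# Toy example: a path a-b-c; {a, c} is not contiguous without b.
edges = [("a", "b"), ("b", "c")]
print(district_is_contiguous({"a", "b", "c"}, edges))  # True
print(district_is_contiguous({"a", "c"}, edges))       # False
```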

Adding Contiguity Constraints

There are many ways to add contiguity constraints to our MIP. In the example Python code, we use the simplest approach, which is based on flow techniques. Intuitively, the idea is to create flow at each district’s center and send it along the district’s edges to fuel the district’s other nodes. For the details, see model (2) here.

With these constraints, our issue for Oklahoma has been fixed.

Our computational results tell us that there is no solution for Colorado, New Hampshire, Oregon, or South Carolina.

By looking at the Denver and Portland metros below, can you see why?

If you’re looking for a harder puzzle, see if you can figure out why South Carolina cannot redistrict at the county level.

The other twelve states *do* admit contiguous county-level solutions: Alabama, Arkansas, Iowa, Idaho, Kansas, Louisiana, Maine, Mississippi, Nebraska, New Mexico, Oklahoma, and West Virginia. Try to draw some of them for yourself.


Only two states—Iowa and West Virginia—drew county-level maps after the 2010 census (if we exclude states with one congressional district). Based on the analysis conducted here, at most 10 other states could follow.

I emphasize “at most”, because our analysis does not consider all federal and state laws. For example, Section 2 of the Voting Rights Act (as interpreted by the Supreme Court in Thornburg v. Gingles) requires the creation of minority opportunity (or “majority-minority”) districts in certain cases. Preliminary analysis suggests that, when this constraint is considered, Alabama cannot redistrict at the county level.

Another caveat is that not all states have “counties” (e.g., Louisiana has parishes) or the counties have no real significance (e.g., Massachusetts abolished eight of its county governments by 2000).

Further Reading

My coauthors Hamidreza Validi, Eugene Lykhovyd, and I wrote a paper about handling contiguity constraints in redistricting problems when the units are census tracts instead of counties. These instances are much larger than the county-level instances considered here and require alternative MIPs and advanced techniques, like branch-and-cut algorithms and Lagrangian-based reduced-cost fixing, to solve in a reasonable amount of time. Here are links to that paper and code.

Why is maximum clique often easy in practice?

Recently, my coauthor Jose L. Walteros and I published a paper with the title “Why is maximum clique often easy in practice?” (forthcoming at Operations Research). Here is the abstract:

To this day, the maximum clique problem remains a computationally challenging problem. Indeed, despite researchers’ best efforts, there exist unsolved benchmark instances with one thousand vertices. However, relatively simple algorithms solve real-life instances with millions of vertices in a few seconds. Why is this the case? Why is the problem apparently so easy in many naturally occurring networks? In this paper, we provide an explanation. First, we observe that the graph’s clique number \omega is very near to the graph’s degeneracy d in most real-life instances. This observation motivates a main contribution of this paper, which is an algorithm for the maximum clique problem that runs in time polynomial in the size of the graph, but exponential in the gap g:=(d+1)-\omega between the clique number \omega and its degeneracy-based upper bound d+1. When this gap g can be treated as a constant, as is often the case for real-life graphs, the proposed algorithm runs in time O(dm)=O(m^{1.5}). This provides a rigorous explanation for the apparent easiness of these instances despite the intractability of the problem in the worst case. Further, our implementation of the proposed algorithm is actually practical—competitive with the best approaches from the literature.
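The degeneracy d mentioned in the abstract can be computed greedily by repeatedly deleting a minimum-degree vertex. This sketch (my illustration, not the paper's code) shows the idea:

```python
def degeneracy(adj):
    """Degeneracy d = the maximum, over the greedy removal process,
    of the minimum degree.  adj maps each vertex to its neighbor set."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    d = 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))  # a minimum-degree vertex
        d = max(d, len(adj[v]))
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return d

# A triangle with a pendant vertex: degeneracy d = 2, so the upper
# bound d + 1 = 3 matches the clique number (the triangle), i.e., g = 0.
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(degeneracy(g))  # 2
```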

The code is available here. A preprint of the paper is here. A slide deck is here.

Last Friday, I presented the work to a group at OSU. Here is the handout I provided to students.

A brief tutorial on Lagrangian techniques for k-median

Here are my notes on Lagrangian techniques for the k-median problem from a talk I gave at OSU today. They cover:

  • A Lagrangian relaxation model for k-median
  • How to solve the inner problem (reduce it to sorting)
  • Pseudocode for Lagrangian-based branch-and-bound
  • Experimental results

The GitHub repository has the supporting code. For more info, see the references at the end of the notes.
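Under the usual relaxation that dualizes the assignment constraints \sum_i x_{ij}=1, the inner problem reduces to sorting the candidate medians by their reduced-cost value. A minimal sketch (the names c, lam, and the function itself are mine, not from the notes; multiplier sign conventions vary):

```python
def lagrangian_bound(c, k, lam):
    """Lower bound L(lam) for k-median from the Lagrangian relaxation
    that dualizes the assignment constraints sum_i x_ij = 1.
    For fixed multipliers lam, an opened median i contributes
    v_i = sum_j min(0, c[i][j] - lam[j]), so the inner problem is
    solved by sorting: open the k medians with smallest v_i."""
    values = []
    for row in c:
        v = sum(min(0.0, cij - lj) for cij, lj in zip(row, lam))
        values.append(v)
    values.sort()
    return sum(lam) + sum(values[:k])

# Tiny instance: 2 candidate medians, 2 clients, k = 1.
c = [[0.0, 2.0],
     [2.0, 0.0]]
print(lagrangian_bound(c, k=1, lam=[1.0, 1.0]))  # 1.0
```

Here the true optimum is 2 (open either median), so the bound of 1.0 is a valid lower bound for these (non-optimal) multipliers.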

I’ve become picky about graph notation

A couple of years ago, I came across the following passage about graph notation.

…Martin Grötschel will always advocate a thorough investigation with a careful look at details, which can or cannot make a big difference. In particular, when it comes to graphs, digraphs, and hypergraphs, he becomes furious about “sloppy notation” that doesn’t distinguish between uv, (u,v), and {u,v}, because results do not automatically carry over between these cases. Martin Grötschel is correct, and the ability to seamlessly shift his attention from little details to the grand picture is without doubt one of his greatest strengths.


In many research papers, I’ve seen authors write about a simple undirected graph G=(V,E) with vertex set V and edge set E\subset V \times V and proceed to talk about its edges (u,v). Sometimes, papers will even say that V \times V is the edge set of a complete (simple) graph. I know, because I was guilty of some of these mistakes in my first few papers.

Here are a few of the problems.

simple undirected graph

What’s the problem here? Simple graphs are, by definition, undirected. There is no need to specify “undirected.”

edge set E\subseteq V \times V

What about here? Undirected graphs have undirected edges \{u,v\} which are unordered pairs of vertices (so \{u,v\} and \{v,u\} refer to the same edge), while the Cartesian product of sets A and B is ordered:

A \times B:=\{ (a,b) ~|~a \in A,~b \in B\}.

So, when G=(V,E) is a directed graph, it makes sense to say that E \subset V \times V because it may have edge (u,v) but not edge (v,u). But where does the edge set E live when G is undirected? The following approach is unsightly and not conducive to frequent use:

E \subseteq \{\{u,v\} ~|~u \in V,~v \in V,~u\neq v\}.

Instead, I have started writing that E \subseteq \binom{V}{2}, where

\binom{V}{2}:=\{\{u,v\} ~|~u\in V,~v \in V,~u\neq v\}.

The idea easily generalizes to \binom{V}{k} for hypergraphs with k vertices per hyperedge.
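In code, \binom{V}{2} is exactly what itertools.combinations produces:

```python
from itertools import combinations

V = {1, 2, 3, 4, 5}

# binom(V, 2): all unordered pairs of distinct vertices.
pairs = [frozenset(e) for e in combinations(sorted(V), 2)]
print(len(pairs))  # 10, i.e., binom(5, 2)

# binom(V, 3): hyperedges with 3 vertices each.
triples = list(combinations(sorted(V), 3))
print(len(triples))  # 10
```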

V \times V is the edge set of a complete (simple) graph

By now, the problems with this statement should be clear. One nice thing about the notation \binom{V}{2} is that when the graph is simple and complete we can just say that E=\binom{V}{2}.

A brief tutorial on Fourier-Motzkin Elimination

Let’s start with an analogy.

Equality systems : Gaussian elimination :: Inequality systems : ???

According to Kipp Martin, the answer is Fourier-Motzkin Elimination (FME) which allows one to project out variables from a system of linear inequalities. These slides from Thomas Rothvoss nicely illustrate what it means to project out (or eliminate) the variable y.

Note: I don’t particularly like how Gaussian elimination is described (via row operations on a tableau, much like how the simplex method is often described). The intuition is lacking. Instead, everything I describe in this post will work on the (in)equalities themselves.

Eliminating variables in equality systems

Start with the linear equality system


Proceed by isolating the variable z in each equation that contains z to get


We can eliminate variable z by setting all “definitions” of z equal to each other (and leaving other equations as is)


Simplifying gives


By isolating and then eliminating variable y, we get x=2. Back-substitution gives


So the only feasible solution is (x,y,z)=(2,2,-2). Observe that each time a variable is eliminated, one equality is also removed.
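The procedure is easy to mechanize. The following sketch runs the same isolate-and-equate steps on a hypothetical three-equation system (my own example, chosen so that its unique solution is also (x,y,z)=(2,2,-2)):

```python
from fractions import Fraction as F

def eliminate_var(eqs, j):
    """Eliminate variable j from equations stored as (coeffs, rhs):
    equate every other "definition" of x_j against the first equation
    containing x_j, keep equations without x_j as is, and return the
    reduced system plus that first equation for back-substitution."""
    has_j = [e for e in eqs if e[0][j] != 0]
    rest = [e for e in eqs if e[0][j] == 0]
    b, rb = has_j[0]
    for c, rc in has_j[1:]:
        # Cross-multiplied form of "definition of x_j = definition of x_j".
        coeffs = [b[j] * ci - c[j] * bi for bi, ci in zip(b, c)]
        rest.append((coeffs, b[j] * rc - c[j] * rb))
    return rest, (b, rb)

# Hypothetical system with unique solution (x, y, z) = (2, 2, -2):
#   x + y + z = 2,   x - y + z = -2,   x + y = 4.
eqs = [([F(1), F(1), F(1)], F(2)),
       ([F(1), F(-1), F(1)], F(-2)),
       ([F(1), F(1), F(0)], F(4))]

eqs, def_z = eliminate_var(eqs, 2)   # eliminate z
eqs, def_y = eliminate_var(eqs, 1)   # eliminate y
c, r = eqs[0]                        # one equation left: 2x = 4
x = r / c[0]
y = (def_y[1] - def_y[0][0] * x) / def_y[0][1]                   # back-substitute y
z = (def_z[1] - def_z[0][0] * x - def_z[0][1] * y) / def_z[0][2] # then z
print(x, y, z)  # 2 2 -2
```

Note that, as in the text, each elimination removes exactly one equation.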

Eliminating variables in inequality systems

Suppose we started with the inequality system

2x+y+z\le 4
x+2y+z\ge 4
x+y\le 4.

Isolating z (where possible) gives

z\le 4-2x-y
z\ge 4-x-2y
x+y\le 4.

Matching each of the lower bounds on z with each of its upper bounds (and leaving other inequalities as is), we obtain

4-x-2y\le 4-2x-y
x+y\le 4.

Simplifying gives

x \le y
x+y\le 4.

Isolating y gives

y \ge x
y\le 4-x.

Applying the same procedure to eliminate y, we are left with

x \le 2.

This is the projection of the original feasible set onto the space of the x variable. That is,

  1. for any x\le 2 there exist values for y and z for which the original system is satisfied, and
  2. any point that is feasible for the original system satisfies x\le 2.
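The same pairing procedure can be coded directly. In this sketch each inequality is stored as (coeffs, rhs), meaning coeffs · x <= rhs, so the >= rows from the example are negated into <= form first:

```python
from fractions import Fraction as F

def eliminate(inequalities, j):
    """One step of Fourier-Motzkin elimination: project out variable j
    from a system of inequalities (coeffs, rhs) meaning coeffs . x <= rhs."""
    uppers, lowers, rest = [], [], []
    for coeffs, rhs in inequalities:
        a = coeffs[j]
        if a > 0:
            uppers.append((coeffs, rhs))   # gives an upper bound on x_j
        elif a < 0:
            lowers.append((coeffs, rhs))   # gives a lower bound on x_j
        else:
            rest.append((coeffs, rhs))
    # Match every lower bound with every upper bound.
    for uc, ur in uppers:
        for lc, lr in lowers:
            su, sl = -lc[j], uc[j]   # positive scaling factors
            coeffs = [su * a + sl * b for a, b in zip(uc, lc)]
            rest.append((coeffs, su * ur + sl * lr))
    return rest

# The system from above, with the >= row negated:
#   2x + y + z <= 4,   -x - 2y - z <= -4,   x + y <= 4.
system = [([F(2), F(1), F(1)], F(4)),
          ([F(-1), F(-2), F(-1)], F(-4)),
          ([F(1), F(1), F(0)], F(4))]
system = eliminate(system, 2)  # eliminate z
system = eliminate(system, 1)  # eliminate y
print(system)  # the single inequality 2x <= 4, i.e., x <= 2
```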

Consequences of Fourier-Motzkin Elimination (FME)

FME can test whether a linear program is feasible.

If the system is feasible, FME can provide a feasible solution by employing back-substitution (in accordance with the lower and upper bounds on the variable that is being eliminated). On the other hand, if the original system is infeasible, the final inequality system will have conflicting lower and upper bounds on the final variable such as x \le 0 and x \ge 11 or obvious contradictions like 7 \le 2.

FME can optimize linear programs.

To optimize an LP in which we are to maximize c^T x, simply add a variable z and constraint z\le c^T x to our system and make sure to eliminate z last. The optimality status (e.g., infeasible, unbounded, an optimal solution) can be obtained from the final inequality system. (How?) Note, however, that Fourier-Motzkin elimination is awfully slow (doubly exponential) and is not a practical means to solve LPs. (But, there is also a singly exponential elimination method.)

Projections of polyhedra are also polyhedra

If we eliminate a variable from a system with m inequalities, then we end up with a system of at most m^2/4 inequalities. So, if we start with a finite number of inequalities and variables, then any projection will have finitely many inequalities. (FME may give redundant inequalities, which is okay.)

Projections of rational polyhedra are also rational

If every coefficient and right-hand-side in our original inequality system is rational, then they will remain rational after a variable is eliminated.

Other results

More consequences follow from FME; see Kipp Martin’s book. In fact, he develops the basics of LP (e.g., Farkas’ lemma and duality) via projection (FME) and “inverse projection.”

A brief tutorial on Gomory mixed integer cuts (as applied to pure IPs)

In an earlier post, I gave a brief tutorial on Gomory fractional cuts. However, Gomory fractional cuts are not used in practice. A primary reason is that they are subsumed by the more general and stronger Gomory mixed integer (GMI) cuts which were introduced in a research memorandum in 1960 by Ralph Gomory.

Generality. GMI cuts apply when the problem has a mix of integer and continuous variables (MIPs), whereas Gomory fractional cuts only apply for problems in which all variables are integer (pure IPs).

Strength. When GMI cuts are applied to pure integer programs, they are just as strong or stronger than Gomory fractional cuts.

To keep things simple, I will stick with GMI decaf (i.e., as applied to pure IPs). The interested reader can consult the longer tutorial by Cornuéjols for the caffeinated version.

Note: I’m not a huge fan of how WordPress is formatting some of this post. You may find the pdf version easier to read.

The GMI cut for pure integer programs

Suppose that nonnegative integers x_1,\dots, x_n satisfy the equation:

\sum_{i=1}^n a_i x_i = b,

where b is fractional. Think of this equation as a row of the simplex tableau/dictionary.

It will help to have notation for the “fractional” parts of b and each a_i. So, let f := b-\lfloor b \rfloor and f_i := a_i - \lfloor a_i \rfloor. Each of these is nonnegative and less than one, and f > 0 since b is fractional.

Letting I=\{1,\dots, n\}, the associated GMI cut is:

\sum_{i \in I:\, f_i \le f} \frac{f_i}{f}\, x_i + \sum_{i \in I:\, f_i > f} \frac{1-f_i}{1-f}\, x_i \ge 1. \qquad (1)

Notice that if f_i \le f for all of the i’s, then the resulting inequality is exactly the same as the Gomory fractional cut, which can be written as:

\sum_{i\in I} f_i x_i  \ge f.

If at least one i satisfies f_i>f, then the GMI cut is stronger.

Example (from here).

Consider the following IP.


Solving the LP relaxation gives the following system (with objective z).


This corresponds to the fractional point (x_1,x_2,x_3,x_4)=(1.3,3.3,0,0).

If we apply the GMI formula for row 2 of this system, we have f=0.3, f_1=0, f_2=0, f_3= 0.8, f_4 = 0.1, giving the inequality \frac{1-0.8}{1-0.3} x_3 + \frac{0.1}{0.3}x_4 \ge 1, or equivalently:

6 x_3 + 7x_4 \ge 21.

Compare this to the (weaker) Gomory fractional cut 0.8x_3 + 0.1x_4 \ge 0.3, or equivalently:

56x_3 + 7x_4 \ge 21.

The last row of the tableau happens to give the same GMI cut.
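The GMI formula is easy to apply in code, using exact fractions to avoid round-off. A sketch (the function is my own; the two tableau rows are the ones discussed in the example):

```python
from fractions import Fraction as F

def gmi_cut(a, b):
    """Gomory mixed integer cut (pure-integer case) from the row
    sum_i a_i x_i = b with b fractional.  Returns (coeffs, rhs) for
    the inequality coeffs . x >= rhs."""
    f = b - (b.numerator // b.denominator)  # fractional part of b
    coeffs = []
    for ai in a:
        fi = ai - (ai.numerator // ai.denominator)  # fractional part of a_i
        if fi <= f:
            coeffs.append(fi / f)                # small fractional part
        else:
            coeffs.append((1 - fi) / (1 - f))    # large fractional part
    return coeffs, F(1)

# Row 2 of the tableau: x2 + 0.8 x3 + 0.1 x4 = 3.3.
row2 = [F(0), F(1), F(4, 5), F(1, 10)]
coeffs, rhs = gmi_cut(row2, F(33, 10))
print(coeffs)  # [Fraction(0, 1), Fraction(0, 1), Fraction(2, 7), Fraction(1, 3)]
# Scaling by 21 gives 6 x3 + 7 x4 >= 21, as above.

# The last row, x1 - 0.2 x3 + 0.1 x4 = 1.3, gives the same cut.
print(gmi_cut([F(1), F(0), F(-1, 5), F(1, 10)], F(13, 10))[0] == coeffs)  # True
```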

Unfortunately, I don’t have intuitive explanations for these GMI cuts like I had for the Gomory fractional cuts in the last post. So, let’s settle for a proof.

Proof that GMI cut is valid

(The formatting in the pdf looks nicer.)

We show that every x^*\in S:=\left\{x \in \mathbb{Z}^n_+ ~|~\sum_{i\in I} a_ix_i = b\right\} satisfies inequality (1).


Consider some x^* \in S. This implies that x^* \in \mathbb{Z}^n_+ and

\sum_{i\in I} a_i x^*_i = b. \qquad (2)

Since \lfloor a_i \rfloor, \lceil a_i \rceil, \lfloor b \rfloor, and x^*_i are all integers, equation (2) lets us write

\sum_{i \in I:\, f_i \le f} (a_i - \lfloor a_i \rfloor)\, x^*_i - \sum_{i \in I:\, f_i > f} (\lceil a_i \rceil - a_i)\, x^*_i = (b - \lfloor b \rfloor) + k

for some integer k. In terms of our f notation, this is

\sum_{i \in I:\, f_i \le f} f_i x^*_i - \sum_{i \in I:\, f_i > f} (1-f_i)\, x^*_i = f + k. \qquad (3)

In the first case, suppose that k \ge 0, in which case

\sum_{i \in I:\, f_i \le f} \frac{f_i}{f} x^*_i + \sum_{i \in I:\, f_i > f} \frac{1-f_i}{1-f} x^*_i \ge \frac{1}{f}\left( \sum_{i \in I:\, f_i \le f} f_i x^*_i - \sum_{i \in I:\, f_i > f} (1-f_i) x^*_i \right) = \frac{f+k}{f} \ge 1.

The last equation holds by (3), and the last inequality holds by k\ge 0.

In the second case, suppose that k<0. Then, k \le -1 since k is an integer, in which case

\sum_{i \in I:\, f_i \le f} \frac{f_i}{f} x^*_i + \sum_{i \in I:\, f_i > f} \frac{1-f_i}{1-f} x^*_i \ge \frac{1}{1-f}\left( \sum_{i \in I:\, f_i > f} (1-f_i) x^*_i - \sum_{i \in I:\, f_i \le f} f_i x^*_i \right) = \frac{-(f+k)}{1-f} \ge 1.

The last equation holds by (3), and the last inequality holds by k \le -1.

So, x^* satisfies the GMI inequality (1) in both cases, so the GMI inequality is valid for S.





A brief tutorial on PORTA

What does PORTA do?

PORTA is a software package for analyzing polyhedra. One problem it can solve is:

Problem: Find H-representation of the convex hull of some given points
Input: a set S of points in n-dimensional space
Output: a minimal inequality system defining conv(S).

Minor note: If conv(S) is not full dimensional, PORTA will report some equations too.

For a longer tutorial on PORTA and Polymake, see the slides by Marc Pfetsch.

Example: Independent Sets

Consider the 5-vertex cycle graph C_5 on vertex set {1,2,3,4,5}. We can list all of its stable/independent sets:

ALL_STABLE_SETS = { {}, {1}, {2}, {3}, {4}, {5}, {1,3}, {1,4}, {2,4}, {2,5}, {3,5} }.

We can write these stable sets in terms of their characteristic vectors:

0 0 0 0 0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
1 0 1 0 0
1 0 0 1 0
0 1 0 1 0
0 1 0 0 1
0 0 1 0 1

Here, each ROW represents a stable set. We can give PORTA this collection ALL_STABLE_SETS, and it will return the following inequality description of conv(ALL_STABLE_SETS):

( 1) -x1 <= 0
( 2) -x2 <= 0
( 3) -x3 <= 0
( 4) -x4 <= 0
( 5) -x5 <= 0
( 6) +x4+x5 <= 1
( 7) +x3+x4 <= 1
( 8) +x2+x3 <= 1
( 9) +x1 +x5 <= 1
( 10) +x1+x2 <= 1
( 11) +x1+x2+x3+x4+x5 <= 2

Inequalities (1)–(5) are the nonnegativity bounds; (6)–(10) are the edge inequalities; (11) is an odd-hole inequality.

How do I use PORTA?

  1. Download version 1.4.1
  2. Create the .poi file* which contains all the points in S. See the formatting in the example file StableSet_5cycle.poi
  3. Open command prompt and navigate to the folder porta-1.4.1\win32\bin\
  4. Run PORTA on the .poi file. For example, put the .poi file in the folder porta-1.4.1\win32\bin\ and use the command traf.bat StableSet_5cycle.poi
  5. PORTA will show its progress in the command prompt window and then create an output file. In our case, the output file is StableSet_5cycle.poi.ieq.

(*) The .poi file can be time consuming to create by hand, and there are a few ways to create them automatically. In my case, I usually write some simple code that enumerates all 0-1 vectors and writes the feasible vectors to a file. In my experience, PORTA can only handle instances with roughly 20 dimensions, so this enumeration procedure is not the bottleneck.
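For the C_5 example, that enumeration script might look like the sketch below (pure Python; the header keywords follow PORTA's example files, so check them against the .poi file mentioned above):

```python
from itertools import product

# Edges of the 5-cycle C_5.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

def is_stable(x):
    """A 0-1 vector is the characteristic vector of a stable set
    iff no edge has both endpoints selected."""
    return all(x[u] + x[v] <= 1 for u, v in edges)

# Enumerate all 0-1 vectors and keep the feasible ones.
points = [x for x in product((0, 1), repeat=5) if is_stable(x)]
print(len(points))  # 11 stable sets, matching the list above

# Assemble the .poi-style contents.
lines = ["DIM = 5", "", "CONV_SECTION"]
lines += [" ".join(map(str, x)) for x in points]
lines.append("END")
print("\n".join(lines))
```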

I dislike big M’s and I cannot lie

Big M’s in integer programming formulations have been a topic of discussion on Twitter and the blogosphere in the past week. For example, see Marc-Andre Carle’s posts (primal and dual) and followup tweets and also some older posts from Paul Rubin and Bob Fourer.

My initial reaction was don’t let the software handle it. If possible, try to reformulate the problem.

In this post, I explain myself and give supporting computational experiments.

Example: The uncapacitated facility location problem

Consider the uncapacitated facility location (UFL) problem. The task is to open a subset of n available facilities to service a set of m customers. The goal is to minimize the sum of facility opening costs and customer-delivery costs.

The parameters for this problem include the cost f_i to open facility i and the cost c_{ij} associated with fulfilling customer j’s demand via facility i. We allow for a facility to fulfill a fractional amount of a customer’s demand, in which case the demand-fulfillment costs are proportional.

1. The disaggregated formulation

One natural formulation is as follows, where x_{ij} is the proportion of customer j’s demand that is fulfilled by facility i, and y_i is a binary variable representing the decision to open facility i.

The objective is to minimize the total cost:

\min \sum_{i=1}^n \sum_{j=1}^m c_{ij} x_{ij} + \sum_{i=1}^n f_i y_i.

Each customer’s demand should be fulfilled:

\sum_{i=1}^n x_{ij}=1,~~j =1,\dots, m.

If a facility is closed, then it can fulfill none of a customer’s demand:

x_{ij} \le y_i,~~i=1,\dots, n,~j=1,\dots, m.

Finally, the x variables should be nonnegative, and the y variables should be binary.

2. The big-M formulation

In the big-M formulation, everything is the same, except that the x_{ij} \le y_i constraints are replaced by the constraints:

\sum_{j=1}^m x_{ij} \le M y_i.

To keep the formulation as strong as possible, the value used for M should be as small as possible. In this case, we should set M=m, which is also what you get when you aggregate the x_{ij} \le y_i constraints. (Carle’s post from above explores what happens when you use unnecessarily large M’s in situations like this.)

Which formulation is “better”?

Depending on who you ask, either of these formulations could be better. Many integer programming textbooks prefer the first, since its LP relaxation is much stronger, meaning that fewer branch-and-bound nodes may need to be explored. Results for 5 different benchmark instances with m=n=100 show that these bounds can indeed be much stronger.

Instance          LP 1 objective    LP 2 objective
123unifS                 67,441                 12,271
623unifS                 70,006                 12,674
1823unifS                 69,649                 12,066
2623unifS                 68,776                 11,958
3023unifS                 71,639                 12,170

On the other hand, the big-M formulation has only O(n+m) constraints (if we exclude nonnegativity bounds), whereas the first formulation has \Omega(nm) constraints. So, the time to solve the LP relaxation for the big-M formulation is probably less. Indeed, the recent textbook by Conforti, Cornuéjols, and Zambelli states:

When the aggregated formulation is given to state-of-the-art solvers, they are able to detect and generate disaggregated constraints on the fly, whenever these constraints are violated by the current feasible solution. So, in fact, it is preferable to use the aggregated formulation because the size of the linear relaxation is much smaller and faster to solve.

However, MIP solvers rely on sparse matrix representation, meaning that the number of nonzeros in the formulation is more indicative of its size than the quantity obtained by multiplying the number of variables by the number of constraints. Indeed, the number of nonzeros to impose the x_{ij} \le y_i constraints is only about twice the number of nonzeros to impose the big-M constraints.
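The factor-of-two claim is easy to check with a bit of counting (the function name is mine; only the linking constraints are counted, not demand-fulfillment or bounds): each of the nm disaggregated constraints x_{ij} \le y_i has 2 nonzeros, while each of the n big-M constraints has m+1.

```python
def constraint_nonzeros(n, m):
    """Count linking constraints and their nonzeros for the two
    uncapacitated facility location formulations."""
    disagg = {"constraints": n * m, "nonzeros": 2 * n * m}  # x_ij <= y_i
    big_m = {"constraints": n, "nonzeros": n * (m + 1)}     # sum_j x_ij <= M y_i
    return disagg, big_m

disagg, big_m = constraint_nonzeros(100, 100)
print(disagg)  # {'constraints': 10000, 'nonzeros': 20000}
print(big_m)   # {'constraints': 100, 'nonzeros': 10100}
print(round(disagg["nonzeros"] / big_m["nonzeros"], 2))  # 1.98
```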

So, I was not exactly convinced that the time to solve the LP relaxations would be much different and decided to run some experiments. Here are the times (in seconds) to solve the LP relaxations for the same 5 benchmark instances in Gurobi 7.0.2.

Instance          Solve time for LP 1   Solve time for LP 2
123unifS                   0.115                   0.022
623unifS                   0.125                   0.023
1823unifS                   0.107                   0.022
2623unifS                   0.128                   0.022
3023unifS                   0.150                   0.024

While the LP times are indeed longer for the disaggregated formulation, it is only by a factor of 5 or 6, even though the number of constraints has increased by a factor of 50!

However, what we are actually interested in is the time to solve the IP. By this measure, the disaggregated formulation wins, though both times are reasonable for instances of this size. Experiments for larger instances are in the appendix.

Instance          Solve time for IP 1   Solve time for IP 2
123unifS                   3.496                   8.151
623unifS                   1.607                   7.176
1823unifS                   1.528                   4.468
2623unifS                   1.263                   3.415
3023unifS                   4.932                 14.153


I have only provided test results for 5 UFL instances, so I cannot make any general conclusions about the (dis)advantages of big M’s from this post. However, I can say that one should at least look for alternative formulations that do not rely on big M’s and then empirically compare them.

I leave with a rhetorical question. Where would the field of integer programming be if everyone concluded that the Miller-Tucker-Zemlin formulation for TSP was good enough (even though it has big M’s)? What if they said “just let the solver take care of it”?


Appendix

I constructed larger instances following the same procedures that were used when making the m=n=100 instances. Namely, each facility has an opening cost of 3000, and the assignment costs are uniformly distributed integers between 0 and 1000. (Actually, I used rand() % 1001 in C++, which is slightly biased to smaller numbers.)

The LP times are much like before.

Instance       Solve time for LP 1   Solve time for LP 2
n=100,m=150 0.437 0.057
n=100,m=200 0.506 0.076
n=150,m=100 0.422 0.070
n=200,m=100 0.641 0.083

So are the IP times.

Instance       Solve time for IP 1   Solve time for IP 2
n=100,m=150 178.049 225.599
n=100,m=200 420.491 501.478
n=150,m=100 51.808 153.065
n=200,m=100 110.433 143.192

Somewhat surprisingly, the IP time for formulation 2 decreased when increasing the number of facilities from 150 to 200. Probably the solver got lucky.

I have designated the x variables as continuous in the formulation, but there will always be an optimal solution in which they are binary. Given that there are more presolve procedures that apply to binary variables than continuous variables, it might help to declare them as binary. I tested the idea on a randomly constructed instance with m=n=100, but the continuous setting was better.

binary x continuous x
Solve time for IP 1 25.577 7.405
Solve time for IP 2 46.358 12.897

A brief tutorial on Gomory fractional cuts

Integer programs (IPs) are solved in practice by branch-and-cut algorithms. These algorithms rely heavily on the effective use of cutting planes, which are inequalities that are added to linear programming (LP) relaxations to cut off (bad) fractional points, but not the (good) integer feasible points. The classical cutting planes for solving arbitrary IPs were developed by Ralph Gomory in the late 1950s.

Prior to working on IP, Gomory earned his PhD in mathematics from Princeton in 1954, after which he spent a few years with the US Navy. In 1957, he went back to Princeton, but was retained as a consultant for the Navy. Gomory later recounted that his involvement with the Navy led him to IP:

As the Navy had kept me on as a consultant I continued to work on Navy problems through monthly trips to Washington. On one of these trips a group presented a linear programming model of a Navy Task Force. One of the presenters remarked that it would be nice to have whole number answers as 1.3 aircraft carriers, for example, was not directly usable.

Within the next few weeks, Gomory had developed his technique for generating cutting planes based on the simplex tableau. Soon after, he proved that his algorithm was finite and programmed it on the E101, allowing him to solve 5-variable problems reliably.

Example (from here).

Consider the following IP.


Solving the LP relaxation gives the following system (with objective z).


This corresponds to the fractional point (x_1,x_2,x_3,x_4)=(1.3,3.3,0,0).

A first cut

In any feasible solution to the IP, x_2 will take an integer value, so, from the second row, we can write 0.8x_3+0.1x_4=0.3+k for some integer k. Notice that the left side of this equation can only take nonnegative values, so the right side must as well, i.e.,  0.3 +k \ge 0 or k \ge  - 0.3. Since k is an integer, we know that k \ge 0. So, 0.8x_3 + 0.1x_4 = 0.3+k \ge 0.3. We have thus argued for the validity of the Gomory fractional cut:

0.8x_3 + 0.1x_4 \ge 0.3.

Note that this inequality cuts off our fractional point (1.3,3.3,0,0).

A second cut

Now consider the last row x_1 -0.2x_3 + 0.1x_4 =1.3. As before, we can write -0.2x_3+0.1x_4=0.3+k for some integer k. However, the left side of this equation may not always take nonnegative values (due to the negative coefficient for x_3), so our previous argument will not work. However, if we add x_3 to both sides, we can write 0.8x_3+0.1x_4=0.3+p for some (other) integer p. Using the same argument as before, we can generate the (same) Gomory fractional cut:

0.8x_3 + 0.1x_4 \ge 0.3.

Gomory fractional cuts in general

We can write the Gomory fractional cut in more general terms as follows. Suppose that nonnegative integers x_1,\dots, x_n satisfy the equation:

\sum_{i=1}^n a_i x_i = a_0,

where a_0 is fractional, i.e., a_0 \notin \mathbb{Z}. Think of this equation as a row of the simplex tableau/dictionary. The associated Gomory fractional cut is:

\sum_{i=1}^n (a_i -\lfloor a_i \rfloor) x_i \ge a_0 - \lfloor a_0\rfloor.
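In code, with exact fractions to avoid round-off (the function is my own sketch; the two rows are the ones worked above):

```python
import math
from fractions import Fraction as F

def gomory_fractional_cut(a, a0):
    """Gomory fractional cut sum_i (a_i - floor(a_i)) x_i >= a0 - floor(a0)
    from the tableau row sum_i a_i x_i = a0 with a0 fractional."""
    coeffs = [ai - math.floor(ai) for ai in a]
    rhs = a0 - math.floor(a0)
    return coeffs, rhs

# Row x2 + 0.8 x3 + 0.1 x4 = 3.3 gives 0.8 x3 + 0.1 x4 >= 0.3.
cut1 = gomory_fractional_cut([F(0), F(1), F(4, 5), F(1, 10)], F(33, 10))
# Row x1 - 0.2 x3 + 0.1 x4 = 1.3 gives the same cut, since
# -0.2 - floor(-0.2) = 0.8.
cut2 = gomory_fractional_cut([F(1), F(0), F(-1, 5), F(1, 10)], F(13, 10))
print(cut1 == cut2)  # True
```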

Proposition. The Gomory fractional cut is valid for the set X=\{ x \in \mathbb{Z}^n_+ ~|~\sum_{i=1}^n a_i x_i = a_0\}.

Proof. Let x^* \in X. We are to show that \sum_{i=1}^n (a_i -\lfloor a_i \rfloor) x^*_i \ge a_0 - \lfloor a_0\rfloor. By x^* \in X, we know that \sum_{i=1}^n a_i x^*_i = a_0. Since each \lfloor a_i\rfloor is integer, and since each x^*_i is integer, we can write:

\sum_{i=1}^n (a_i -\lfloor a_i \rfloor) x^*_i = a_0 - \lfloor a_0\rfloor +k,

for some integer k. Observe that the left side is nonnegative, so a_0 - \lfloor a_0\rfloor +k \ge 0 and thus k \ge \lfloor a_0\rfloor-a_0 > -1. By k \in \mathbb{Z}, this implies k\ge 0. In conclusion,

\sum_{i=1}^n (a_i -\lfloor a_i \rfloor) x^*_i = a_0 - \lfloor a_0\rfloor +k \ge a_0 - \lfloor a_0\rfloor. Q.E.D.


Two observations:

  1. Subtracting \lfloor a_i \rfloor from a_i in the Gomory fractional cut ensures that the left side is nonnegative, which was key to our arguments. We could have instead subtracted some q < \lfloor a_i \rfloor from a_i and the resulting inequality would remain valid; however, this would weaken the inequality.
  2. The Gomory fractional cut remains valid when you add other constraints to the set X. This means you can still use it when your IP is more complicated.