A brief tutorial on Benders decomposition

Here are my notes on Benders decomposition, as applied to the k-median problem (static pdf, overleaf source). They cover:

  • A MIP model for the k-median problem
  • Writing the subproblem (after the facilities have been sited)
  • Writing the dual of the subproblem
  • The Benders reformulation
  • Generating Benders optimality cuts from dual feasible solutions

The GitHub repository has supporting (slow) Python code. For more info, see the references at the end of the notes.


[2020 update] Can my state redistrict at the county level?

In a previous blog post, I asked a basic question:

Can my state draw its congressional districts so that:

  1. all districts have roughly the “same” population;
  2. each district is contiguous on the map; and
  3. no county is split between districts?

The blog post showed how mixed-integer programs (MIPs) can be used to answer this question. When using data from the 2010 Census (and subsequent apportionment), it turned out that 12 states (having more than one congressional seat) could conceivably draw county-level maps. They were:

Alabama, Arkansas, Idaho, Iowa, Kansas, Louisiana, Maine, Mississippi, Nebraska, New Mexico, Oklahoma, and West Virginia. 

Less than a month ago, the Census released the 2020 redistricting data. So, we can now ask the same question as before, but with 2020 data. Running the code, we find that 11 states could conceivably draw county-level maps. They are:

Alabama, Arkansas, Idaho, Iowa, Kansas, Maine, Mississippi, Montana, Nebraska, New Mexico, and West Virginia.

We see that Louisiana and Oklahoma are no longer county-level feasible. For Oklahoma, the reason is that Oklahoma County grew to 796,292 people, making it too large to fit into one district (which should be between 787,912 and 795,829). Also, Montana gained a second congressional seat, making it no longer a “trivial” instance of redistricting.

Below are the maps that the code found.


  1. I am in no way recommending that these maps be adopted. They are only shown here to certify that the stylized question asked above has a yes answer. Want to create a “good” map of your own? Try drawing one on https://districtr.org/ or on https://davesredistricting.org/.
  2. As of today, I don’t yet have a complete 2020 dataset. For the runs above, I took the 2010 county-level graphs and read 2020 populations into them. This causes some data mismatches because some counties have changed their names (e.g., Shannon County in South Dakota was renamed Oglala Lakota County) and some county-equivalents have disappeared (e.g., Bedford city re-joined Bedford County in Virginia). However, Virginia is nevertheless infeasible because Fairfax County’s population is too large for one district, and South Dakota has only one congressional seat anyway.
  3. Louisiana is infeasible at the county-level when using Daryl DeFord’s graph. However, Louisiana is feasible when using the graph on Dave’s Redistricting. The reason is that St. Tammany Parish and St. John the Baptist Parish are considered adjacent on Dave’s Redistricting but not by DeFord. Their polygons touch but their land is separated by water.

US Congressional Apportionment as an Optimization Problem (MINLP)

In April 2021, the US Census Bureau announced the latest apportionment results. Apportionment answers the question: How many seats will each state get in the House of Representatives? It also factors into the Electoral College. The latest numbers are given below.

Interestingly, New York, which saw its number drop from 27 to 26 seats, would have stayed at 27 if it had just 89 more people! Several other states, like California, Arizona, and Texas, did not receive as many seats as expected. How did the Census arrive at these numbers?

The Apportionment Problem

In apportionment, we are given the population p_i of each state i \in S and are tasked with distributing a certain number of seats k among them. That is, we should find an integer number of seats x_i for each state i \in S that sum to k. In the US, the current number of seats is k=435, and each state must receive at least one seat. Mathematically, the constraints of apportionment dictate that:

\sum_{i \in S} x_i = k \quad (Distribute a total of k seats.)

x_i \in \mathbb{Z} \quad \forall i \in S  \quad (Each state gets an integer number of seats.)

x_i \ge 1\quad \forall i \in S  \quad (Each state gets at least one seat.)

Denote by X as the set of apportionments x that satisfy these constraints.

The question then becomes: What should be the objective function? What makes one possible apportionment “better” than another?

A typical response is fairness, specifically proportionality; if a state has twice the population of another, then it should receive twice as many seats. This abides by the “one-person, one-vote” principle. In the ideal case, each state i receives a number of seats equal to its quota q_i, i.e., its proportion of the country’s population multiplied by the total number of seats:

q_i = \left (\frac{p_i}{\sum_{j \in S} p_j} \right ) k.

However, the quotas are almost certainly fractional. For example, New York’s 2020 population was 20,215,751 which is roughly 6% of the total US apportionment population (331,108,434) making its quota equal to q_{NY} \approx 26.56.

What should we do with the fractions? One idea is to round them up, i.e., set x_i = \lceil q_i \rceil. This will end up distributing more than the allotted 435 seats. Rounding down, x_i =\lfloor q_i \rfloor, gives a different problem: too few seats distributed. Rounding to the nearest integer will work sometimes (e.g., for the 2020 data), but not always. For example, consider a hypothetical instance with three equally populated states and k=5 seats to distribute; each state would have a quota of \approx 1.67, which, when rounded, would give 6 total seats.

Enter Thomas Jefferson and Alexander Hamilton

The US Constitution does not specify what method should be used for apportionment. So, when results from the first census arrived in 1791, there were political disputes about how to proceed.

Alexander Hamilton proposed the following method. Choose the number of seats k to be apportioned. Find the quotas and give each state its quota’s floor \lfloor q_i \rfloor. Assign any leftover seats to those states having the largest remainders q_i - \lfloor q_i \rfloor.

It turns out that Hamilton’s method solves the optimization problems (code here):

\min_{x \in X} \sum_{i \in S} |x_i - q_i|.

\min_{x \in X} \sum_{i \in S} (x_i - q_i)^2.

Thomas Jefferson proposed a different “divisor” method. Normally, the quota q_i is calculated by dividing the population p_i by the divisor \frac{\sum_{i \in S} p_j}{k}. Rounding these quotas down to an integer may distribute too few seats. In this case, Jefferson proposed to instead divide the population p_i by some other divisor. Specifically, pick a divisor d that will distribute the appropriate number of seats after rounding down, i.e., so that \sum_{i \in S} \lfloor \frac{p_i}{d} \rfloor = k, and then set x_i =  \lfloor \frac{p_i}{d} \rfloor for each state i \in S.

It turns out that Jefferson’s method solves the optimization problem (code):

\min_{x \in X} \max_{i \in S} \{  \frac{x_i}{p_i}\}.

Jefferson’s method was used through the 1830s, but fell out of favor as it became known to advantage large states over smaller states. For example, in 1820, New York received 34 seats (far exceeding its quota of 32.50), while Delaware received just one seat (below its quota of 1.68).

Enter John Quincy Adams and Daniel Webster

Recognizing Jefferson’s method’s bias towards large states, John Quincy Adams proposed essentially the same method, but chose to round up, i.e., pick a divisor d so that \sum_{i \in S} \lceil \frac{p_i}{d} \rceil = k and then set x_i =  \lceil \frac{p_i}{d} \rceil for each state i \in S. Predictably, this had the opposite effect; it favored small states over large states.

It turns out that Adams’ method solves the optimization problem (code):

\min_{x \in X} \max_{i \in S} \{  \frac{p_i}{x_i}\}.

Daniel Webster proposed a compromise between the methods of Adams and Jefferson, by rounding to the nearest integer. That is, pick a divisor d so that \sum_{i \in S} \lceil \frac{p_i}{d}-0.5 \rceil = k and then set x_i =  \lceil \frac{p_i}{d}  -0.5  \rceil for each state i \in S.

It turns out that Webster’s method solves the optimization problems (code):

\min_{x \in X} \sum_{i \in S} \frac{x^2_i}{p_i}.

\min_{x \in X} \sum_{i \in S} p_i \left ( \frac{x_i}{p_i} - \frac{k}{\sum_{j \in S} p_j}\right )^2.

Webster’s method was used in the 1840s. Subsequently, in the 1850s, 1860s, and 1870s, Hamilton’s method was adopted, but under a new name: “Vinton’s method” and its output was altered for political reasons.

Paradoxes and Axioms

With the 1880 census data came some strange results when using Hamilton’s method. If k=299 seats were to be distributed, then Alabama would get 8 of them. However, if the allotment increased to 300 seats, then Alabama’s representation would drop to 7 seats. Even though there is more pie to go around, someone gets less! This is the “Alabama paradox.”

There were other problems with Hamilton’s method. In the 1900s, Virginia grew faster than Maine in population, but Virginia would lose a seat to Maine. This is the “population paradox.”

Before Oklahoma became a state in 1907, Maine and New York had 3 and 38 seats respectively, out of 386 seats total. When Oklahoma joined and got 5 seats (increasing the total to 391), Maine and New York would swap a seat under Hamilton’s method, going to 4 and 37 seats respectively. This is the “new states paradox.”

These paradoxes led mathematicians to study apportionment methods in more detail and axiomatize their study. Some prominent theorems include:

  1. Divisor methods (e.g., Jefferson, Adams, Webster) are the only methods that avoid the population paradox.
  2. All divisor methods avoid the Alabama paradox.
  3. All divisor methods avoid the new states paradox.

Based on these results, it seems that a divisor method should be chosen. Intuitively, one should avoid Jefferson’s and Adam’s methods because they exhibit bias towards large and small states, respectively. How many other divisor methods are there? Which should be used?

Enter Huntington and Hill

Joseph A. Hill began doing statistical work for the US Census Bureau in 1899 and later became chief statistician in 1909. He (essentially) proposed an objective function for apportionment. It amounts to a local search condition, as shown by Edward V. Huntington, a professor at Harvard. Huntington called it the method of equal proportions which has been used in the US since the 1940s.

Consider two states i and j in a possible apportionment x. State i is favored relative to state j if it has more seats per person, i.e., if x_i / p_i > x_j / p_j. The amount of inequality between these states might then be measured by the quantity \frac{x_i/p_i}{x_j/p_j}-1. If moving a seat from state i to state j reduces the amount of inequality, then do so. Repeat until no such transfers reduce the pairwise amounts of inequality. (Note: this procedure does terminate.) This is one way to describe the Huntington-Hill method. There is also a construction procedure that starts with zero seats assigned and greedily adds seats to states according to a particular rule. Python code is available here.

It turns out that the Huntington-Hill method of equal proportions solves the optimization problem (code):

\min_{x \in X} \sum_{i \in S} x_i \left ( \frac{p_i}{x_i} - \frac{\sum_{j \in S} p_j}{k}\right )^2.

Interestingly, Huntington’s measure of inequality \frac{x_i/p_i}{x_j/p_j}-1 can be changed to create local search versions of the methods of Adams, Dean, Webster, and Jefferson (code).

  1. Adams: x_i - x_j (p_i/p_j)
  2. Dean: p_j/x_j - p_i /x_i
  3. Webster: x_i/p_i - x_j/p_j
  4. Jefferson: x_i(p_j/p_i) - x_j.

Solving the MINLPs on 2020 Census Data

The codes solve the associated MINLPs with the Gurobi solver. Each MINLP is applied to Example 1.1 from the appendix of Balinski and Young‘s 1982 book Fair Representation.

I also tried applying them to the 2020 Census data, but Gurobi gave very strange results. For example, Gurobi would sometimes suggest to give one seat to each state (except California) and give California the remaining seats. This is clearly not optimal for the given objective function.

My suspicion is that numerical instability is to blame. For example, some of the constraints of Jefferson’s model (after linearization) have the form x_i \le p_i y, with the integer variable x_i likely taking a small value (up to 52), p_i being a large number (up to 40 million), and y being a continuous variable. Big-M constraints like these are known to cause numerical problems, so this should not be a huge surprise. I tried tightening Gurobi’s tolerances, but to no avail. So, while these apportionment instances cannot reliably be handled with commercial MIP solvers, they might serve as interesting test cases for MIP solvers that use exact rational arithmetic since the answers can be double-checked with simple algorithms (code).

Further Reading

This post is based primarily on Balinski and Young’s 1982 book Fair Representation and their 1983 Interfaces paper “Apportioning the United States House of Representatives.”

Their opinion, which they back up with theorems and empirical evidence, is that Webster’s method is preferable to the Huntington-Hill method, because:

  • Webster’s method is the only “unbiased” divisor method (not favoring large or small states).
  • Another desirable property of an apportionment method is to stay within the quota, i.e., to give each state a number of seats equal to its quota’s floor or ceiling, i.e., to have \lfloor q_i \rfloor \le x_i \le \lceil q_i \rceil. Unfortunately, there is no method that simultaneously avoids the population paradox and stays within quota. However, Webster’s method is the least likely (among divisor methods) to violate quota. It is also the only divisor method that stays “near” the quota.

In their words, “while it is not possible to to satisfy all of the principles all of the time, it is possible to satisfy all of them almost all of the time.”

Balinski was quite active in the optimization and OR communities before his passing in 2019. Here are some select tidbits from his INFORMS biography:

…Balinski went on to Princeton University and studied under Albert W. Tucker and Ralph E. Gomory, earning a PhD in mathematics in 1959.

…His 1965 Management Science article, “Integer Programming: Methods, Uses, Computation” was awarded INFORMS’s Frederick W. Lanchester Prize.

…In 2013, the Institute for Operations Research and the Management Sciences (INFORMS) awarded him the John von Neumann Theory Prize, one of the highest honors an operations researcher can receive… The following year he was elected an INFORMS Fellow.

…He was one of the six founding members and was the founder and first editor-in-chief of its journal, Mathematical Programming (1970-1980). He went on to serve as MPS president for three years in the late 1980s.

Young also frequently publishes in INFORMS journals and wrote a nice paper showing the limitations of various compactness measures in redistricting.

How close were recent US presidential elections?

Many websites try to rank the “closest” presidential elections in the USA:

Ranking presidential elections by popular vote margin can mislead. For example, Hillary Clinton won the 2016 popular vote by nearly 3 million votes but lost the Electoral College.

Ranking presidential elections by Electoral College margin can also mislead. For example, a candidate could win the Electoral College in a landslide by getting 50.01% of the vote in every state, but intuitively this would be an extremely close election.

A better approach is to ask: How many extra votes would the runner-up have needed in just the right states in order to have won the election?

Answering this question may not be easy to do by hand. However, with the tools of operations research and integer programming, it can be done in 0.01 seconds.

The Election Closeness Problem

To describe the problem, let’s consider some data for the 2020 presidential election. For each state, we have:

  • EV : # of Electoral College votes
  • D_votes/R_votes : # of votes cast for the Democratic/Republican candidate
  • Winner/Runner-up : winner and runner-up of that state’s EVs
  • Cushion (for given state) : # of extra votes the runner-up would have needed to win the state
StateEVD_votesR_votesWinnerRunner-upCushion (for given state)

For example, for the Republican candidate to have won Arizona (AZ), he would have needed 10,458 more votes than he actually received, where 10,458 = 1 + (1,672,143 – 1,661,686). If this had happened, he would have received 11 additional EVs. This, however, would not have been enough to reach the 270 EV threshold needed to win the presidency.

Generally, we have the following data:

  • S : the set of states that the runner-up lost.
  • c_i : # of extra votes the runner-up would have needed to win state i.
  • a_i : # of EVs for state i.
  • b : # of EVs that the runner-up actually won.

For example, in 2020 we have:

  • S=\{AZ, CA, \dots \}
  • c[AZ]=10,458
  • a[AZ]=11
  • b=232

Election Closeness via Integer Programming

To model the election closeness problem as an integer program, we can use a binary variable x_i for each state indicating whether or not it is “flipped”. We can then write the following integer program, essentially a 0-1 knapsack problem.

The objective is to minimize the number of extra votes needed to win the election, which we call the winner’s cushion:

\min \sum_{ i \in S } c_i x_i.

The only constraint (besides the binary restrictions on x) is that the runner-up gain enough EVs to reach the 270 threshold:

\sum_{i \in S} a_i x_i \ge 270-b.

Even though 0-1 knapsack problems are NP-hard, every integer programming solver out there can solve election closeness instances in less than one second. We will use Gurobi. Our GitHub repository has the data, Python code, and full results.

The Results

The table below gives results for presidential elections from 2000 to 2020.

YearWinner’s cushionStates
2004115,573CO, IA, NM
2008990,629FL, IN, IA, NH, NC, OH, VA
2012429,526FL, NH, OH, VA
201677,747MI, PA, WI
202076,518AZ, GA, NV, WI

The results pass the smell test, identifying key swing states, like Florida in 2000 and Georgia in 2020. They suggest the following ranking of election closeness. For comparison purposes, the last two columns report EV margin and PV margin. (Note: a negative PV margin means the election winner received fewer popular votes.)

RankYearWinnerRunner-upWinner’s cushionEV marginPV margin

For the most part, the Winner’s cushion ranking matches the EV margin ranking. One exception is that 2004 election was the 2nd closest recent election according to EV margin, but 4th with respect to cushion.

On the other hand, the popular vote margin ranking is quite different than the cushion ranking. For example, consider the 2020 election in which Biden received 7 million more votes than Trump, making it the 2nd least close election (since 2000) according PV margin. However, the winner is decided by the Electoral College, and Trump would have won if he had received 76,518 more votes in select states, making it the 2nd closest election according to cushion.

Caveats, Further Reading, Acknowledgements

In the analysis, we supposed that the first place candidate in a state gets all of its electors. This is not actually true in Maine and Nebraska where the process is a bit more complicated. There is also the issue of faithless electors.

The election closeness problem considered here is very similar to some previous works. For example, see Alexander S. Belenky’s paper, paper, and book. Around 2008, Mike Sheppard also considered essentially the same problem, but asked how many voters would need to change their votes (which is twice as effective as increasing voter turnout). Unfortunately, Sheppard’s website, which covered elections from 1789 to 2008, is no longer available (archive here). The work did got some press (ABC, WSJ, WSJ), including a blog post from OR’s own Michael Trick!

My PhD student Mohammad Javad Naderi collected the data for this post from Wikipedia and wrote the initial Python codes. Thanks to Paul Rubin for suggesting the word cushion.

Can my state redistrict at the county level?

Gerrymandering is a problem in the USA, and has led to notable Supreme Court cases in recent years. Examples include:

To deal with these issues, reformers have sought to change who controls the redistricting process, or what rules redistricting plans should follow. Constraints on traditional redistricting principles (e.g., population balance, contiguity, compactness, preservation of counties or other political subdivisions) are thought to prevent the most egregious of gerrymanders.

But, at what point do these constraints make redistricting impossible?

Here, we consider one of the most basic questions in redistricting: Can my state draw its congressional districts so that:

  1. all districts have roughly the “same” population;
  2. each district is contiguous on the map; and
  3. no county is split between districts?

We will use mixed integer programming (MIP) to answer this question. MIPs are a type of mathematical optimization problem.

The Input Data

To define the problem, we need some input data:

  • k : the number of districts to draw
  • L : the fewest number of people allowable in each district
  • U : the most number of people allowable in each district
  • p(i) : the number of people in county i
  • V : the set of counties
  • E : the pairs of counties that are adjacent
  • G=(V,E) : the contiguity graph (or “network”)

For example, Oklahoma has 77 counties that should be partitioned into k=5 congressional districts. After the 2010 census, Oklahoma’s population was 3,751,351, so each district should have roughly 750,270.2 people in it. We will allow a little flexibility and require each district to contain between L=746,519 and U=754,021. This ensures that the most and least populated districts will differ by at most 1%. The total number of county adjacencies is 195. An example is Oklahoma County and Cleveland County.

The data for all 50 states can be found here.

An Initial MIP

We start with a classical optimization model that goes back to the 1960s. It will enforce the most basic constraints in redistricting. It uses decision variables of the form:

x(i,j) = should county i be assigned to the district centered at county j?

If so, then x(i,j) should equal one. Otherwise, it will be zero. The MIP is given by:

Constraints (1b) ensure that each county i is assigned to one district. Constraint (1c) ensures that there will be k districts. Constraints (1d) ensure that each district has population between L and U. Constraints (1e) ensure that counties can only be assigned to a district that exists. Constraints (1f) ensure either that county i is assigned to county j or that it is not. The objective function (1a) that is being minimized seeks “compact” districting plans. Ultimately, this will be unimportant for us, as we’re only interested in determining whether there is a feasible districting plan, not whether it is “compact”.

For computational niceties, we require each district to be “centered” at its most populous county. This can be achieved by fixing x(i,j)=0 when county i is more populous than county j. This changes the objective value of model (1), but not its feasibility status.

Initial Results

We see that most states cannot redistrict at the county level. A primary reason is large cities. For example, Dallas County in Texas had a population of 2,368,139 in 2010, which far exceeds the bound U that we imposed. We call such instances “overtly infeasible”.

There are also seven states that are “trivial” in the sense that only one congressional district should be created and it must cover the whole state. There is nothing to solve.

There are 16 states left. When running the code, we see some problems. For example, here is a solution for Oklahoma:

Obviously, this districting plan lacks contiguity. This should not be a surprise; we never imposed contiguity constraints in our MIP!

Adding Contiguity Constraints

There are many ways to add contiguity constraints to our MIP. In the example python code, we use the simplest approach, which is based on flow techniques. Intuitively, the idea is to create flow at each district’s center and send it along the district’s edges to fuel the district’s other nodes. For the details, see model (2) here.

With these constraints, our issue for Oklahoma has been fixed.

Our computational results tell us that there is no solution for Colorado, New Hampshire, Oregon, or South Carolina.

By looking at the Denver and Portland metros below, can you see why?

If you’re looking for a harder puzzle, see if you can figure out why South Carolina cannot redistrict at the county level.

The other twelve states *do* admit contiguous county-level solutions: Alabama, Arkansas, Iowa, Idaho, Kansas, Louisiana, Maine, Mississippi, Nebraska, New Mexico, Oklahoma, and West Virginia. Try to draw some of them for yourself at districtr.org.


Only two states—Iowa and West Virginia—drew county-level maps after the 2010 census (if we exclude states with one congressional district). Based on the analysis conducted here, at most 10 other states could follow.

I emphasize “at most”, because our analysis does not consider all federal and state laws. For example, Section 2 of the Voting Rights Act (as interpreted by the Supreme Court in Thornburg v. Gingles) requires the creation of minority opportunity (or “majority-minority”) districts in certain cases. Preliminary analysis suggests that, when this constraint is considered, Alabama cannot redistrict at the county level.

Another caveat is that not all states have “counties” (e.g., Louisiana has parishes) or the counties have no real significance (e.g., Massachusetts abolished eight of its county governments by 2000).

Further Reading

My coauthors Hamidreza Validi, Eugene Lykhovyd, and I wrote a paper about handling contiguity constraints in redistricting problems when the units are census tracts instead of counties. These instances are much larger than the county-level instances considered here and require alternative MIPs and advanced techniques, like branch-and-cut algorithms and Lagrangian-based reduced-cost fixing, to solve in a reasonable amount of time. Here are links to that paper and code.

Why is maximum clique often easy in practice?

Recently, my coauthor Jose L. Walteros and I published a paper with the title “Why is maximum clique often easy in practice?” (forthcoming at Operations Research). Here is the abstract:

To this day, the maximum clique problem remains a computationally challenging problem. Indeed, despite researchers’ best efforts, there exist unsolved benchmark instances with one thousand vertices. However, relatively simple algorithms solve real-life instances with millions of vertices in a few seconds. Why is this the case? Why is the problem apparently so easy in many naturally occurring networks? In this paper, we provide an explanation. First, we observe that the graph’s clique number \omega is very near to the graph’s degeneracy d in most real-life instances. This observation motivates a main contribution of this paper, which is an algorithm for the maximum clique problem that runs in time polynomial in the size of the graph, but exponential in the gap g:=(d+1)-\omega between the clique number \omega and its degeneracy-based upper bound d+1. When this gap g can be treated as a constant, as is often the case for real-life graphs, the proposed algorithm runs in time O(dm)=O(m^{1.5}). This provides a rigorous explanation for the apparent easiness of these instances despite the intractability of the problem in the worst case. Further, our implementation of the proposed algorithm is actually practical—competitive with the best approaches from the literature.

The code is available here. A preprint of the paper is here. A slide deck is here.

Last Friday, I presented the work to a group at OSU. Here is the handout I provided to students.

A brief tutorial on Lagrangian techniques for k-median

Here are my notes on Lagrangian techniques for the k-median problem from a talk I gave at OSU today. They cover:

  • A Lagrangian relaxation model for k-median
  • How to solve the inner problem (reduce it to sorting)
  • Pseudocode for Lagrangian-based branch-and-bound
  • Experimental results

The GitHub repository has the supporting code. For more info, see the references at the end of the notes.

I’ve become picky about graph notation

A couple of years ago, I came across the following passage about graph notation.

…Martin Grötschel will always advocate a thorough investigation with a careful look at details, which can or cannot make a big difference. In particular, when it comes to graphs, digraphs, and hypergraphs, he becomes furious about “sloppy notation” that doesn’t distinguish between uv, (u,v), and {u,v}, because results do not automatically carry over between these cases. Martin Grötschel is correct, and the ability to seamlessly shift his attention from little details to the grand picture is without doubt one of his greatest strengths.


In many research papers, I’ve seen authors write about a simple undirected graph G=(V,E) with vertex set V and edge set E\subset V \times V and proceed to talk about its edges (u,v). Sometimes, papers will even say that V \times V is the edge set of a complete (simple) graph. I know, because I was guilty of some of these mistakes in my first few papers.

Here are a few of the problems.

simple undirected graph

What’s the problem here? Simple graphs are, by definition, undirected. There is no need to specify “undirected.”

edge set E\subseteq V \times V

What about here? Undirected graphs have undirected edges \{u,v\} which are unordered pairs of vertices (so \{u,v\} and \{v,u\} refer to the same edge), while the Cartesian product of sets A and B is ordered:

A \times B:=\{ (a,b) ~|~a \in A,~b \in B\}.

So, when G=(V,E) is a directed graph, it makes sense to say that E \subset V \times V because it may have edge (u,v) but not edge (v,u). But where does the edge set E live when G is undirected? The following approach is unsightly and not conducive to frequent use:

E \subseteq \{\{u,v\} ~|~u \in V,~v \in V,~u\neq v\}.

Instead, I have started writing that E \subseteq \binom{V}{2}, where

\binom{V}{2}:=\{\{u,v\} ~|~u\in V,~v \in V,~u\neq v\}.

The idea easily generalizes to \binom{V}{k} for hypergraphs with k vertices per hyperedge.

V \times V is the edge set of a complete (simple) graph

By now, the problems with this statement should be clear. One nice thing about the notation \binom{V}{2} is that when the graph is simple and complete we can just say that E=\binom{V}{2}.

A brief tutorial on Fourier-Motzkin Elimination

Let’s start with an analogy.

Equality systems : Gaussian elimination :: Inequality systems : ???

According to Kipp Martin, the answer is Fourier-Motzkin Elimination (FME) which allows one to project out variables from a system of linear inequalities. These slides from Thomas Rothvoss nicely illustrate what it means to project out (or eliminate) the variable y.

Note: I don’t particularly like how Gaussian elimination is described (via row operations on a tableau, much like how the simplex method is often described). The intuition is lacking. Instead, everything I describe in this post will work on the (in)equalities themselves.

Eliminating variables in equality systems

Start with the linear equality system


Proceed by isolating the variable z in each equation that contains z to get


We can eliminate variable z by setting all “definitions” of z equal to each other (and leaving other equations as is)


Simplifying gives


By isolating and then eliminating variable y, we get x=2. Back-substitution gives


So the only feasible solution is (x,y,z)=(2,2,-2). Observe that each time a variable is eliminated, one equality is also removed.

Eliminating variables in inequality systems

Suppose we started with the inequality system

2x+y+z\le 4
x+2y+z\ge 4
x+y\le 4.

Isolating z (where possible) gives

z\le 4-2x-y
z\ge 4-x-2y
x+y\le 4.

Matching each of the lower bounds on z with each of its upper bounds (and leaving other inequalities as is), we obtain

4-x-2y\le 4-2x-y
x+y\le 4.

Simplifying gives

x \le y
x+y\le 4.

Isolating y gives

y \ge x
y\le 4-x.

Applying the same procedure to eliminate y, we are left with

x \le 2.

This is the projection of the original feasible set onto the space of the x variable. That is,

  1. for any x\le 2 there exist values for y and z for which the original system is satisfied, and
  2. any point that is feasible for the original system satisfies x\le 2.

Consequences of Fourier-Motzkin Elimination (FME)

FME can test whether a linear program is feasible.

If the system is feasible, FME can provide a feasible solution by employing back-substitution (in accordance with the lower and upper bounds on the variable that is being eliminated). On the other hand, if the original system is infeasible, the final inequality system will have conflicting lower and upper bounds on the final variable such as x \le 0 and x \ge 11 or obvious contradictions like 7 \le 2.

FME can optimize linear programs.

To optimize an LP in which we are to maximize c^T x, simply add a variable z and constraint z\le c^T x to our system and make sure to eliminate z last. The optimality status (e.g., infeasible, unbounded, an optimal solution) can be obtained from the final inequality system. (How?) Note, however, that Fourier-Motzkin elimination is awfully slow (doubly exponential) and is not a practical means to solve LPs. (But, there is also a singly exponential elimination method.)

Projections of polyhedra are also polyhedra

If we eliminate a variable from a system with m inequalities, then we end up with a system of at most m^2/4 inequalities. So, if we start with a finite number of inequalities and variables, then any projection will have finitely many inequalities. (FME may give redundant inequalities, which is okay.)

Projections of rational polyhedra are also rational

If every coefficient and right-hand-side in our original inequality system is rational, then they will remain rational after a variable is eliminated.

Other results

More consequences follow by FME, see Kipp Martin’s book. In fact, he develops the basics of LP (e.g., Farkas’ lemma and duality) via projection (FME) and “inverse projection.”