Can my state draw its congressional districts so that:

all districts have roughly the “same” population;

each district is contiguous on the map; and

no county is split between districts?

The blog post showed how mixed-integer programs (MIPs) can be used to answer this question. When using data from the 2010 Census (and subsequent apportionment), it turned out that 12 states (having more than one congressional seat) could conceivably draw county-level maps. They were:

Alabama, Arkansas, Idaho, Iowa, Kansas, Louisiana, Maine, Mississippi, Nebraska, New Mexico, Oklahoma, and West Virginia.

Less than a month ago, the Census released the 2020 redistricting data. So, we can now ask the same question as before, but with 2020 data. Running the code, we find that 11 states could conceivably draw county-level maps. They are:

Alabama, Arkansas, Idaho, Iowa, Kansas, Maine, Mississippi, Montana, Nebraska, New Mexico, and West Virginia.

We see that Louisiana and Oklahoma are no longer county-level feasible. For Oklahoma, the reason is that Oklahoma County grew to 796,292 people, making it too large to fit into one district (which should be between 787,912 and 795,829). Also, Montana gained a second congressional seat, making it no longer a “trivial” instance of redistricting.

Below are the maps that the code found.

Disclaimers

I am in no way recommending that these maps be adopted. They are only shown here to certify that the stylized question asked above has a yes answer. Want to create a “good” map of your own? Try drawing one on https://districtr.org/ or on https://davesredistricting.org/.

As of today, I don’t yet have a complete 2020 dataset. For the runs above, I took the 2010 county-level graphs and read 2020 populations into them. This causes some data mismatches because some counties have changed their names (e.g., Shannon County in South Dakota was renamed Oglala Lakota County) and some county-equivalents have disappeared (e.g., Bedford city re-joined Bedford County in Virginia). However, Virginia is nevertheless infeasible because Fairfax County’s population is too large for one district, and South Dakota has only one congressional seat anyway.

Louisiana is infeasible at the county-level when using Daryl DeFord’s graph. However, Louisiana is feasible when using the graph on Dave’s Redistricting. The reason is that St. Tammany Parish and St. John the Baptist Parish are considered adjacent on Dave’s Redistricting but not by DeFord. Their polygons touch but their land is separated by water.

In April 2021, the US Census Bureau announced the latest apportionment results. Apportionment answers the question: How many seats will each state get in the House of Representatives? It also factors into the Electoral College. The latest numbers are given below.

Interestingly, New York, which saw its number drop from 27 to 26 seats, would have stayed at 27 if it had just 89 more people! Several other states, like California, Arizona, and Texas, did not receive as many seats as expected. How did the Census arrive at these numbers?

The Apportionment Problem

In apportionment, we are given the population $p_i$ of each state $i$ and are tasked with distributing a certain number $k$ of seats among them. That is, we should find an integer number of seats $x_i$ for each state $i$ that sum to $k$. In the US, the current number of seats is $k = 435$, and each state must receive at least one seat. Mathematically, the constraints of apportionment dictate that:

$\sum_i x_i = k$ (Distribute a total of $k$ seats.)

$x_i \in \mathbb{Z}$ for each state $i$ (Each state gets an integer number of seats.)

$x_i \ge 1$ for each state $i$ (Each state gets at least one seat.)

Denote by $X$ the set of apportionments $x$ that satisfy these constraints.

The question then becomes: What should be the objective function? What makes one possible apportionment “better” than another?

A typical response is fairness, specifically proportionality; if a state has twice the population of another, then it should receive twice as many seats. This abides by the “one-person, one-vote” principle. In the ideal case, each state $i$ receives a number of seats equal to its quota $q_i$, i.e., its proportion of the country’s population multiplied by the total number of seats $k$ (with $p_i$ denoting the population of state $i$):

$q_i = \frac{p_i}{\sum_j p_j} \, k$.

However, the quotas are almost certainly fractional. For example, New York’s 2020 population was 20,215,751, which is roughly 6% of the total US apportionment population (331,108,434), making its quota equal to $435 \times 20{,}215{,}751 / 331{,}108{,}434 \approx 26.56$.

What should we do with the fractions? One idea is to round them up, i.e., give each state $\lceil q_i \rceil$ seats, where $q_i$ is its quota. This will end up distributing more than the allotted 435 seats. Rounding down, to $\lfloor q_i \rfloor$, gives a different problem: too few seats distributed. Rounding to the nearest integer will work sometimes (e.g., for the 2020 data), but not always. For example, consider a hypothetical instance with three equally populated states and a number of seats that is not a multiple of three; the three states would have identical quotas and receive identical seat counts after rounding, so the total would be a multiple of three and could not match the number of seats to distribute.

Enter Thomas Jefferson and Alexander Hamilton

The US Constitution does not specify what method should be used for apportionment. So, when results from the first census arrived in 1791, there were political disputes about how to proceed.

Alexander Hamilton proposed the following method. Choose the number of seats $k$ to be apportioned. Find the quotas $q_i$ and give each state $i$ its quota’s floor $\lfloor q_i \rfloor$. Assign any leftover seats to those states having the largest remainders $q_i - \lfloor q_i \rfloor$.
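Hamilton's procedure is short enough to sketch directly. The function below is an illustration written for this passage, not the code linked from the post; the name `hamilton` and the three-state populations are made up, and the sketch ignores the requirement that every state receive at least one seat.

```python
from fractions import Fraction
from math import floor

def hamilton(populations, k):
    """Largest-remainder (Hamilton) apportionment of k seats."""
    total = sum(populations)
    # Exact quotas q_i = p_i * k / total, kept as fractions to avoid rounding error.
    quotas = [Fraction(p * k, total) for p in populations]
    seats = [floor(q) for q in quotas]
    leftover = k - sum(seats)
    # Hand the leftover seats to the states with the largest remainders.
    by_remainder = sorted(range(len(quotas)),
                          key=lambda i: quotas[i] - seats[i], reverse=True)
    for i in by_remainder[:leftover]:
        seats[i] += 1
    return seats

# Hypothetical populations; the quotas are 4.7, 3.5, and 1.8.
print(hamilton([47, 35, 18], 10))  # [5, 3, 2]
```

The floors hand out 8 seats, and the two leftover seats go to the states with remainders 0.8 and 0.7.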

It turns out that Hamilton’s method solves the optimization problems (code here):

.

.

Thomas Jefferson proposed a different “divisor” method. Normally, the quota is calculated by dividing the population $p_i$ by the divisor $d = \sum_j p_j / k$, where $k$ is the number of seats. Rounding these quotas down to an integer may distribute too few seats. In this case, Jefferson proposed to instead divide the population by some other divisor. Specifically, pick a divisor $d$ that will distribute the appropriate number of seats after rounding down, i.e., so that $\sum_i \lfloor p_i / d \rfloor = k$, and then set $x_i = \lfloor p_i / d \rfloor$ for each state $i$.

It turns out that Jefferson’s method solves the optimization problem (code):

.

Jefferson’s method was used through the 1830s, but fell out of favor as it became known to advantage large states over smaller states. For example, in 1820, New York received 34 seats (far exceeding its quota of 32.50), while Delaware received just one seat (below its quota of 1.68).

Enter John Quincy Adams and Daniel Webster

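Recognizing Jefferson’s method’s bias towards large states, John Quincy Adams proposed essentially the same method, but chose to round up, i.e., pick a divisor $d$ so that $\sum_i \lceil p_i / d \rceil = k$ and then set $x_i = \lceil p_i / d \rceil$ for each state $i$, where $p_i$ is the state’s population and $k$ is the number of seats. Predictably, this had the opposite effect; it favored small states over large states.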

It turns out that Adams’ method solves the optimization problem (code):

.

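Daniel Webster proposed a compromise between the methods of Adams and Jefferson, by rounding to the nearest integer. That is, pick a divisor $d$ so that $\sum_i [p_i / d] = k$, where $[\cdot]$ denotes rounding to the nearest integer, and then set $x_i = [p_i / d]$ for each state $i$.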
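The three divisor methods described so far differ only in how they round, so one routine can cover all of them. The sketch below is an illustration under assumptions, not the post's linked code: it searches for a workable divisor by bisection with floating-point arithmetic, uses made-up populations, ignores the one-seat minimum, and gives up if ties leave no divisor that distributes exactly the requested number of seats.

```python
import math

def divisor_method(populations, k, rounding, iterations=200):
    """Search for a divisor d such that the rounded values p_i / d sum to k."""
    lo = min(populations) / (k + 1)   # small divisor: hands out too many seats
    hi = 2 * sum(populations)         # large divisor: hands out too few seats
    for _ in range(iterations):
        d = (lo + hi) / 2
        seats = [rounding(p / d) for p in populations]
        if sum(seats) == k:
            return seats
        if sum(seats) > k:
            lo = d   # too many seats, so the divisor must grow
        else:
            hi = d   # too few seats, so the divisor must shrink
    raise ValueError("no divisor found (ties can cause this)")

def jefferson(q):
    return math.floor(q)        # round down

def adams(q):
    return math.ceil(q)         # round up

def webster(q):
    return math.floor(q + 0.5)  # round to nearest

pops = [76, 14, 10]  # hypothetical populations, 10 seats
print(divisor_method(pops, 10, jefferson))  # [8, 1, 1]
print(divisor_method(pops, 10, adams))      # [7, 2, 1]
print(divisor_method(pops, 10, webster))    # [8, 1, 1]
```

On this toy instance, Adams' rounding shifts a seat from the largest state to a smaller one, in line with the bias described above.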

It turns out that Webster’s method solves the optimization problems (code):

.

.

Webster’s method was used in the 1840s. Subsequently, in the 1850s, 1860s, and 1870s, Hamilton’s method was adopted under a new name, “Vinton’s method,” and its output was altered for political reasons.

Paradoxes and Axioms

With the 1880 census data came some strange results when using Hamilton’s method. If 299 seats were to be distributed, then Alabama would get 8 of them. However, if the allotment increased to 300 seats, then Alabama’s representation would drop to 7 seats. Even though there is more pie to go around, someone gets less! This is the “Alabama paradox.”
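The paradox is easy to reproduce on a toy instance. The sketch below uses a hypothetical three-state example (populations 6, 6, and 2, not census data) and a compact largest-remainder function written only for this illustration.

```python
from fractions import Fraction
from math import floor

def largest_remainder(populations, k):
    """Hamilton's method: floors of the quotas, then leftovers by largest remainder."""
    total = sum(populations)
    quotas = [Fraction(p * k, total) for p in populations]
    seats = [floor(q) for q in quotas]
    order = sorted(range(len(quotas)),
                   key=lambda i: quotas[i] - seats[i], reverse=True)
    for i in order[:k - sum(seats)]:
        seats[i] += 1
    return seats

pops = [6, 6, 2]  # hypothetical populations
print(largest_remainder(pops, 10))  # [4, 4, 2]
print(largest_remainder(pops, 11))  # [5, 5, 1]
```

Going from 10 to 11 seats, the smallest state drops from 2 seats to 1 even though the populations did not change.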

There were other problems with Hamilton’s method. In the 1900s, Virginia grew faster than Maine in population, but Virginia would lose a seat to Maine. This is the “population paradox.”

Before Oklahoma became a state in 1907, Maine and New York had 3 and 38 seats respectively, out of 386 seats total. When Oklahoma joined and got 5 seats (increasing the total to 391), Maine and New York would swap a seat under Hamilton’s method, going to 4 and 37 seats respectively. This is the “new states paradox.”

These paradoxes led mathematicians to study apportionment methods in more detail and axiomatize their study. Some prominent theorems include:

Divisor methods (e.g., Jefferson, Adams, Webster) are the only methods that avoid the population paradox.

All divisor methods avoid the Alabama paradox.

All divisor methods avoid the new states paradox.

Based on these results, it seems that a divisor method should be chosen. Intuitively, one should avoid Jefferson’s and Adams’ methods because they exhibit bias towards large and small states, respectively. How many other divisor methods are there? Which should be used?

Enter Huntington and Hill

Joseph A. Hill began doing statistical work for the US Census Bureau in 1899 and became chief statistician in 1909. He (essentially) proposed an objective function for apportionment. It amounts to a local search condition, as shown by Edward V. Huntington, a professor at Harvard. Huntington called it the method of equal proportions, which has been used in the US since the 1940s.

Consider two states $i$ and $j$ in a possible apportionment $x$, and let $p_i$ and $p_j$ denote their populations. State $i$ is favored relative to state $j$ if it has more seats per person, i.e., if $x_i / p_i > x_j / p_j$. The amount of inequality between these states might then be measured by the relative difference $\frac{x_i / p_i}{x_j / p_j} - 1$. If moving a seat from state $i$ to state $j$ reduces the amount of inequality, then do so. Repeat until no such transfers reduce the pairwise amounts of inequality. (Note: this procedure does terminate.) This is one way to describe the Huntington-Hill method. There is also a construction procedure that starts with zero seats assigned and greedily adds seats to states according to a particular rule. Python code is available here.
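The construction procedure can be sketched as follows. This is an illustration rather than the post's linked code: it assumes the usual priority rule for equal proportions, in which a state with population $p$ and $n$ seats has priority $p / \sqrt{n(n+1)}$ for its next seat, and it uses made-up populations.

```python
import heapq
import math

def huntington_hill(populations, k):
    """Greedy construction: each seat goes to the state with the highest
    priority p_i / sqrt(n_i * (n_i + 1)), where n_i is its current seat count."""
    seats = [0] * len(populations)
    # With zero seats the priority is infinite, so every state gets a first seat
    # (this requires k to be at least the number of states).
    heap = [(-math.inf, i) for i in range(len(populations))]
    heapq.heapify(heap)
    for _ in range(k):
        _, i = heapq.heappop(heap)   # state with the highest priority
        seats[i] += 1
        n = seats[i]
        heapq.heappush(heap, (-populations[i] / math.sqrt(n * (n + 1)), i))
    return seats

print(huntington_hill([76, 14, 10], 10))  # [8, 1, 1]
```

Because the priority is infinite at zero seats, the one-seat minimum is satisfied automatically.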

It turns out that the Huntington-Hill method of equal proportions solves the optimization problem (code):

.

Interestingly, Huntington’s measure of inequality can be changed to create local search versions of the methods of Adams, Dean, Webster, and Jefferson (code).

I also tried applying them to the 2020 Census data, but Gurobi gave very strange results. For example, Gurobi would sometimes suggest giving one seat to each state (except California) and giving California the remaining seats. This is clearly not optimal for the given objective function.

My suspicion is that numerical instability is to blame. For example, some of the constraints of Jefferson’s model (after linearization) couple an integer variable that likely takes a small value (up to 52), a coefficient that is a large number (up to 40 million), and a continuous variable. Big-M constraints like these are known to cause numerical problems, so this should not be a huge surprise. I tried tightening Gurobi’s tolerances, but to no avail. So, while these apportionment instances cannot reliably be handled with commercial MIP solvers, they might serve as interesting test cases for MIP solvers that use exact rational arithmetic since the answers can be double-checked with simple algorithms (code).

Their opinion, which they back up with theorems and empirical evidence, is that Webster’s method is preferable to the Huntington-Hill method, because:

Webster’s method is the only “unbiased” divisor method (not favoring large or small states).

Another desirable property of an apportionment method is to stay within the quota, i.e., to give each state a number of seats equal to its quota’s floor or ceiling, i.e., to have $\lfloor q_i \rfloor \le x_i \le \lceil q_i \rceil$, where $q_i$ is state $i$’s quota and $x_i$ is its number of seats. Unfortunately, there is no method that simultaneously avoids the population paradox and stays within quota. However, Webster’s method is the least likely (among divisor methods) to violate quota. It is also the only divisor method that stays “near” the quota.

In their words, “while it is not possible to satisfy all of the principles all of the time, it is possible to satisfy all of them almost all of the time.”

Balinski was quite active in the optimization and OR communities before his passing in 2019. Here are some select tidbits from his INFORMS biography:

…Balinski went on to Princeton University and studied under Albert W. Tucker and Ralph E. Gomory, earning a PhD in mathematics in 1959.

…His 1965 Management Science article, “Integer Programming: Methods, Uses, Computation” was awarded INFORMS’s Frederick W. Lanchester Prize.

…In 2013, the Institute for Operations Research and the Management Sciences (INFORMS) awarded him the John von Neumann Theory Prize, one of the highest honors an operations researcher can receive… The following year he was elected an INFORMS Fellow.

…He was one of the six founding members and was the founder and first editor-in-chief of its journal, Mathematical Programming (1970-1980). He went on to serve as MPS president for three years in the late 1980s.

Ranking presidential elections by popular vote margin can mislead. For example, Hillary Clinton won the 2016 popular vote by nearly 3 million votes but lost the Electoral College.

Ranking presidential elections by Electoral College margin can also mislead. For example, a candidate could win the Electoral College in a landslide by getting 50.01% of the vote in every state, but intuitively this would be an extremely close election.

A better approach is to ask: How many extra votes would the runner-up have needed in just the right states in order to have won the election?

Answering this question may not be easy to do by hand. However, with the tools of operations research and integer programming, it can be done in 0.01 seconds.

D_votes/R_votes : # of votes cast for the Democratic/Republican candidate

Winner/Runner-up : winner and runner-up of that state’s EVs

Cushion (for given state) : # of extra votes the runner-up would have needed to win the state

| State | EV | D_votes | R_votes | Winner | Runner-up | Cushion (for given state) |
|---|---|---|---|---|---|---|
| AL | 9 | 849,624 | 1,441,170 | REP | DEM | 591,547 |
| AK | 3 | 153,778 | 189,951 | REP | DEM | 36,174 |
| AZ | 11 | 1,672,143 | 1,661,686 | DEM | REP | 10,458 |
| AR | 6 | 423,932 | 760,647 | REP | DEM | 336,716 |
| CA | 55 | 11,110,250 | 6,006,429 | DEM | REP | 5,103,822 |

For example, for the Republican candidate to have won Arizona (AZ), he would have needed 10,458 more votes than he actually received, where 10,458 = 1 + (1,672,143 – 1,661,686). If this had happened, he would have received 11 additional EVs. This, however, would not have been enough to reach the 270 EV threshold needed to win the presidency.

Generally, we have the following data:

$S$: the set of states that the runner-up lost.

$c_i$: # of extra votes the runner-up would have needed to win state $i \in S$.

$e_i$: # of EVs for state $i \in S$.

$w$: # of EVs that the runner-up actually won.

For example, in 2020 we have $w = 232$, and Arizona belongs to $S$ with $c_{AZ} = 10{,}458$ and $e_{AZ} = 11$.

Election Closeness via Integer Programming

To model the election closeness problem as an integer program, we can use a binary variable $x_i$ for each state $i \in S$ indicating whether or not it is “flipped”. We can then write the following integer program, essentially a 0-1 knapsack problem.

The objective is to minimize the number of extra votes needed to win the election, which we call the winner’s cushion:

$\min \sum_{i \in S} c_i x_i$.

The only constraint (besides the binary restrictions on $x$) is that the runner-up gain enough EVs to reach the 270 threshold:

$\sum_{i \in S} e_i x_i \ge 270 - w$.
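Because the model has a single covering constraint with small integer coefficients, it can also be solved without a MIP solver. The sketch below is not the post's code: it swaps in a textbook dynamic program over electoral votes, and the four states with their cushions and EV counts are invented for illustration.

```python
def min_cushion(states, needed):
    """states: name -> (cushion, electoral votes) for states the runner-up lost.
    needed: additional EVs the runner-up must gain.
    Returns (fewest extra votes, tuple of states to flip)."""
    INF = float("inf")
    # best[e] = cheapest way found so far to gain at least e EVs (capped at needed).
    best = [(INF, ())] * (needed + 1)
    best[0] = (0, ())
    for name, (cushion, ev) in states.items():
        new = list(best)  # read from best, write to new: each state flips at most once
        for e, (cost, chosen) in enumerate(best):
            if cost == INF:
                continue
            t = min(needed, e + ev)
            if cost + cushion < new[t][0]:
                new[t] = (cost + cushion, chosen + (name,))
        best = new
    return best[needed]

# Hypothetical instance: the runner-up needs 16 more EVs.
toy = {"A": (100, 10), "B": (150, 6), "C": (400, 16), "D": (50, 4)}
print(min_cushion(toy, 16))  # (250, ('A', 'B'))
```

Flipping C alone would also reach the threshold, but at a cost of 400 extra votes rather than 250.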

The table below gives results for presidential elections from 2000 to 2020.

| Year | Winner’s cushion | States |
|---|---|---|
| 2000 | 538 | FL |
| 2004 | 115,573 | CO, IA, NM |
| 2008 | 990,629 | FL, IN, IA, NH, NC, OH, VA |
| 2012 | 429,526 | FL, NH, OH, VA |
| 2016 | 77,747 | MI, PA, WI |
| 2020 | 76,518 | AZ, GA, NV, WI |

The results pass the smell test, identifying key swing states, like Florida in 2000 and Georgia in 2020. They suggest the following ranking of election closeness. For comparison purposes, the last two columns report EV margin and PV margin. (Note: a negative PV margin means the election winner received fewer popular votes.)

| Rank | Year | Winner | Runner-up | Winner’s cushion | EV margin | PV margin |
|---|---|---|---|---|---|---|
| 1 | 2000 | Bush | Gore | 538 | 5=271-266 | -543,816 |
| 2 | 2020 | Biden | Trump | 76,518 | 74=306-232 | 7,058,909 |
| 3 | 2016 | Trump | Clinton | 77,747 | 77=304-227 | -2,868,686 |
| 4 | 2004 | Bush | Kerry | 115,573 | 35=286-251 | 3,012,171 |
| 5 | 2012 | Obama | Romney | 429,526 | 126=332-206 | 4,982,291 |
| 6 | 2008 | Obama | McCain | 990,629 | 192=365-173 | 9,550,193 |

For the most part, the Winner’s cushion ranking matches the EV margin ranking. One exception is that the 2004 election was the 2nd closest recent election according to EV margin, but 4th with respect to cushion.

On the other hand, the popular vote margin ranking is quite different from the cushion ranking. For example, consider the 2020 election in which Biden received 7 million more votes than Trump, making it the 2nd least close election (since 2000) according to PV margin. However, the winner is decided by the Electoral College, and Trump would have won if he had received 76,518 more votes in select states, making it the 2nd closest election according to cushion.

Caveats, Further Reading, Acknowledgements

In the analysis, we supposed that the first place candidate in a state gets all of its electors. This is not actually true in Maine and Nebraska where the process is a bit more complicated. There is also the issue of faithless electors.

The election closeness problem considered here is very similar to some previous works. For example, see Alexander S. Belenky’s paper, paper, and book. Around 2008, Mike Sheppard also considered essentially the same problem, but asked how many voters would need to change their votes (which is twice as effective as increasing voter turnout). Unfortunately, Sheppard’s website, which covered elections from 1789 to 2008, is no longer available (archive here). The work did get some press (ABC, WSJ, WSJ), including a blog post from OR’s own Michael Trick!

My PhD student Mohammad Javad Naderi collected the data for this post from Wikipedia and wrote the initial Python codes. Thanks to Paul Rubin for suggesting the word cushion.

To deal with these issues, reformers have sought to change who controls the redistricting process, or what rules redistricting plans should follow. Constraints on traditional redistricting principles (e.g., population balance, contiguity, compactness, preservation of counties or other political subdivisions) are thought to prevent the most egregious of gerrymanders.

But, at what point do these constraints make redistricting impossible?

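Here, we consider one of the most basic questions in redistricting: Can my state draw its congressional districts so that all districts have roughly the “same” population, each district is contiguous on the map, and no county is split between districts?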

For example, Oklahoma has 77 counties that should be partitioned into k=5 congressional districts. After the 2010 census, Oklahoma’s population was 3,751,351, so each district should have roughly 750,270.2 people in it. We will allow a little flexibility and require each district to contain between L=746,519 and U=754,021. This ensures that the most and least populated districts will differ by at most 1%. The total number of county adjacencies is 195. An example is Oklahoma County and Cleveland County.
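The bounds in this example can be reproduced with a few lines of arithmetic. The sketch below assumes that a 1% spread means each district may deviate from the ideal size by at most 0.5% in either direction, with the lower bound rounded up and the upper bound rounded down; the function name `population_bounds` is made up for illustration.

```python
from fractions import Fraction
from math import ceil, floor

def population_bounds(total_population, k, max_spread=Fraction(1, 100)):
    """Return (L, U): the smallest and largest allowed district populations."""
    ideal = Fraction(total_population, k)        # exact ideal district size
    lower = ceil(ideal * (1 - max_spread / 2))   # 0.5% below the ideal, rounded up
    upper = floor(ideal * (1 + max_spread / 2))  # 0.5% above the ideal, rounded down
    return lower, upper

# Oklahoma after the 2010 census: 3,751,351 people and k = 5 districts.
print(population_bounds(3_751_351, 5))  # (746519, 754021)
```

Under this assumption the output agrees with the values of L and U quoted above.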

We start with a classical optimization model that goes back to the 1960s. It will enforce the most basic constraints in redistricting. It uses decision variables of the form:

x(i,j) = should county i be assigned to the district centered at county j?

If so, then x(i,j) should equal one. Otherwise, it will be zero. The MIP is given by:

Constraints (1b) ensure that each county i is assigned to one district. Constraint (1c) ensures that there will be k districts. Constraints (1d) ensure that each district has population between L and U. Constraints (1e) ensure that counties can only be assigned to a district that exists. Constraints (1f) ensure either that county i is assigned to county j or that it is not. The objective function (1a) that is being minimized seeks “compact” districting plans. Ultimately, this will be unimportant for us, as we’re only interested in determining whether there is a feasible districting plan, not whether it is “compact”.
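For reference, here is one way to write a model that matches the description above. It is a reconstruction under assumptions, not a copy of the original formulation: $V$ denotes the set of counties, $p_i$ the population of county $i$, and the objective weights $w_{ij}$ stand in for whatever compactness measure is used (models of this kind from the 1960s typically take $w_{ij} = p_i d_{ij}^2$, where $d_{ij}$ is the distance between counties $i$ and $j$).

```latex
\begin{align}
\min\ & \sum_{i \in V} \sum_{j \in V} w_{ij} x_{ij} \tag{1a} \\
\text{s.t.}\ & \sum_{j \in V} x_{ij} = 1 && \forall i \in V \tag{1b} \\
& \sum_{j \in V} x_{jj} = k \tag{1c} \\
& L x_{jj} \le \sum_{i \in V} p_i x_{ij} \le U x_{jj} && \forall j \in V \tag{1d} \\
& x_{ij} \le x_{jj} && \forall i, j \in V \tag{1e} \\
& x_{ij} \in \{0, 1\} && \forall i, j \in V \tag{1f}
\end{align}
```

In this form, $x_{jj} = 1$ indicates that county $j$ is chosen as a district center, which is why (1c) counts districts and (1e) forbids assignments to counties that are not centers.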

For computational niceties, we require each district to be “centered” at its most populous county. This can be achieved by fixing x(i,j)=0 when county i is more populous than county j. This changes the objective value of model (1), but not its feasibility status.

Initial Results

We see that most states cannot redistrict at the county level. A primary reason is large cities. For example, Dallas County in Texas had a population of 2,368,139 in 2010, which far exceeds the bound U that we imposed. We call such instances “overtly infeasible”.

There are also seven states that are “trivial” in the sense that only one congressional district should be created and it must cover the whole state. There is nothing to solve.

There are 16 states left. When running the code, we see some problems. For example, here is a solution for Oklahoma:

Obviously, this districting plan lacks contiguity. This should not be a surprise; we never imposed contiguity constraints in our MIP!

Adding Contiguity Constraints

There are many ways to add contiguity constraints to our MIP. In the example Python code, we use the simplest approach, which is based on flow techniques. Intuitively, the idea is to create flow at each district’s center and send it along the district’s edges to fuel the district’s other nodes. For the details, see model (2) here.
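Independently of how the MIP enforces contiguity, a candidate plan can be checked after the fact. The sketch below is a plain breadth-first search, not the flow formulation from the linked model; the function name and the four-county path graph are made up for illustration.

```python
from collections import deque

def is_contiguous(adjacency, assignment):
    """adjacency: county -> iterable of neighboring counties.
    assignment: county -> district label.
    Returns True if every district induces a connected subgraph."""
    districts = {}
    for county, district in assignment.items():
        districts.setdefault(district, set()).add(county)
    for members in districts.values():
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                # Only walk along edges that stay inside the district.
                if v in members and v not in seen:
                    seen.add(v)
                    queue.append(v)
        if seen != members:
            return False
    return True

# Hypothetical path of four counties: 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(is_contiguous(path, {0: "a", 1: "a", 2: "b", 3: "b"}))  # True
print(is_contiguous(path, {0: "a", 1: "b", 2: "a", 3: "b"}))  # False
```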

With these constraints, our issue for Oklahoma has been fixed.

Our computational results tell us that there is no solution for Colorado, New Hampshire, Oregon, or South Carolina.

By looking at the Denver and Portland metros below, can you see why?

The other twelve states *do* admit contiguous county-level solutions: Alabama, Arkansas, Iowa, Idaho, Kansas, Louisiana, Maine, Mississippi, Nebraska, New Mexico, Oklahoma, and West Virginia. Try to draw some of them for yourself at districtr.org.

Conclusion

Only two states—Iowa and West Virginia—drew county-level maps after the 2010 census (if we exclude states with one congressional district). Based on the analysis conducted here, at most 10 other states could follow.

I emphasize “at most”, because our analysis does not consider all federal and state laws. For example, Section 2 of the Voting Rights Act (as interpreted by the Supreme Court in Thornburg v. Gingles) requires the creation of minority opportunity (or “majority-minority”) districts in certain cases. Preliminary analysis suggests that, when this constraint is considered, Alabama cannot redistrict at the county level.

My coauthors Hamidreza Validi, Eugene Lykhovyd, and I wrote a paper about handling contiguity constraints in redistricting problems when the units are census tracts instead of counties. These instances are much larger than the county-level instances considered here and require alternative MIPs and advanced techniques, like branch-and-cut algorithms and Lagrangian-based reduced-cost fixing, to solve in a reasonable amount of time. Here are links to that paper and code.

Recently, my coauthor Jose L. Walteros and I published a paper with the title “Why is maximum clique often easy in practice?” (forthcoming at Operations Research). Here is the abstract:

To this day, the maximum clique problem remains a computationally challenging problem. Indeed, despite researchers’ best efforts, there exist unsolved benchmark instances with one thousand vertices. However, relatively simple algorithms solve real-life instances with millions of vertices in a few seconds. Why is this the case? Why is the problem apparently so easy in many naturally occurring networks? In this paper, we provide an explanation. First, we observe that the graph’s clique number $\omega$ is very near to the graph’s degeneracy $d$ in most real-life instances. This observation motivates a main contribution of this paper, which is an algorithm for the maximum clique problem that runs in time polynomial in the size of the graph, but exponential in the gap $g := (d+1) - \omega$ between the clique number $\omega$ and its degeneracy-based upper bound $d+1$. When this gap $g$ can be treated as a constant, as is often the case for real-life graphs, the proposed algorithm runs in time $O(dm) = O(m^{1.5})$. This provides a rigorous explanation for the apparent easiness of these instances despite the intractability of the problem in the worst case. Further, our implementation of the proposed algorithm is actually practical, competitive with the best approaches from the literature.
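The degeneracy-based bound mentioned in the abstract is cheap to compute: a graph's degeneracy is the largest minimum degree seen while repeatedly deleting a minimum-degree vertex, and the clique number is at most the degeneracy plus one. The sketch below is a simple quadratic-time illustration of that bound, not the paper's algorithm or its released code.

```python
def degeneracy(adjacency):
    """adjacency: vertex -> iterable of neighbors (undirected, no self-loops)."""
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
    d = 0
    while adj:
        # Peel off a vertex of minimum degree in the remaining graph.
        v = min(adj, key=lambda u: len(adj[u]))
        d = max(d, len(adj[v]))
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return d

# Complete graph on 4 vertices: clique number 4, degeneracy 3 (gap 0).
k4 = {v: [u for u in range(4) if u != v] for v in range(4)}
# Cycle on 5 vertices: clique number 2, degeneracy 2 (gap 1).
c5 = {v: [(v - 1) % 5, (v + 1) % 5] for v in range(5)}
print(degeneracy(k4) + 1, degeneracy(c5) + 1)  # 4 3
```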

The code is available here. A preprint of the paper is here. A slide deck is here.

Last Friday, I presented the work to a group at OSU. Here is the handout I provided to students.

A couple of years ago, I came across the following passage about graph notation.

…Martin Grötschel will always advocate a thorough investigation with a careful look at details, which can or cannot make a big difference. In particular, when it comes to graphs, digraphs, and hypergraphs, he becomes furious about “sloppy notation” that doesn’t distinguish between uv, (u,v), and {u,v}, because results do not automatically carry over between these cases. Martin Grötschel is correct, and the ability to seamlessly shift his attention from little details to the grand picture is without doubt one of his greatest strengths.

Preach.

In many research papers, I’ve seen authors write about a simple undirected graph $G = (V, E)$ with vertex set $V$ and edge set $E \subseteq V \times V$ and proceed to talk about its edges $\{u,v\}$. Sometimes, papers will even say that $V \times V$ is the edge set of a complete (simple) graph. I know, because I was guilty of some of these mistakes in my first few papers.

Here are a few of the problems.

simple undirected graph

What’s the problem here? Simple graphs are, by definition, undirected. There is no need to specify “undirected.”

edge set $E \subseteq V \times V$

What about here? Undirected graphs have undirected edges which are unordered pairs of vertices (so $\{u,v\}$ and $\{v,u\}$ refer to the same edge), while the Cartesian product of sets $A$ and $B$ is ordered:

$A \times B = \{(a,b) : a \in A,\ b \in B\}$.

So, when $G$ is a directed graph, it makes sense to say that $E \subseteq V \times V$ because it may have edge $(u,v)$ but not edge $(v,u)$. But where does the edge set live when $G$ is undirected? The following approach is unsightly and not conducive to frequent use:

$E \subseteq \{\{u,v\} : u \in V,\ v \in V,\ u \neq v\}$.

Instead, I have started writing that $E \subseteq \binom{V}{2}$, where

$\binom{V}{2} = \{\{u,v\} : u \in V,\ v \in V,\ u \neq v\}$.

The idea easily generalizes to $\binom{V}{k}$ for hypergraphs with $k$ vertices per hyperedge.

$V \times V$ is the edge set of a complete (simple) graph

By now, the problems with this statement should be clear. One nice thing about the notation $\binom{V}{2}$ is that when the graph is simple and complete we can just say that $E = \binom{V}{2}$.
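The same distinction shows up in code. The sketch below is only an illustration with made-up names: unordered pairs are stored as frozensets, ordered pairs as tuples, and the set of all 2-element subsets of a vertex set is built with `itertools.combinations`.

```python
from itertools import combinations

def all_pairs(vertices):
    """Every unordered pair of distinct vertices, i.e., all 2-element subsets."""
    return {frozenset(pair) for pair in combinations(vertices, 2)}

V = {1, 2, 3, 4}
E = {frozenset({1, 2}), frozenset({2, 3})}   # an undirected edge set

print(E <= all_pairs(V))                     # True: E is a set of 2-element subsets
print(frozenset({1, 2}) == frozenset({2, 1}))  # True: unordered pairs coincide
print((1, 2) == (2, 1))                      # False: ordered pairs do not
print(len(all_pairs(V)))                     # 6 edges in the complete graph on 4 vertices
```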

Equality systems : Gaussian elimination :: Inequality systems : ???

According to Kipp Martin, the answer is Fourier-Motzkin Elimination (FME), which allows one to project out variables from a system of linear inequalities. These slides from Thomas Rothvoss nicely illustrate what it means to project out (or eliminate) a variable.

Note: I don’t particularly like how Gaussian elimination is described (via row operations on a tableau, much like how the simplex method is often described). The intuition is lacking. Instead, everything I describe in this post will work on the (in)equalities themselves.

Eliminating variables in equality systems

Start with the linear equality system

.

Proceed by isolating the variable in each equation that contains to get

.

We can eliminate variable by setting all “definitions” of equal to each other (and leaving other equations as is)

.

Simplifying gives

.

By isolating and then eliminating variable , we get . Back-substitution gives

.

So the only feasible solution is . Observe that each time a variable is eliminated, one equality is also removed.

Eliminating variables in inequality systems

Suppose we started with the inequality system

.

Isolating (where possible) gives

.

Matching each of the lower bounds on with each of its upper bounds (and leaving other inequalities as is), we obtain

.

Simplifying gives

.

Isolating gives

.

Applying the same procedure to eliminate , we are left with

.

This is the projection of the original feasible set onto the space of the variable. That is,

for any there exist values for and for which the original system is satisfied, and

any point that is feasible for the original system satisfies .
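One elimination step can be sketched in code. The function below is a minimal illustration with exact rational arithmetic: every inequality is stored as a coefficient list and a right-hand side meaning "coefficients times variables is at most the right-hand side". The function name and the small two-variable system are hypothetical and are not the system worked through above.

```python
from fractions import Fraction
from itertools import product

def eliminate(system, j):
    """Eliminate variable j from a list of (coeffs, rhs) rows, each meaning
    sum(coeffs[i] * x[i]) <= rhs. The column for j is kept, with zero entries."""
    upper, lower, rest = [], [], []
    for coeffs, rhs in system:
        row = ([Fraction(a) for a in coeffs], Fraction(rhs))
        if row[0][j] > 0:
            upper.append(row)   # gives an upper bound on x[j]
        elif row[0][j] < 0:
            lower.append(row)   # gives a lower bound on x[j]
        else:
            rest.append(row)    # does not involve x[j]; left as is
    out = list(rest)
    # Match each upper bound with each lower bound.
    for (ua, ub), (la, lb) in product(upper, lower):
        cu, cl = ua[j], -la[j]  # both positive scaling factors
        coeffs = [a / cu + b / cl for a, b in zip(ua, la)]
        out.append((coeffs, ub / cu + lb / cl))
    return out

# Hypothetical system in (x, y): x + y <= 4, x >= 0, y >= 0, x - y <= 1.
system = [([1, 1], 4), ([-1, 0], 0), ([0, -1], 0), ([1, -1], 1)]
for coeffs, rhs in eliminate(system, 0):
    print([int(c) for c in coeffs], "<=", int(rhs))
```

Eliminating $x$ leaves $-y \le 0$, $y \le 4$, and $-y \le 1$, so the projection onto $y$ is $0 \le y \le 4$, with one redundant inequality.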

Consequences of Fourier-Motzkin Elimination (FME)

FME can test whether a linear program is feasible.

If the system is feasible, FME can provide a feasible solution by employing back-substitution (in accordance with the lower and upper bounds on the variable that is being eliminated). On the other hand, if the original system is infeasible, the final inequality system will have conflicting lower and upper bounds on the final variable (a lower bound that exceeds an upper bound) or obvious contradictions between constants.

FME can optimize linear programs.

To optimize an LP in which we are to maximize $c^T x$, simply add a variable $z$ and constraint $z \le c^T x$ to our system and make sure to eliminate $z$ last. The optimality status (e.g., infeasible, unbounded, an optimal solution) can be obtained from the final inequality system. (How?) Note, however, that Fourier-Motzkin elimination is awfully slow (doubly exponential) and is not a practical means to solve LPs. (But, there is also a singly exponential elimination method.)

Projections of polyhedra are also polyhedra

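If we eliminate a variable from a system with $m$ inequalities, then we end up with finitely many inequalities: each new inequality pairs one lower bound with one upper bound, so there are at most $m^2/4$ such pairs, plus the inequalities that did not involve the variable. So, if we start with a finite number of inequalities and variables, then any projection will have finitely many inequalities. (FME may give redundant inequalities, which is okay.)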

Projections of rational polyhedra are also rational

If every coefficient and right-hand-side in our original inequality system is rational, then they will remain rational after a variable is eliminated.

Other results

More consequences follow from FME; see Kipp Martin’s book. In fact, he develops the basics of LP (e.g., Farkas’ lemma and duality) via projection (FME) and “inverse projection.”