In a very unlikely meeting of minds circa 1980 in the Fairchild Scholar program at Caltech, an American history professor, Allan Lichtman, teamed up with a Russian mathematical geophysicist and seismologist, Vladimir Isaacovich Keilis-Borok, in an effort to construct a broad model with which to predict the outcome of U.S. Presidential elections using a limited number of easily discernible parameters.
The model they developed was written up in several papers coauthored by the two of them and presented in a more popular (i.e., "without the math") series of books authored by Lichtman entitled The 13 Keys to the Presidency (or some variation thereof), which he republishes every four years or so in time for the next U.S. Presidential election with new information and observations.
Using the model, which is intended to predict the popular-vote winner, Lichtman correctly predicted that winner in every election from 1984 to 2012; in each of those elections the popular-vote winner also won the electoral vote, except in 2000, when Al Gore won the popular vote but lost the electoral vote. This time around, the model as Lichtman applied it just before the election predicted that Donald Trump would win the popular vote. In fact, Hillary Clinton won the popular vote but Donald Trump won the election. But one of the parameters was actually misapplied -- meaning that, properly interpreted, the model correctly predicted that Hillary Clinton would win the popular vote. Curious? We will explain below.
I went back to the core source material and located the original paper entitled "Pattern recognition applied to presidential elections in the United States, 1860-1980; Role of integral social, economic and political traits" published by the National Academy of Sciences in 1981. Lichtman and Keilis-Borok also co-authored a similar but more refined paper in the early 1990s and presented it as a chapter in the book "Limitations of Predictability" published in 1993 and edited by Yurii Kravtsov. The book "addresses the problem of predictability of various phenomena, both of physical origin (such as weather, climate, earthquakes, biological media, and dynamical chaos) and of a social nature (election preferences, laws of ethnogenesis, and so on)." But for today, we are just going to focus on the chapter about U.S. elections.
The model is built on the idea that presidential elections are essentially referenda on the party that currently holds the presidency, and it identifies thirteen binary parameters to consider, which are presented in the form of true/false statements:
The model requires the user to make a true/false assessment of each of these thirteen parameters. It is notable that only three are purely objective (factors 1, 3 and 6), although most of them are probably not very difficult for even a casual observer to assess in most election years. The other interesting aspect of these parameters is that only one -- factor 13 -- has anything to do with the challenging party. Parameter 2 was found to be the most predictive on its own.
It is also notable that the original model from the 1981 paper was slightly different. First, it had only twelve parameters -- parameters 10 and 11 regarding foreign policy were notably absent. Second, parameter 1 actually replaces these two 1981 parameters: "The incumbent party has been in office more than a single term;" and "The incumbent party gained more than 50% of the vote cast in the previous election." I was unable to determine at what point in the 1980s these changes to the model were made. However, a number of alternative factors were toyed with in 1981 according to the paper, including whether the incumbent was a Republican or Democrat (meaningless, as we would expect from a view through the Mimetic Lens that they are simply mirrored rivals), whether there was a serious contest for the challenging nomination, whether the election occurred during wartime, whether foreign policy issues were dominant and whether domestic policy issues were dominant. The original 1981 model was accurate 19 of 21 times.
The rule of application is straightforward. If five or fewer of the statements are found to be false, the model predicts that the incumbent party's candidate will win, where victory is defined as winning a plurality of the popular vote. If six or more are found to be false, the model predicts that the challenging party's candidate will win a plurality of the popular vote.
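The rule of application is simple enough to express in a few lines of code. Here is a minimal sketch (illustrative only; the function and variable names are my own, not the authors'):

```python
# Minimal sketch of the 13 Keys decision rule (illustrative only).
# Each key is a True/False judgment about the incumbent party's record;
# six or more False answers predict a popular-vote win for the challenger.

def predict(keys):
    """keys: list of 13 booleans, True if the statement holds."""
    if len(keys) != 13:
        raise ValueError("The model uses exactly 13 keys")
    false_count = keys.count(False)
    return "challenger" if false_count >= 6 else "incumbent party"

# Five keys false -> incumbent party holds on.
print(predict([False] * 5 + [True] * 8))   # incumbent party
# Six keys false -> the pile collapses; challenger predicted to win.
print(predict([False] * 6 + [True] * 7))   # challenger
```

Note that the rule is a pure threshold: no key is weighted more heavily than any other, which is part of what makes the model so "robust" in the authors' sense.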
Applying the model to historical sets of facts, Professor Lichtman showed the model would have accurately forecast the plurality of the vote victor in all of the elections from the birth of the modern two-party system in 1860 through the 1980 election. In two elections though, in 1876 and 1888, the plurality vote-winner did not win the electoral college, and so did not become the president.
Lichtman and Keilis-Borok constructed and validated their model through pattern recognition algorithms involving a Hamming distance calculation and the CORA-3 algorithm, which are methods used in information theory and earthquake forecasting. We won't go into those here, but you can see some of that work in the original paper, and Keilis-Borok later wrote another paper describing these methods in more gory detail, Pattern Recognition Methods & Algorithms, that was presented as part of the Ninth Workshop on Non-linear Dynamics and Earthquake Predictions in 2007. They also used a "seismic history" analysis to validate the model, but recognized that no statistical analysis could be done because there were too few data points to analyze. As they explain:
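To give just a flavor of that machinery (the papers contain the real details; this is only a generic illustration, not the authors' CORA-3 implementation): once each election is encoded as a vector of yes/no answers, the Hamming distance between two elections is simply the number of answers on which they differ.

```python
# Generic Hamming distance between two binary answer vectors --
# the count of positions where the yes/no judgments differ.
# (Illustrative only; the vectors below are made up, not real elections.)

def hamming(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must be the same length")
    return sum(x != y for x, y in zip(a, b))

election_1 = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
election_2 = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1]
print(hamming(election_1, election_2))  # 3
```

A small distance means two elections presented nearly the same pattern of keys, which is the sense in which the algorithm groups historically "similar" elections.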
"The logic and algorithms of the present analysis follow Gelfand's school of pattern recognition analysis. Specifically, we draw upon the experience of earthquake prediction research. . . .[W]e sought to predict the winners of elections, but not their percentage of the vote. Likewise, we eliminated certain information through the discretization of parameters to the lowest level of resolution, 'yes' or 'no'. Similar 'robust' methods are widely used in the heuristic analysis of complicated data, especially when dealing with small samples. The apparent loss of information involved has often produced the stable results that elude more 'detailed' analyses that are subject to fluctuations in the values of particular variables."
Looking now through the Lenses of Wisdom and the Fractal Lens in particular, we see that this election model is based on some of the core ideas of complex systems that apply to phenomena such as earthquakes. We had previously considered the sand-pile complexity model as applied to political parties generally:
"[W]e can envision and model the two political parties as the sand-pile models of complexity theory. Each party builds its sand-pile slowly and carefully, building issues upon issues and attracting supporters. But eventually the piles become too high, the positions too brittle or outdated and the supporters die off, leaving fragile structures that are prone to collapse as more grains of sand are dropped. Alternative candidates represent grains of sand that might fall harmlessly off the pile or might cause avalanches. Changes do not come gradually or in an orderly manner, but in fits and starts."
The Lichtman/Keilis-Borok model could be thought of as an attempt to identify critical fault-lines in a sand-pile that represents the incumbent party's presidential hopes. Each of the first twelve keys is a potential crack or instability in the incumbent pile. The challenging party is entitled to drop one or a few grains of sand, with the size of that grain or grains determined by the charisma of their candidate -- and nothing else. If there are more than five fault lines in the incumbent pile -- or only five and an especially large grain or grains of sand -- the pile usually collapses when the grain is dropped and the challenger wins. Otherwise, the structures hold and the incumbent party wins.
The concept from complexity theory of emergence, or self-organization, is also present. As the authors describe it:
"Self-organization and Predictability. In the natural world, intricate chaotic systems, after appropriate smoothing, often display stable regularities, including predictability. These regularities are difficult, if not impossible, to derive from the behavior of the system's elementary components. Our results for the American political system suggest that American society comprises such a system during presidential and senatorial elections. The hierarchic system of American electoral groups has stable and predictable aggregate-level behavior with a high degree of integration for the entire nation and even for individual states.
This integration occurs despite the contradictory interests and outlooks of electoral groups. The laws governing the outcome of elections have remained stable at the aggregate level from 1860 through 1988, even though three-fourths of today's voters - women, 18 to 20-year-olds, African-Americans, and the great majority of descendants from Latin America, Asia, and Eastern and Southern Europe - were not part of the nineteenth-century electorate. Electoral systems may thus display features similar to large-scale physical systems that likewise exhibit collective behavior comprehensible only at the level of the system as a whole."
But query why this model works at all -- it seems to take all of the strategy out of presidential politics and reduce it to a few banal factors, a number of which the parties or candidates cannot even control, or control only in part. To consider why it works as well as it does, we will take a look through the Prospecting Lens as to how we are likely to be deceiving ourselves as the drama of presidential politics unfolds in each cycle.
The first thing we see in that view is that presidential campaigns are based on competing narratives, and more importantly, the "history" of why a particular candidate won is constructed after-the-fact from one or more of those competing narratives. This evokes two of the System 1 heuristics in particular: The Narrative Fallacy and The Hindsight Illusion. From the Prospecting Lens page:
THE NARRATIVE FALLACY. In our continuous attempt to make sense of the world we often create flawed explanatory stories of the past that shape our views of the world and expectations of the future. We assign larger roles to talent, stupidity, and intentions than to luck. This is most evident when we hear, “I knew that was going to happen!” Which leads to…
THE HINDSIGHT ILLUSION. We think we understand the past, which implies the future should be knowable, but in fact we understand the past less than we believe we do. Our intuitions and premonitions feel more true after the fact. Once an event takes place we forget what we believed prior to that event, before we changed our minds. Prior to 2008 financial pundits predicted a stock market crash but they did not know it. Knowing means showing something to be true. Prior to 2008 no one could show that a crash was true because it hadn’t happened yet. But after it happened their hunches were retooled and became proofs. “The tendency to revise the history of one’s beliefs in light of what actually happened produces a robust cognitive illusion.” Potential for error: “We are prone to blame decision makers for good decisions that worked out badly and to give them too little credit for successful moves that appear obvious only after the fact. When the outcomes are bad, the clients often blame their agents for not seeing the handwriting on the wall—forgetting that it was written in invisible ink that became legible only afterward. Actions that seemed prudent in foresight can look irresponsibly negligent in hindsight."
In fact, the Lichtman/Keilis-Borok model was designed to challenge both of these fallacies and provide a more rational System 2 model in their place. In their 1993 paper they summarize the typical narrative of how U.S. Presidential elections supposedly work:
"American elections are often summarized as follows:
Contrasting their analysis to the typical "after-the-fact" narrative, Lichtman and Keilis-Borok write:
"Our findings, for example, suggest an explanation of George Bush's 1988 victory that is radically different from the generally accepted version of events. According to the conventional wisdom, after trailing by as many as 17 percentage points in the polls, Bush began a remarkable 'comeback' with his eloquent convention speech (primarily crafted by master speechwriter Peggy Noonan). He then launched a devastating barrage of negative attacks on Mike Dukakis, orchestrated by political adviser Lee Atwater and designed by advertising expert Roger Ailes. When Dukakis failed to respond to charges that he furloughed dangerous criminals and fouled Boston Harbor, Bush surged permanently ahead. Thus, a brilliantly designed - if shallow and vicious - campaign allegedly changed the minds of the voters.
Our conclusions compel a different version of what actually happened in 1988. Based on the record of the previous four years, as measured by the 13 parameters, a Bush victory was apparent long before the public ever heard of speechwriter Peggy Noonan or furloughed rapist Willie Horton.
Six months prior to the election and three months before Bush's alleged comeback, the following forecast was published: "Barring a suddenly stalled economy and a major disaster between now and election day, George Bush is a shoo-in for the presidency, no matter who winds up as the Democratic nominee."
In effect, they are applying a System 2 kind of analysis in a mechanical way. Which leads us to two other System 1 heuristics that most election forecasters stumbled over this time around: Ignoring Algorithms and Trusting Expert Intuition. Summarizing from the Prospecting Lens page:
IGNORING ALGORITHMS. We overlook statistical information and favor our gut feelings. Not good! Forecasting, predicting the future of stocks, diseases, car accidents, and weather should not be influenced by intuition but they often are. And intuition is often wrong. We do well to consult check lists, statistics, and numerical records and not rely on subjective feelings, hunches, or intuition. Potential for error: “relying on intuitive judgments for important decisions if an algorithm is available that will make fewer mistakes."
TRUSTING EXPERT INTUITION. “We are confident when the story we tell ourselves comes easily to mind, with no contradiction and no competing scenario. But ease and coherence do not guarantee that a belief held with confidence is true. The associative machine is set to suppress doubt and to evoke ideas and information that are compatible with the currently dominant story." Kahneman is skeptical of experts because they often overlook what they do not know. Kahneman trusts experts when two conditions are met: the expert is in an environment that is sufficiently regular to be predictable and the expert has learned these regularities through prolonged practice. Potential for error: being misled by “experts.”
Here, we can see where most forecasters of this election went astray. Many pundits intuitively predicted that Clinton would win the election quite easily, given her greater experience in government relative to Trump and given Trump's personality, and misread the polling data in support of that conclusion. Thus, a number of experts were giving Clinton a 90-99% chance of prevailing. In reality, the polls were saying the election would be close, with the final national polls giving Clinton only a 3.3% advantage in the popular vote according to the Real Clear Politics aggregator. The final result in the popular vote was Clinton prevailing by +1.9%, which was not very far off in the aggregate. But presidential elections are ultimately decided state-by-state. In the states that ultimately made the difference in the electoral counts, the polling was similarly close and had a greater margin for error built in. Thus, in Pennsylvania, the polls showed Clinton +1.9% and the actual result was Trump +1.1%; and in Michigan, the polls showed Clinton +3.3% and the actual result was Trump +0.2%. None of this really suggested a 90-99% probability of a Clinton victory.
Yet the simple, seemingly blind algorithm constructed by Lichtman and Keilis-Borok held up quite well -- without having to rely on the fog of polling data, the weekly gaffes of and revelations about the candidates, or System 1 narratives from pundits. Usually, Professor Lichtman is able to make a forecast based on the model even years out, particularly if an incumbent is running.
In this case, he did not attempt a forecast until late September 2016. In an interview with the Washington Post on September 23, the Professor noted that five model parameters were already clearly against the Democrats: "Key 1 is the party mandate — how well they did in the midterms. They got crushed. Key number 3 is, the sitting president is not running. Key number 7, no major policy change in Obama's second term like the Affordable Care Act. Key number 11, no major smashing foreign policy success. And Key number 12, Hillary Clinton is not a Franklin Roosevelt (i.e., is not charismatic)." The one parameter he was having difficulty with was number 4 -- whether there would be a significant vote for a third party candidate. Professor Lichtman is usually looking for 5% or more on that, and at the time Gary Johnson was polling over 10%. So based on that, he felt that the model narrowly favored Trump and went with that as his prediction. He was asked again about his prediction on November 3 and felt that it should hold up given that Johnson still appeared to be a "significant third party or independent campaign." Yet at the end of October, he still wondered aloud whether the model would hold up due to the unique nature of the Trump candidacy:
"Donald Trump’s severe and unprecedented problems bragging about sexual assault and then having 10 or more women coming out and saying, “Yes, that’s exactly what you did” — this is without precedent. But it didn’t change a key.
By the narrowest of possible margins, the keys still point to a Trump victory. However, there are two major qualifications. And I’m not a hedger, and I’ve never qualified before in 30 years of predictions.
Qualification number one: It takes six keys to count the party in power out, and they have exactly six keys. And one key could still flip, as I recognized last time — the third party key, that requires Gary Johnson to get at least five percent of the popular vote. He could slip below that, which would shift the prediction.
The second qualification is Donald Trump. We have never seen someone who is broadly regarded as a history-shattering, precedent-making, dangerous candidate who could change the patterns of history that have prevailed since the election of Abraham Lincoln in 1860.
I do think this election has the potential to shatter the normal boundaries of American politics and reset everything, including, perhaps, reset the keys to the White House. Look, I’m not a psychic. I don’t look at a crystal ball. The keys are based on history. And they’re based on a lot of changes in history, they’re very robust. But there can come a time when change is so cataclysmic that it changes the fundamentals of how we do our politics, and this election has the potential — we don’t know yet, but it has the potential."
In fact, the application of the model and the results were a bit paradoxical, and Professor Lichtman's human error played a role. First, from a 30,000-foot level, you could say that Lichtman's prediction that Trump would win was correct. Trump did win the electoral college and so won the presidency. Yet, looking closer, you would say that Lichtman and his model were actually wrong: The model, by its own terms, is supposed to be predicting who wins the popular vote, not the electoral vote. Since Clinton won the popular vote and Lichtman predicted it would be Trump, you would have to say he was wrong.
But now let's look even closer at how Lichtman applied the model: In his application, Lichtman assumed that Gary Johnson would get at least 5% of the vote. In fact, Johnson only received 3.3% of the vote (and even if you added Jill Stein's votes, it was still less than 5%). So that parameter should have been flipped, meaning that the model -- properly applied without human error -- actually predicted that the Democrats would win the popular vote. And indeed they did. I'm not sure Professor Lichtman yet appreciates that his model out-performed him this time around.
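Plugging Lichtman's own September assessment back into the threshold rule makes the point concrete. (The key numbering follows his interview; the code is just a back-of-the-envelope recount, not his method.)

```python
# Lichtman's pre-election call had six keys false for the Democrats:
# keys 1, 3, 7, 11 and 12 (per his September interview), plus key 4
# (significant third-party vote), which he judged false on the
# assumption that Gary Johnson would take at least 5% of the vote.
false_keys = {1, 3, 4, 7, 11, 12}
print("challenger" if len(false_keys) >= 6 else "incumbent party")  # challenger

# Johnson actually received 3.3% -- below the 5% threshold -- so
# key 4 flips back to true, leaving only five false keys.
false_keys.discard(4)
print("challenger" if len(false_keys) >= 6 else "incumbent party")  # incumbent party
```

One flipped key, one changed prediction: with only five keys false, the model as written calls the popular vote for the incumbent party, i.e., for Clinton.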
I am suddenly reminded of the twisted back-and-forth analysis of a binary outcome game portrayed in the battle-of-wits scene from The Princess Bride.
But I digress.
So where do we end up on predicting elections with algorithms? Probably (A) they are better than humans, but (B) they are not infallible, especially in a winner-take-all, state-by-state voting system that has a higher variance than the popular vote outcome. Our rational, System 2 minds remind us that we should not run afoul of this System 1 heuristic:
OVERLOOKING LUCK. Most people love to attach causal interpretations to the fluctuations of random processes. “It is a mathematically inevitable consequence of the fact that luck played a role in the outcome….Not a very satisfactory theory—we would all prefer a causal account—but that is all there is,” . . . “Our mind is strongly biased toward causal explanations and does not deal well with ‘mere statistics,’” Potential for error: seeing causes that don’t exist.
Or in this case, looking at only one or two data points or events as the "cause". Think about this next time someone tries to convince you that "this one event" caused the outcome of the election. It's probably not true, especially if it was a gaffe, revelation or ploy during the heat of the campaign.
Or as my primary interviewer told me when interviewing for my first career position over a quarter century ago: "Sometimes it's better to be lucky than good."
Which leads us to one final observation from the Lichtman/Keilis-Borok model: being "good" helps, too. Winning House seats at the local level in midterm elections matters, for one. More to the point here, the candidates in this presidential election were two of the least liked in all of the history of presidential politics. They not only did not have the kind of charisma that would tick parameters 12 and 13 of the model in their side's favor, they even seemed to score negatively, if that were possible. This means that had either side put up a better or more charismatic candidate, they may have been able to turn the tide more in their favor. And then maybe the outcome would not have been "as much" up to the random factors that may or may not operate in your favor (as Mr. Spock defines "luck").