Politics by the Numbers: Poll-Based Forecasting Models

In my previous post I showed that individual House election outcomes tracked district-level poll results fairly closely. This implies that district-level polls can be used to forecast House outcomes district by district.

To get a better sense of how well this works, I've generated out-of-sample (OOS) election predictions for 2006 and 2008 based on district-level poll results. For instance, to predict the 2008 outcomes I took OLS estimates of the relationship between the 2006 polling averages (over the last forty-five days of the campaign) and the actual 2006 House outcomes, and applied them to the 2008 polling values. For 2006 I did the reverse, using estimates of the relationship between polls and votes in 2008 to generate a backwards prediction (a "postdiction," I suppose) of the 2006 outcomes. The idea is to see how well outcomes in one year can be predicted using parameter estimates from another year.
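The cross-year procedure can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used to produce the figures: polls and votes are assumed to be the Democratic candidate's share of the two-party vote in percent, `polling_average`, `fit_ols` and `out_of_sample` are hypothetical helper names, and all numbers are toy values rather than the actual 2006 or 2008 data.

```python
from datetime import date, timedelta
from statistics import mean


def polling_average(polls, election_day, window_days=45):
    """Average each district's polls fielded in the last `window_days` before election day.

    `polls` is a list of (district, field_date, dem_share) tuples; dem_share is assumed
    to be the Democratic share of the two-party poll result, in percent.
    """
    start = election_day - timedelta(days=window_days)
    by_district = {}
    for district, field_date, share in polls:
        if start <= field_date <= election_day:
            by_district.setdefault(district, []).append(share)
    return {d: mean(shares) for d, shares in by_district.items()}


def fit_ols(x, y):
    """Bivariate OLS of vote share (y) on poll average (x); returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def out_of_sample(fit_polls, fit_votes, new_polls):
    """Fit on one election year, then predict vote shares in another year from its polls."""
    intercept, slope = fit_ols(fit_polls, fit_votes)
    return [intercept + slope * p for p in new_polls]


# Toy poll records for one hypothetical district (2006 election day: Nov. 7).
polls = [
    ("District A", date(2006, 10, 1), 55.0),
    ("District A", date(2006, 10, 20), 57.0),
    ("District A", date(2006, 8, 1), 40.0),  # outside the 45-day window, ignored
]
print(polling_average(polls, election_day=date(2006, 11, 7)))

# Toy district-level poll averages and vote shares (illustration only).
polls_06, votes_06 = [40.0, 50.0, 60.0], [42.0, 51.0, 60.0]
polls_08, votes_08 = [45.0, 55.0], [47.0, 54.0]

pred_08 = out_of_sample(polls_06, votes_06, polls_08)  # 2006 estimates -> 2008 prediction
pred_06 = out_of_sample(polls_08, votes_08, polls_06)  # 2008 estimates -> 2006 "postdiction"
print(pred_08, pred_06)
```

Swapping the roles of the two years is all it takes to go from prediction to postdiction; in either direction, no vote data from the target year enters its own forecast.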

Here's how closely the OOS predicted values followed the actual House election outcomes for 2006 and 2008, pooled:

The correlation between predicted and actual outcomes is .89, and 83% of all outcomes (win/lose) are called correctly using just the point estimate, compared with just 52% from simply predicting the modal outcome for every race. Keep in mind that this figure shows the accuracy of predictions for outcomes in one year based on the relationship between polls and votes in another year.
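The two accuracy measures quoted above, plus the modal-outcome baseline, are easy to compute once the predictions are in hand. This is a sketch, assuming the same two-party percentage scale so that a share above 50 counts as a win; the helper names and toy values are hypothetical, and a share of exactly 50 is treated as a loss for simplicity.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson correlation between predicted and actual vote shares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def share_called_correctly(predicted, actual, threshold=50.0):
    """Fraction of contests where the point estimate is on the same side of 50% as the result."""
    hits = sum((p > threshold) == (a > threshold) for p, a in zip(predicted, actual))
    return hits / len(actual)


def modal_baseline(actual, threshold=50.0):
    """Accuracy of predicting the single most common outcome (win or lose) for every contest."""
    outcomes = Counter(a > threshold for a in actual)
    return outcomes.most_common(1)[0][1] / len(actual)


# Toy predicted and actual vote shares (illustration only).
pred = [46.5, 55.5, 49.0, 47.0]
act = [47.0, 54.0, 51.0, 48.0]
print(pearson_r(pred, act), share_called_correctly(pred, act), modal_baseline(act))
```

The modal baseline is the number to beat: a model that calls every race for whichever outcome is more common already gets that share right without looking at any polls.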

Applying the same method to Senate elections from 2006 and 2008 (pooled) produces similar but stronger results:

In this case the correlation between the OOS predictions and actual votes is .97, and the outcome (win/lose) was called incorrectly in only one of 63 contests. To be sure, in both the Senate and House contests many of the predicted votes hover right around 50%, and those contests could easily tip one way or the other. Still, the simple point estimates generate pretty fair predictions of winners and losers.
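One way to make the "hovering around 50%" point concrete is to flag contests whose point estimate sits within a few points of 50. The two-point band below is an arbitrary, hypothetical choice (not something used in these forecasts), the race names are placeholders, and shares are again assumed to be two-party percentages.

```python
def flag_tossups(predictions, band=2.0):
    """Return the contests whose predicted vote share is within `band` points of 50%."""
    return [race for race, share in predictions.items() if abs(share - 50.0) <= band]


# Toy predicted shares (illustration only).
predicted = {"Race A": 49.2, "Race B": 58.0, "Race C": 51.5}
print(flag_tossups(predicted))
```

Widening `band` trades fewer confident calls for fewer surprises; the point estimate still picks a winner in every race either way.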

So, what does this tell us about the 2010 outcomes? The answer is not very complicated: candidates who maintain a lead in the polls during the fall campaign generally go on to win. This is especially clear in Senate races.

Estimates from regression models of the relationship between polls and outcomes in other election years (2006 and 2008) should give us a pretty good basis for predicting the 2010 outcomes. This is a very simple approach, and it may suffer somewhat from ignoring a whole host of factors, such as the expected accuracy of different polling organizations or district-specific characteristics that may play an important role in 2010. But that's sort of the point: to provide a quick and simple way to predict outcomes.
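Applied to 2010, the approach amounts to fitting the poll-vote relationship on past years and plugging in this year's polling averages. This sketch assumes the 2006 and 2008 observations are pooled into a single regression (the post does not say whether the years are pooled or estimated separately); the function names, race labels, and numbers are hypothetical, and shares are two-party percentages.

```python
from statistics import mean


def fit_ols(x, y):
    """Bivariate OLS of vote share on poll average; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def forecast(training_years, current_polls):
    """Pool (poll average, vote share) pairs from past years, fit OLS, forecast current races."""
    x = [p for polls, _ in training_years for p in polls]
    y = [v for _, votes in training_years for v in votes]
    intercept, slope = fit_ols(x, y)
    return {race: intercept + slope * poll for race, poll in current_polls.items()}


# Toy training data: (poll averages, vote shares) for two past years (illustration only).
training = [
    ([40.0, 50.0, 60.0], [42.0, 51.0, 60.0]),  # "2006"
    ([45.0, 55.0], [47.0, 54.0]),              # "2008"
]
polls_2010 = {"Race X": 52.0, "Race Y": 44.0}

forecasts = forecast(training, polls_2010)
for race, share in forecasts.items():
    print(race, round(share, 1), "win" if share > 50.0 else "lose")
```

As new 2010 polls arrive, only `polls_2010` changes; the fitted coefficients stay fixed, which is what keeps the forecasts quick to update.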

Toward that end, I've posted forecasting boxes in the right-hand column of the blog. Right now I have forecasts for the Senate elections, and I will post the House forecasts as soon as I'm able to gather the existing 2010 district-level polls (hey, if you've already gathered these data I'd be glad to borrow them from you!).

Important note: these forecasts are based on current information and will change as new data come in. Be sure to stop back and take a look every once in a while.

Monday, September 27, 2010
