Wednesday, October 8, 2014

Notes on the expanded three-way simulator

In simulating elections, we've so far been able to mostly ignore third-party candidates--they've been so far behind that there's no need to calculate their probability of victory, because it's essentially zero. In R (the language in which our simulations are run), the relevant section of the simulator looks like this:
cumulative <- 0 ## running count of Democratic wins
for(i in 1:100000) {
    ...
    if(D > R) {
        result <- 1 ## Democrat wins
    } else {
        result <- 0 ## Republican wins
    }

    cumulative <- cumulative + result
}
cumulative / 1000 ## Democratic win percentage

For those unfamiliar with this process, this type of simulation is known as the Monte Carlo method. Essentially, the code takes simulated values of D and R (the Democratic and Republican projected vote shares, drawn from a normal distribution in the area where the ellipsis is) and tests whether the Democrat's vote share is greater than the Republican's. If it is, it adds 1 to the Democrat's running total of victories (the object cumulative) and returns to the top of the loop, where new values of D and R are generated and the process is repeated. This happens 100,000 times. Because there are 100,000 iterations, dividing the final tally by 1000 gives the percentage of simulated elections won by the Democrat.
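As a rough illustration of the loop described above, here is a minimal runnable sketch. The means, the standard deviation, and the choice to draw D and R independently are all placeholder assumptions made up for this example; they are not the simulator's actual inputs, which live in the elided part of the code.

```r
# Minimal two-way Monte Carlo sketch.
# All inputs below are hypothetical placeholders, not real polling numbers.
set.seed(42)                # fixed seed so the run is reproducible
n_sims   <- 100000          # number of simulated elections
d_mean   <- 0.48            # assumed Democratic projected vote share
r_mean   <- 0.52            # assumed Republican projected vote share
share_sd <- 0.03            # assumed standard deviation of each share

cumulative <- 0             # running count of Democratic wins
for (i in 1:n_sims) {
  D <- rnorm(1, mean = d_mean, sd = share_sd)  # simulated Democratic share
  R <- rnorm(1, mean = r_mean, sd = share_sd)  # simulated Republican share
  if (D > R) {
    cumulative <- cumulative + 1               # Democrat wins this election
  }
}

# Convert the tally to a percentage; with 100,000 simulations this is
# the same as dividing by 1000.
dem_win_pct <- cumulative / n_sims * 100
dem_win_pct
```

Writing the last step as `cumulative / n_sims * 100` keeps the percentage correct if the number of simulations is ever changed.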

The situation gets more complicated when you add in a strong third-party candidate with a chance of winning the election, however. In particular, the Senate race in South Dakota was the main reason I expanded the simulator. We couldn't get away with just ignoring one of the three candidates--they were all close enough that any one of them had a non-zero chance of winning the election.

One way to get around this in the simulator would be to select one candidate against whom to match up the others in head-to-heads. Suppose we took the candidate to be Republican Mike Rounds, and used our polling average to calculate the percentage of victories Democrat Rick Weiland scores against him. Then we do the same for a Larry Pressler-Rounds matchup and write down the probability. Whatever's left must be the chance Rounds has of winning the election. When we do that, our simulator spits out a 10% chance of victory for Weiland, a 21% chance of victory for Pressler, and the remaining 69% must belong to Rounds (we can ignore independent Gordon Howie because he polls so far behind that his chances are essentially nil). 

But it's not that simple. We could turn around and use a different candidate for the matchups--Pressler, for example. Then we could test a Weiland-Pressler matchup and a Rounds-Pressler matchup. The Rounds-Pressler matchup yields the same 21% chance of victory for Pressler (that is, a 79% chance of victory for Rounds), but Weiland's chance of winning is now 35%. 79% + 35% is obviously greater than 100%, so what's going on?

The problem with the simulation method above is that it counts as a victory any time one candidate beats another specified candidate. But that's not how elections work, is it? In the actual election, you have to beat all of the other candidates to win--it won't cut it for Weiland to beat Pressler alone (which, as said, he has a 35% chance of doing); he needs to beat Rounds as well. The Pressler-based matchups never pit Weiland against Rounds at all. It's easier for Weiland to beat either one of the candidates than to beat both of them at the same time, because every simulated election in which he beats both is also one in which he beats each of them individually.
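To make that last point concrete, here is a small sketch comparing one candidate's head-to-head win rates with his rate of beating both opponents at once. The vote-share means and standard deviation are invented for illustration and are not the South Dakota polling averages; D, R, and I follow the naming used in the simulator code.

```r
# Head-to-head versus beat-everyone win rates for one candidate.
# All distribution parameters are hypothetical, chosen only for illustration.
set.seed(1)
n_sims <- 100000

D <- rnorm(n_sims, mean = 0.30, sd = 0.04)  # Democrat's simulated shares
R <- rnorm(n_sims, mean = 0.40, sd = 0.04)  # Republican's simulated shares
I <- rnorm(n_sims, mean = 0.32, sd = 0.04)  # independent's simulated shares

beats_R    <- mean(D > R) * 100          # % of simulations where D tops R
beats_I    <- mean(D > I) * 100          # % of simulations where D tops I
beats_both <- mean(D > R & D > I) * 100  # % where D tops both, i.e. wins

c(beats_R = beats_R, beats_I = beats_I, beats_both = beats_both)
```

Whatever inputs are used, `beats_both` can never exceed the smaller of the two head-to-head figures, which is why head-to-head percentages overstate a candidate's real chance of winning a three-way race.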

Because of this, the expanded simulator has to include a logical operator that requires a candidate's simulated vote share to be greater than both of his opponents' simulated vote shares--and it must do the same for both of the other candidates. Here's the expanded section of code, analogous to the one above:
cumulativeD <- 0 ## running count of Democratic wins
cumulativeR <- 0 ## running count of Republican wins
cumulativeI <- 0 ## running count of independent wins
for(i in 1:100000) {
    ...
    if(D > R && D > I) {
        Dresult <- 1 ## Democrat wins
        Rresult <- 0 ## Republican loses
        Iresult <- 0 ## Independent loses
    } else if(R > I && R > D) {
        Dresult <- 0 ## Democrat loses
        Rresult <- 1 ## Republican wins
        Iresult <- 0 ## Independent loses
    } else {
        Dresult <- 0 ## Democrat loses
        Rresult <- 0 ## Republican loses
        Iresult <- 1 ## Independent wins
    }

    cumulativeD <- cumulativeD + Dresult
    cumulativeR <- cumulativeR + Rresult
    cumulativeI <- cumulativeI + Iresult
}

Dprob <- cumulativeD / 1000 ## Democratic win percentage
Rprob <- cumulativeR / 1000 ## Republican win percentage
Iprob <- cumulativeI / 1000 ## Independent win percentage

Probabilities <- c(Dprob, Rprob, Iprob)
Probabilities
The principle here is the same: generate vote shares (only there are three this time) and stack them up against each other, then tally up the victories for each candidate across 100,000 simulations. First, the simulator compares the Democrat's vote share to both the Republican's and the independent's. If it's greater than both, great; it sets the Democratic result to 1, skips the remaining conditions, and adds it to the cumulative number of Democratic victories. If it is less than or equal to either of the other candidates' shares, the simulator continues on to the next condition. In the second condition, the simulator compares the Republican's vote share to the Democrat's and the independent's. If the Republican receives a greater vote share than both the Democrat and the independent, it chalks up a 1 for the Republican result, skips the remaining condition, and adds it to the cumulative number of Republican victories. (Implicit in this is that it also adds the Democratic result and the independent result to their respective cumulative values; it's just that those results are equal to zero, so it has no effect.)

Finally, if neither of those conditions is met, it should be clear that the only option left is that the independent received a greater vote share than both the Democrat and the Republican. (Strictly speaking, an exact tie for first place would also land here, but with vote shares drawn from a continuous distribution that essentially never happens.) That's an independent win, and it's the last condition--the independent's result is set to 1 (the Democrat's and the Republican's are set to 0) and added to the cumulative result. Lather, rinse, and repeat a hundred thousand times, then divide by 1000 to get percentages. These are summarized in the Probabilities object, which is just a vector that displays the probabilities for all three candidates. For example, when I last ran the South Dakota Senate numbers, the final display looked like this:
[1]  0.375 88.688 10.937 
indicating, from left to right, that Rick Weiland won less than 1% of the simulated elections, Mike Rounds won 89% of them, and Larry Pressler won 11%.
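
For comparison, the same three-way tally can be written without an explicit loop by drawing all the vote shares at once and asking R which column is largest in each row. This is only a sketch of an alternative approach, not the simulator's actual code, and the distribution parameters are again made-up placeholders rather than real polling inputs.

```r
# Vectorized three-way tally (an alternative sketch, not the original code).
# Means and standard deviation are hypothetical placeholders.
set.seed(2014)
n_sims <- 100000

# One row per simulated election, one column per candidate.
shares <- cbind(D = rnorm(n_sims, mean = 0.30, sd = 0.04),
                R = rnorm(n_sims, mean = 0.40, sd = 0.04),
                I = rnorm(n_sims, mean = 0.32, sd = 0.04))

# Column index (1 = D, 2 = R, 3 = I) of the largest share in each row.
# Exact ties, which are vanishingly rare here, go to the first column.
winner <- max.col(shares, ties.method = "first")

# Count wins per candidate and convert to percentages.
Probabilities <- tabulate(winner, nbins = 3) / n_sims * 100
names(Probabilities) <- colnames(shares)
Probabilities
```

Because every simulated election has exactly one winner, the three percentages always add up to 100, which is the property the head-to-head approach lacked.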
