What is "Win Probability"?
Well, it's pretty simple. Given a certain situation in a game, it is the chance that a team has of winning the game.
What are the main factors?
Keeping it simple, there are a few core factors in determining a team's chance of winning the game.
The first factor is time. How much time remains in the game determines how long a
team has to change the outcome. The second main factor is possession. The team that has the
ball at a given time in the game has an advantage. The third and most important factor is the score.
Basic intuition is correct here: a team with a big lead has a greater chance of winning than a team with a small lead. The fourth
factor is location. Historically, the home team has an advantage. There is definitely a difference between being at home
and on the road. The final factor is the field position of the football. Being on the opponent's half of the field versus your own half of the field is a significant difference.
This seems really complex mathematically. How do you calculate it?
Well, with a lot of data. We took a large chunk of our play-by-play data (1.4M plays over 9 years) and recorded the score on every
play of every game, the time left in the game, who had the ball, and most importantly, the team that ended up winning the game.
From there, we built a really simple calculator that asks: given this score, how many other games in the data had
this score (Total Game Occurrences), and in how many of them did the team go on to win? With the total number of games and the number of times the team won,
the Win Probability is easy to calculate (100 x Wins / Total Game Occurrences).
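As a rough illustration of that lookup, here is a minimal Python sketch. It is not the actual calculator: the record layout (game id, score margin, minutes left, possession, whether the team won), the function names, and the sample data are all assumptions made up for the example. Each game is counted once per situation, so that several plays from the same game do not inflate Total Game Occurrences.

```python
from collections import defaultdict

def build_table(plays):
    """Map each situation to {game_id: won}, so a game counts once per situation."""
    table = defaultdict(dict)
    for game_id, margin, minutes_left, has_ball, won in plays:
        table[(margin, minutes_left, has_ball)][game_id] = won
    return table

def win_probability(table, situation):
    """Return (win probability in percent, total game occurrences)."""
    games = table.get(situation, {})
    total = len(games)
    if total == 0:
        return None, 0  # no matching games in the data
    wins = sum(games.values())  # True counts as 1, False as 0
    return 100 * wins / total, total

# Hypothetical play-by-play records, invented for illustration only.
sample_plays = [
    ("g1", 7, 10, True, True),
    ("g1", 7, 10, True, True),   # same game, same situation: counted once
    ("g2", 7, 10, True, True),
    ("g3", 7, 10, True, False),
    ("g4", 7, 10, True, True),
]

table = build_table(sample_plays)
prob, total = win_probability(table, (7, 10, True))
print(prob, total)  # 75.0 4
```

Here three of the four matching games were wins, so the result is 100 x 3 / 4 = 75.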
Is it really that simple?
For the most part, yes. A lot of other models out there cannot be explained easily and are guilty (mathematically) of trying to boil the ocean. Our goal is to
have something that is easily consumable and comprehensible.
Are there any flaws in this approach?
Famous statistician George E. P. Box once said "All models are wrong, but some are useful". That being said, there are a few
flaws to this approach (as there are with any approach). Notably, there isn't enough historical data for neutral site games. Additionally,
scores that are not very common (e.g. 21-3 with 14:00 left in Q1) do not have large enough sample sizes to calculate an accurate win probability. We put
a threshold in place to inform the user when that is the case. Other than that, the numbers make a lot of sense, pass the sniff test, and we hope
this tool proves useful.
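A sample-size threshold like the one described above could look something like the following Python sketch. The cutoff of 30 games, the function name, and the message wording are assumptions chosen for illustration; the actual threshold value is not stated here.

```python
MIN_GAMES = 30  # hypothetical cutoff, not the real threshold

def describe(wins, total, min_games=MIN_GAMES):
    """Format a win probability, flagging results built on too few games."""
    if total == 0:
        return "No historical games match this situation."
    pct = 100 * wins / total
    # Warn the user when the estimate rests on a small sample.
    note = "" if total >= min_games else f" (small sample: only {total} games)"
    return f"Win Probability: {pct:.1f}%{note}"

print(describe(60, 100))  # Win Probability: 60.0%
print(describe(3, 4))     # Win Probability: 75.0% (small sample: only 4 games)
```

The number is still reported in the small-sample case, just with a warning attached, so the user can decide how much weight to give it.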