Expected Stolen Bases Model
Summary
This model estimates the probability that a stolen base attempt succeeds, using player tracking, ball tracking, and game state data. Separate models are trained for each base (attempts to steal 2B and attempts to steal 3B), and the probability is evaluated from the perspective of the runner, the pitcher, and the catcher. The model is not used for stolen base attempts with pickoffs or catcher backpicks, or for attempts when there is a runner ahead of the runner in question (e.g., an attempt to steal 2B when there is already a runner on 2B or 3B); this avoids the complications of double steals.
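The exclusion rules above can be sketched as a simple eligibility check. This is a minimal illustration only: the StealAttempt class, its field names, and the BASES_AHEAD table are hypothetical and are not taken from the model's actual code.

```python
from dataclasses import dataclass


@dataclass
class StealAttempt:
    # All field names are hypothetical, chosen for this example only.
    target_base: str            # "2B" or "3B"
    occupied_bases: frozenset   # bases occupied before the pitch, e.g. {"1B", "3B"}
    is_pickoff: bool
    is_catcher_backpick: bool


# Bases ahead of the stealing runner for each supported target base.
# A runner stealing 2B starts on 1B, so 2B and 3B are ahead of them.
BASES_AHEAD = {"2B": {"2B", "3B"}, "3B": {"3B"}}


def is_model_eligible(attempt: StealAttempt) -> bool:
    """Return True if the attempt passes the exclusion rules described above."""
    if attempt.target_base not in BASES_AHEAD:
        return False  # only steals of 2B and 3B are modeled
    if attempt.is_pickoff or attempt.is_catcher_backpick:
        return False
    # Exclude attempts with any runner ahead (possible double steal).
    return not (BASES_AHEAD[attempt.target_base] & attempt.occupied_bases)
```

For example, a steal of 2B with runners on 1B and 3B would be excluded, while a steal of 3B with runners on 1B and 2B would pass this particular check, because the other runner is behind the runner in question.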
Inputs
This model is based on steal attempts (excluding those with catcher backpick attempts) from the 2016-2023 MLB regular seasons. We currently have separate models trained on different groups of seasons to better capture the effects of changes in tracking technology. The inputs are:
Steal time - Statcast
Primary and secondary lead - Statcast
Sprint speed - Statcast
Steal first step - Statcast
Pitch release time - Statcast
Pitch velocity
Pitchout indicator (was the pitch a pitchout?)
Some values that Statcast does not report for a given stolen base attempt (SBA) can be estimated from other values, provided those other values are all available for that SBA. For example, when the primary lead is unavailable, we estimate it from the secondary lead if the secondary lead is available.
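One way such an estimate could work is sketched below. The document does not state the estimation method, so the least-squares line used here is an assumption, and the dictionary keys (primary_lead, secondary_lead, primary_lead_estimated) are hypothetical names chosen for the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y ~ x."""
    if len(xs) < 2:
        raise ValueError("need at least two complete attempts to fit a line")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("secondary leads are all identical; cannot fit a slope")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def fill_primary_lead(attempts):
    """Return copies of the attempts with missing primary leads estimated.

    Assumption: primary lead is roughly linear in secondary lead. The line is
    fitted on attempts where both values were reported.
    """
    complete = [a for a in attempts
                if a["primary_lead"] is not None and a["secondary_lead"] is not None]
    slope, intercept = fit_line([a["secondary_lead"] for a in complete],
                                [a["primary_lead"] for a in complete])
    filled = []
    for a in attempts:
        a = dict(a)  # copy so the caller's records are not mutated
        if a["primary_lead"] is None and a["secondary_lead"] is not None:
            a["primary_lead"] = intercept + slope * a["secondary_lead"]
            a["primary_lead_estimated"] = True  # flag estimated values
        filled.append(a)
    return filled
```

Attempts missing both leads are left untouched, which matches the condition that the source value must be available. Flagging estimated values keeps them distinguishable from tracked values downstream.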
Key Metrics
x2B%/x3B% - Expected SB2/SB3 rate based on model output
xSB2AA/xSB3AA - SB2/SB3 above expected average - 0 is neutral
xSB2+/xSB3+ - SB2/SB3 relative to expected average, indexed so that 100% is neutral
CS2AA/CS3AA - CS2/CS3 above expected average - 0 is neutral
CS2+/CS3+ - CS2/CS3 relative to expected average, indexed so that 100% is neutral
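The arithmetic behind metrics like these might look like the sketch below. The exact formulas are not given in this document, so this is one plausible reading and an assumption: the "AA" metrics are taken as actual minus expected counts, and the "+" metrics as 100 times actual divided by expected. The function and key names are hypothetical.

```python
def _index(actual, expected):
    """100 * actual / expected, or None when expected is zero."""
    return 100.0 * actual / expected if expected > 0 else None


def summarize_steals(outcomes, expected_probs):
    """Summarize one player's attempts at a single base.

    outcomes: 1 for a successful steal, 0 for a caught stealing.
    expected_probs: model success probability for each attempt.
    """
    if len(outcomes) != len(expected_probs) or not outcomes:
        raise ValueError("need equal-length, non-empty inputs")
    n = len(outcomes)
    actual_sb = sum(outcomes)
    expected_sb = sum(expected_probs)
    actual_cs = n - actual_sb
    expected_cs = n - expected_sb
    return {
        "expected_sb_rate": expected_sb / n,           # assumed x2B%/x3B% analogue
        "sb_above_expected": actual_sb - expected_sb,  # 0 is neutral
        "sb_plus": _index(actual_sb, expected_sb),     # 100 is neutral
        "cs_above_expected": actual_cs - expected_cs,  # 0 is neutral
        "cs_plus": _index(actual_cs, expected_cs),     # 100 is neutral
    }
```

Under these assumed definitions, a runner who succeeds on 3 of 4 attempts that each carried a 0.75 expected success probability comes out exactly neutral: 0 above expected and an index of 100.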
The SB stats are most useful from the baserunner perspective, and the CS stats are most useful from the catcher and pitcher perspectives. The three perspectives use different inputs. The baserunner perspective is based only on the runner skill factors (e.g., lead distance, steal time) and does not take the pitch factors into account. The pitcher perspective takes into account the pitch factors and the runner's lead. The catcher perspective takes into account both the baserunner factors and the pitch factors.
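The split of inputs by perspective can be sketched as a lookup table. The grouping of the listed inputs into runner factors and pitch factors is an assumption based on the descriptions above, and all feature names are hypothetical.

```python
# Assumed grouping of the listed inputs; names are illustrative only.
RUNNER_FACTORS = ["steal_time", "primary_lead", "secondary_lead",
                  "sprint_speed", "steal_first_step"]
LEAD_FACTORS = ["primary_lead", "secondary_lead"]
PITCH_FACTORS = ["pitch_release_time", "pitch_velocity", "is_pitchout"]

PERSPECTIVE_FEATURES = {
    "runner": RUNNER_FACTORS,                   # runner skill only
    "pitcher": PITCH_FACTORS + LEAD_FACTORS,    # pitch factors plus runner lead
    "catcher": RUNNER_FACTORS + PITCH_FACTORS,  # runner and pitch factors
}


def select_features(attempt, perspective):
    """Keep only the inputs that the given perspective takes into account."""
    return {name: attempt[name] for name in PERSPECTIVE_FEATURES[perspective]}
```

Keeping the mapping in one table makes it easy to see which inputs each perspective uses, and an unknown perspective name fails fast with a KeyError.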