Using machine learning to predict strategic infield positioning using statcast data and contextual feature engineering.
A Statcast Tribute to Baseball's Strangest Pitch: the Eephus
I’ve been borderline obsessed with the eephus for some time now. Every time I see a player pull this pitch out of their arsenal I become equal parts excited and bamboozled. My reaction is typically equal parts “I could throw that,” and “how on earth didn’t he hit that?”
For those who aren’t familiar, here’s a quick description and history of the eephus. In short, an eephus is a blooper pitch: it has a lazy, rec-league style delivery, can arch well above the batter’s head en route to the plate, and tends to travel anywhere from 40 to 70 mph as it leaves the pitcher’s hand. It’s oftentimes difficult to tell whether it was thrown on purpose or if the pitcher temporarily forgot how to throw a baseball.
This pitch is said to have first been thrown by Bill Phillips, who made the pitch a part of his game from 1890 to 1903. The pitch was later brought to prominence by Rip Sewell roughly 40 years later, and has seen sporatic use since. This pitch has gone by a variety of names over the years, including being referred to as a “junk pitch”, “dead fish”, “LaLob”, and a “spaceball” for its high arch (source: A Brief History of the Eephus Pitch - NYTimes).
Well below the speed of an average changeup, and typically lacking any element of deception as to what’s coming in its delivery, why does anyone throw this bizarre pitch? The prevailing theory is that the comically slow speed of this pitch throws off a batter’s calibration, making the pitches that follow appear blazing fast. In other instances, people speculate that the pitch is simply a mistake, having slipped out of the pitcher’s hand. Regardless, little research has been done to date on this uncommon pitch, and I think it deserves better than that. Thus, this post is going to serve as an exploratory analysis of and tribute to the mythical eephus.
Before going any further in this post, here’s some quick suggested viewing for context on the big league pitch that you could probably throw just as effectively as Clayton Kershaw:
Now that this pitch has received a sufficient amount of hype, let’s get up close and personal with the eephus and see what it looks like by the numbers. To do this we’ll need data on every eephus that’s been thrown during the Statcast and PITCHf/x eras. For this, I used the pybaseball library to retrieve the Statcast and PITCHf/x data on every Major League pitch that’s been thrown since the 2008 season. Among these 7,212,136 observations, only 2,090 of them represent eephus pitches. That’s just 0.02 percent - a rare pitch indeed!
Eephuses thrown by season
The eephus saw its Statcast-era golden age in the year 2014, when over 400 were thrown. With the exception of the 2012 - 2015 seasons, it appears most common to see less than 200 thrown in a given year. Turning to the list of pitchers who’ve used this pitch, it becomes clear that it’s no coincidence that the 2012 - 2015 spike in eephus use coincided with the era of a healthy R.A. Dickey. This eephus-throwing knuckleballer, in fact, is responsible for more than twice as many eephus pitches as the next-most prolific user of the pitch.
Eephus count by pitcher, 2008 - 2017
In recent history, only Dickey, Padilla, Despaigne, and Chen have been prolific enough users of the pitch to have more than 100 in-game examples under their belt. It makes sense that this would be an uncommon pitch for most of those who use it; once the eephus loses its element of surprise, it’s no longer a novel and disorienting pitch, but essentially a Little League World Series-level fastball that any major league batter worth his place on a roster would hit out of the park.
Since data on any particular pitch type is only relevant in the context of other pitches, we’ll first compare the eephus against the closest things it has to peers: the fastball, knuckleball, and changeup.
The most relevant data point here is speed: the eephus has an average speed of just 64.5mph. That’s 23% slower than the average changeup, and 30% slower than the average fastball. The pitch doesn’t demonstrate the same low spin rate of other purposefully-slow pitches, however, despite slowness being its defining characteristic. While the knuckleball and changeup show spin rates in the 1500s and 1700s, the eephus spins at a lofty 2301 rpm - a solid 100rpm faster than the average fastball. As spin rate is a relatively new metric to have access to, the experts aren’t completely certain what a high or low spin rate means for pitch quality. Early research, however, suggests that high spin rate is a good thing for a non-breaking ball.
Statcast Zones (source: Baseball Savant)
The last summary stat shown in the table above is the percentage of each pitch type that’s placed down the middle of the strike zone, along its edges, and outside. Here I use the Statcast zones shown above, defining “down the middle” as being in zone 5, “edge of strike zone” as zones 1, 2, 3, 4, 6, 7, 8, and 9, and “outside strikezone” as zones 11 through 14. At a high level, the farther pitches tend to be placed from the middle of the strike zone, the more likely it is that pitchers are using this pitch for strategic reasons and the less likely it is that a pitcher is confident in the pitch’s ability to get past a batter without being expertly placed. Here we see about what we’d expect. Fastballs are placed within the strike zone relatively more often than the slow-speed changeup and eephus, with the eephus being thrown outside the strike zone two percentage points more often than the changeup and 12 percentage points more often than the fastball. This makes intuitive sense, since one can imagine that a well-prepared power hitter could do some damage to a 60mph pitch thrown down the middle. Due to the eephus’ high arch, it may be challenging to place accurately as well, which would also contribute to how often it lands outside the strike zone.
Eephus (L) and Fastball (R) Placement from Batter's View
The above figure shows this same idea in slightly more detail. While the sample size is much smaller for the eephus than the fastball, it’s clear that eephus pitchers make a concerted effort to keep this pitch well out of reach, at the expense of it often having no chance of entering the strike zone.
While summary stats are useful, a simple average never tells the full story. To better understand baseball’s slowest pitch, let’s take a look at how its release speeds are distributed relative to these other pitches.
From this figure we can see that the eephus’ slowness is even more pronounced than one may have thought! In fact, if we throw out the fastest 1% of eephus pitches which are outliers that appear to have been misclassified, we see that the remaining 99% of recorded eephus pitches are slower than 97% of recorded changeups. So while there is some overlap between the two pitches in terms of speed, the eephus is essentially in a league of its own in terms of slowness.
The speed gap between the eephus and the fastball is even more pronounced. One can imagine how disorienting it would be to see an eephus float by after a 95mph fastball, or how blazingly fast this same fastball would appear after a 60mph eephus. As a side note, the bi-modality of knuckleball speeds suggests that Statcast may be misclassifying some of these pitches as knuckleballs when they’re actually eephuses. Since there’s no accurate way of saying which declared-knuckleballs are actually eephuses, however, we’ll have to leave those pitches be.
This brings us to a more practical question: does the eephus actually work? The most salient argument for its use is the one alluded to earlier: the extreme speed differential between an eephus and any other pitch both catches batters off guard for the eephus itself, and makes a non-eephus follow-up pitch appear faster and harder to track. But does this theory hold up in practice? Let’s examine the effectiveness of the eephus vs. a few more common pitches, and then test whether an eephus actually makes the following pitch harder to hit.
For examining the effectiveness of the eephus vs. all other pitches, the following five metrics provide a nice overview of how batters fare against it: contact percentage, hit percentage, launch angle, exit velocity, and barrel percent. These metrics collectively represent how hittable the pitch is, how high quality a better’s contact with an eephus tends to be, and whether people hit the eephus for power or for contact.
First, perhaps surprisingly, batters make contact with this pitch about as often as every other pitch, making contact with the eephus just 0.33 percentage points more often than an average pitch. The quality of this contact, however, tends to be lower. Despite making contact with this slightly more often, for example, it becomes a hit almost 11% less often. A second way of looking at this is that its barrel percent, measured as the percentage of eephus pitches with an expected batting average of above 0.500 based on the ball’s speed and angle off the bat, is a tenth of a percentage point lower for eephus pitches, amounting to a 2% drop. This isn’t a large decrease, but paired with the pitch’s higher contact percent and lower hit percent, it paints a picture of frequent but low-quality contact.
Barrel percent is calculated using the ball’s exit velocity and launch angle off the bat, but these factors can be examined in isolation as well to better understand what type of contact is being made. Here both the average and distribution of these metrics show that batters’ launch angles are about the same for an eephus vs. non-eephus pitch, but the speed of the ball off their bat is slower. This is reflected by the ball’s average exit velocity being 4.29mph slower and the distribution of this metric being shifted noticeably toward the slower side for the eephus vs. every other pitch.
Now that we’ve established that the eephus itself may have the desirable quality of drawing out low-quality contact, let’s return to the theory posed earlier: is a fastball harder to hit if it’s thrown after an eephus? Do pitchers strategically throw fastballs more frequently after an eephus? These same questions could be posed for pitch types other than the fastball, but if this effect exists, this is where we’d expect it to be most pronounced, so we’ll leave the other pitches out for now. The answer to the first of these questions is a definitive “not really.” An average batter makes contact with 19.18% of fastballs thrown. When the previous pitch was an eephus, this contact percentage actually increases to 22.60%. Further, this contact tends to be high quality contact. 8.49% of eephus-preceded fastballs turned into hits, while this number is only 6.26% on average. Measuring barrels shares a similar story, where a near-average 5.4% of fastballs are barreled on average, but a much-higher 6.4% are barreled when the previous pitch was an eephus. It’s difficult to make a strong claim about the impact of an eephus on a follow-up fastball, however, due to sample size constraints. 703 post-eephus fastballs have been thrown during the PITCHf/x and Statcast eras, and only 203 of these happened since barrels became measurable in 2015. This is hardly enough data to trust these particular numbers out of sample. It appears from this analysis, however, that a fastball thrown after an eephus performs either identically or slightly better than an identical fastball under other circumstances. Based on these results, I would take any claim that a fastball is extra hard to hit after an eephus pitch with a grain of salt.
The second of these questions is easier to answer. While approximately 64% of major league pitches are fastballs, only 47% of eephuses whose plate appearance contained a follow-up pitch were followed by a fastball. Even if we remove eephus-throwing knuckleballer R.A. Dickey from this data, the number is still below average at 61%. It looks like non-knuckleball pitchers throw fastballs at approximately their normal frequency after eephus pitches, and that R.A. Dickey steers away from the post-eephus fastball almost entirely. Perhaps this means that pitchers already understand that the extra-fast-looking post-eephus fastball is only a myth.
Since the eephus doesn’t appear to be any better than a fastball as an isolated pitch, and we’ve also debunked the theory that a fastball is more deadly when thrown after an eephus, is there any reason to consider using this pitch? Perhaps. Examining the on base percentage (OBP) of plate appearances where the eephus was featured, and comparing this to the OBP of non-eephus plate appearances, we do see a slight decrease when the eephus is used. An eephus-containing atbat sees the batter get on base 30.8% of the time, whereas an average plate appearance has a slightly higher OBP of 31.9%. A difference of more than an entire percentage point is larger than I would have expected here, and suggests that something about this rare pitch may, indeed, work in a pitcher’s favor.
Despite its incredibly slow speed, the eephus pitch manages to hold its own. Batters have trouble making high quality contact with the pitch, and in general get on base less often when the pitch is utilized in a plate appearance. That said, analyzing a rare pitch inevitably means working with small sample sizes, which means that it’s hard to gain many deep insights into this pitch beyond some simple summary stats. A word of caution, however: a pitcher should always be careful not to throw this “surprise” pitch twice in a row, lest they end up like poor Orlando Hernandez.
A quick tutorial on fetching MLB win-loss data with pybaseball and cleaning and visuzlizing it with the tidyverse (dplyr and ggplot).
Tanking becomes a hot topic each season once it becomes apparent which of the NBA’s worst teams will be missing the playoffs. In this post I address the valu...
I’ve been borderline obsessed with the eephus pitch for some time now. Every time I see a player pull this pitch out of their arsenal I become equal parts ex...
For the past three months I have had the exciting opportunity to intern as a data scientist at Major League Baseball Advanced Media, the technology arm of ML...
Throughout my baseball-facing work at MLB Advanced Media, I came to realize that there was no reliable Python tool available for sabermetric research and adv...
A collection of some of my favorite books. Business, popular economics, stats and machine learning, and some literature.
Each cup of coffee I have consumed in the past 5 months has been logged on a spreadsheet. Here’s what I’ve learned by data sciencing my coffee consumption.
Building a Content-Based Recommender System for Books: Using Natural Language Processing to Understand Literary Preference
Literature is a tricky area for data science. Think of your five favorite books. What do they have in common? Some may share an author or genre, but besides ...
Machine Learning and the NFL Field Goal: Using Statistical Learning Techniques to Isolate Placekicker Ability
Probabilistic modeling on NFL field goal data. Applying logistic regression, random forests, and neural networks in R to measure contributing factors of fiel...