On Draft Pick Value, the New Lottery, and Tanking

30 November 2017 on projects

Tanking becomes a hot topic each season once it is clear which of the NBA's worst teams will miss the playoffs. In this post I address the value of a draft pick and of tanking in the league's end-of-season rankings, with applications to trade valuation and the impact of the league's recently proposed changes to the draft.

A Statcast Tribute to Baseball's Strangest Pitch: the Eephus

14 November 2017 on projects

I've been borderline obsessed with the eephus pitch for some time now. Every time I see a player pull this pitch out of their arsenal I become equal parts excited and bamboozled. Startlingly little research has been done to date on this uncommon pitch, and thus, this post is going to serve as an exploratory analysis of and tribute to the mythical eephus.

Leaving MLB: Lessons Learned in my First Data Science Role

14 August 2017 on data-science

For the past three months I have had the exciting opportunity to work as a data scientist at Major League Baseball Advanced Media, the technology arm of MLB. This post gives an overview of what I've been working on and the advice I would give a fellow first-time data scientist on their first day on the job.

Introducing pybaseball: an Open Source Package for Baseball Data Analysis

27 July 2017 on projects, open-source

Throughout my baseball-facing work at MLB Advanced Media, I came to realize that there was no reliable Python tool available for sabermetric research and advanced baseball statistics. As a response to this, I built pybaseball - a Python package for baseball data analysis.

Summer of Machine Learning

04 June 2017 on summer, ml, reading

Inspired by a similar project by Chris Albon, I am sharing my day-to-day progress on my summer goals for becoming a better data scientist.

338 Cups of Coffee

12 January 2017 on projects, coffee, personal

Each cup of coffee I have consumed in the past 5 months has been logged on a spreadsheet. Here's what I've learned by data sciencing my coffee consumption.

Building a Content-Based Recommender System for Books: Using Natural Language Processing to Understand Literary Preference

01 November 2016 on projects

Literature is a tricky area for data science. Think of your five favorite books. What do they have in common? Some may share an author or genre, but besides that, it is probably hard for you to think of what traits they share. My team and I set out to explore the mysterious components of an individual’s literary taste profile, and in the process built a content-based recommender system for books. This post is a brief overview of the system, the features it uses, and how it was built.


15 May 2016 on personal, reading

A collection of some of my favorite books. Business, popular economics, stats and machine learning, and some literature.

Machine Learning and the NFL Field Goal: Using Statistical Learning Techniques to Isolate Placekicker Ability

10 January 2016 on projects, ML, data, science, field, goal

Probabilistic modeling on NFL field goal data. Applying logistic regression, random forests, and neural networks in R to measure contributing factors of field goal success, and then using this model to rate kickers by points added above the expected value. Published in Elements Research Journal Fall 2016, presented at Boston College Big Data Research Symposium Spring 2015.