James LeDoux

Data scientist and armchair sabermetrician

Bookshelf

May 15, 2017

A running list of what I’ve been reading.

Currently Reading: Bayesian Data Analysis, 3rd Edition

Data Science

Introduction to Statistical Learning — James, Tibshirani, and Hastie

A solid overview of the statistical learning theory that underlies machine learning. It gives the reader an intuitive grasp of what is going on inside the “black box”, but it leans a little too far toward the qualitative side for anyone hoping to gain a full understanding. For a deeper dive, see its more advanced counterpart, The Elements of Statistical Learning.

Elements of Statistical Learning – Hastie, Tibshirani, and Friedman

Like the above, but much denser. Worth suffering through its first several chapters. Builds character.

Learning from Data – Abu-Mostafa and Magdon-Ismail

Another machine learning book that focuses on theory. It won’t show you how to train your own models, but it will help you understand why models work and what guarantees we can make about learning and generalization.

Deep Learning – Goodfellow, Bengio and Courville

A wonderful balance of intuition and theory that the field has been lacking. Begins with the nuts and bolts of feedforward networks, and then goes into depth on the state of the art in model regularization, optimization, and various model classes and architectures. Filled with useful tips and tricks for implementing models.

Deep Learning with Python – Francois Chollet

A handy reference for Keras. This book is helpful for bridging the gap between beginner deep learning tutorials and more advanced, state-of-the-art methods. It won’t teach you much theory, but it will help you implement what you read in papers.

The Visual Display of Quantitative Information – Edward Tufte

Communicating what your data have to say with clarity, precision, and efficiency. Its pretty graphics also make it a great coffee table book.

Mostly Harmless Econometrics: An Empiricist’s Companion – Angrist and Pischke

A handbook on advanced econometrics. Useful for brushing up on linear models (simple and multiple linear regression) and experiment design (instrumental variables, difference-in-differences models, answering causal questions).

Mastering Metrics: The Path from Cause to Effect – Angrist and Pischke

This book is very similar to Mostly Harmless Econometrics, but more beginner-friendly. Get this one instead if you’re learning econometrics for the first time.

Bit by Bit: Social Research in the Digital Age – Matthew Salganik

This book felt like a greatest hits compilation of all the most useful and exciting things I learned about experiment design as an undergrad. It’s the best book I’ve found to date for marrying the strengths of old-school statisticians and newer-school data scientists.

R for Data Science – Wickham and Grolemund

An absurdly useful book for learning how to manipulate data with R and the Tidyverse (dplyr, ggplot2, forcats, etc.). I read it once when I was first learning R and again after a few years of experience, and I learned new things each time. This book will make anyone better at data analysis, visualization, manipulation, and cleaning.

Python for Data Analysis – Wes McKinney

Python’s closest equivalent to R for Data Science. It helped me understand pandas DataFrames more deeply and rely on Stack Overflow a lot less.

The Signal and the Noise — Nate Silver

My life would have turned out quite differently had it not been for Nate Silver and this book. The Signal and the Noise helped me discover my love for data science!

The Book of Why – Judea Pearl

I can’t shake the feeling that, as a data scientist, this book hates me. Nonetheless, it opened my eyes to an approach to causal inference that was entirely different from anything I’d been exposed to before.

Analyzing Baseball Data with R – Albert and Marchi

A reference book on sabermetrics with code samples in R. This book is useful to keep around when working with some of the main publicly available baseball data sources, such as Retrosheet and the Lahman database.

The Information – James Gleick

A short, digestible history of and introduction to information theory. It won’t make you an expert, but you’ll get the main ideas.

Decision Science

Misbehaving — Richard Thaler

This book covers the rise of behavioral economics as told by one of its earliest practitioners. Thaler draws from his experience to give an often Freakonomics-esque rundown of how economic models fail to describe real-world human behavior.

Thinking, Fast and Slow — Daniel Kahneman

A layman’s version of the theories that laid the groundwork for behavioral economics. Kahneman explains the two systems of thought in our brains (fast, intuitive System 1 and slow, deliberate System 2), and how they produce predictable biases.

Predictably Irrational – Dan Ariely

This read a lot like Thinking, Fast and Slow, but Ariely is a much better writer. Another book about how our brains take shortcuts that lead to irrational decision making.

Algorithms to Live By: the Computer Science of Human Decisions - Christian and Griffiths

A fusion of psychology and computer science on how fundamental algorithms and data structures can aid decision making. A fun way to tie stacks, queues, and sorting algorithms into your everyday life.

Freakonomics I & II — Levitt and Dubner

My love for Steven Levitt’s work is second only to my love for Nate Silver’s. Freakonomics showed me that the economist’s toolkit can be used to advance causes much greater than economics itself.

What Money Can’t Buy — Michael Sandel

A book on incentives, and how just about everything has a price tag if framed correctly.

When to Rob a Bank and 131 More Warped Suggestions and Well-Intended Rants — Levitt and Dubner

A collection of posts from the Freakonomics blog strung together into a greater narrative. A nice mix of incentive schemes, economic ramblings, and musings on irrational behavior.

Thinking in Bets – Annie Duke

A poker professional’s take on training yourself to think rationally and probabilistically.

Grit - Angela Duckworth

This book came to my attention because people were arguing about it on Twitter. Duckworth’s research attempts to measure people’s level of grittiness. I didn’t find it very useful or interesting.

Misc. Applied Statistics

The Book: Playing the Percentages in Baseball - Tango, Lichtman and Dolphin

I was lucky enough to work with Tango during my time as a statistician at MLB. This book is essentially the bible for a modern-day sabermetrician, answering baseball’s most fundamental strategic questions with an empirical approach and interpretable models.

Superforecasting: The Art and Science of Prediction – Philip Tetlock

An engaging read for statheads and trivia fanatics. Tetlock draws from his experience as the head of the Good Judgment Project (a lengthy study on forecasting) to break down what makes great forecasters better at seeing the future than the rest of us. The key findings are rooted in psychology and in methods of improvement through self-evaluation.

Moneyball — Michael Lewis

The story of how Billy Beane’s Oakland A’s were able to build successful teams in one of baseball’s smallest markets. Lewis’s walk through the logic of sabermetrics and of sorting signal from noise in baseball data was eye-opening to me as both a stats geek and a sports fan.

Chasing Perfection - Andy Glockner

A mostly qualitative run-through of the current state of basketball analytics, detailing recent phenomena such as the decline of the mid-range jumper, tanking for draft picks, and the specialized medical analyses used to ensure player longevity.

Business and Economics

The Everything Store: Jeff Bezos and the Age of Amazon - Brad Stone

The rise of Bezos and Amazon. A handbook on long-term thinking and execution in complex environments.

Elon Musk - Ashlee Vance

Musk’s success story seems similar to that of Bezos: defined by an obsessive focus on a small number of long-term goals and a superhuman work ethic.

The Hard Thing About Hard Things — Ben Horowitz

Advice from a VC and former CEO about how to get through the low points as a leader, and how to lead when things are not going well.

How Google Works — Eric Schmidt and Jonathan Rosenberg

How to build and scale an organization where innovation comes naturally, told by two of the leaders responsible for doing this at Google.

Zero to One — Peter Thiel

Peter Thiel’s notes on how to start a startup. Advice on market positioning, culture, and overcoming the challenges of early-stage entrepreneurship.

Peddling Prosperity — Paul Krugman

A breakdown of the dangers of journalist- and pundit-fed pseudo-economics. The stories of those who “peddle prosperity” in this way are seldom grounded in facts. The book traces the rise and farce of Reaganomics and explains how to be wary of such false theories in the future.

The Dhandho Investor – Mohnish Pabrai

Simple, easy-to-follow value investing principles from a hedge fund manager.

Work Rules! — Laszlo Bock

An uncharacteristically interesting book about HR from Google’s former HR chief. Ideas on using data for better HR decisions.

The Little Book that Beats the Market — Joel Greenblatt

More value investing, this time greatly simplified. One of the most useful reads for an investor who is not a finance pro.

The Intelligent Investor — Benjamin Graham

The bible for any value investor. Graham’s Mr. Market illustration remains relevant today.

Updated: 2019/08/02

