Get the Difference Between Two Dates

June 01, 2019

Simple date math can create useful features for machine learning models and analyses alike. This post shows how to get the number of days between two dates.

Imports

import pandas as pd

Create Some Example Data

Say we have a dataset of ad campaigns. Each campaign has a start date, and one row for each day the campaign was live.

data = {'campaign_id': [123, 123, 123, 123, 124, 124],
		'campaign_start_date': ['2018-01-01', '2018-01-01', '2018-01-01', '2018-01-01', '2018-02-11', '2018-02-11'],
		'date': ['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04', '2018-02-11', '2018-02-12'],
		'impressions': [1000, 1113, 1755, 3511, 412, 988]}
df = pd.DataFrame(data)
print(df)
campaign_id campaign_start_date date impressions
123 2018-01-01 2018-01-01 1000
123 2018-01-01 2018-01-02 1113
123 2018-01-01 2018-01-03 1755
123 2018-01-01 2018-01-04 3511
124 2018-02-11 2018-02-11 412
124 2018-02-11 2018-02-12 988

To work with this type of data, you may want to know how many days it’s been since the campaign first went live at each point in time (date - campaign_start_date). Subtracting two columns of type string will raise an error, so there are some additional steps to follow to get the difference between two date columns.

First, make sure the two date columns are of type datetime. See here for more detail on this.

df['campaign_start_date'] = pd.to_datetime(df['campaign_start_date'])
df['date'] = pd.to_datetime(df['date'])

With the two columns as type datetime, you can now subtract them. To get the subtracted value in a usable numeric format, you will need to specify its unit of time as being in days with dt.days.

df['days_since_start'] = (df['date'] - df['campaign_start_date']).dt.days
print(df)
campaign_id campaign_start_date date impressions days_since_start
123 2018-01-01 2018-01-01 1000 0
123 2018-01-01 2018-01-02 1113 1
123 2018-01-01 2018-01-03 1755 2
123 2018-01-01 2018-01-04 3511 3
124 2018-02-11 2018-02-11 412 0
124 2018-02-11 2018-02-12 988 1

As there’s no common definition for the number of days in a month or year, there’s no option to specify dt.months or dt.years. You can approximate this, however, by dividing the output by 30 or 365.