Turn Date Strings into Datetime Objects

June 01, 2019

This post shows how to specify a date string's format and convert it into a datetime object with pandas' pd.to_datetime function.

Background

Date values in raw data are usually read into a DataFrame as strings. For example, pd.read_csv leaves date columns as text unless you ask it to parse them. To do date math or any kind of time-series analysis, these values first need to be converted into a proper datetime type. There are a few ways to turn date strings into datetimes, and pandas' to_datetime function is one of the easiest.
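Here is a minimal sketch of that workflow on a small, hypothetical DataFrame: the `date` column starts out as plain strings, is converted with pd.to_datetime, and can then be used for simple date math. The column names `date` and `days_since_start` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the 'date' column holds plain strings, not datetimes
df = pd.DataFrame({"date": ["2018-01-01", "2018-04-22", "2018-09-09"]})
was_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])
print("datetime before conversion?", was_datetime)

# Convert the strings to a datetime64 column
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
is_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])
print("datetime after conversion?", is_datetime)

# Date math now works: whole days elapsed since the earliest date
df["days_since_start"] = (df["date"] - df["date"].min()).dt.days
print(df)
```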

Imports

import pandas as pd

Create Some Example Date Strings

datetimes = ['2018-10-19 23:28:40.798061', '2018-10-18 11:10:44.453098', '2018-10-20 01:10:01.759478']
dates = ['2018-01-01', '2018-04-22', '2018-09-09']
more_dates = ['August 1, 2018', 'January 10, 2016', 'September 26, 1988']

Most of the time, pd.to_datetime can infer the correct date format without any help. If it can't, you will need to pass the date string's format explicitly with the format argument. strftime.org is a good reference for building strftime format strings. I've also had some luck using strftimer.com as a starting point: you paste in a date, and the site returns its strftime format. The value returned is based on Ruby's strftime directives, but these are mostly the same as Python's.
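One case where an explicit format matters is an ambiguous day/month order. The sketch below uses hypothetical sample strings: without a format, pandas guesses month-first by default, while `format="%d/%m/%Y"` pins the day-first reading. It also shows a quick way to sanity-check a format string by formatting a parsed value back with the same directives.

```python
import pandas as pd

# Hypothetical ambiguous strings: is "01/02/2018" Jan 2 or Feb 1?
ambiguous = ["01/02/2018", "03/04/2018"]

# Explicit format: day first, then month, then four-digit year
explicit = pd.to_datetime(ambiguous, format="%d/%m/%Y")
print(explicit)

# No format: pandas infers one, and defaults to month-first
# (pass dayfirst=True to change that default)
inferred = pd.to_datetime(ambiguous)
print(inferred)

# Sanity check: formatting back with the same directives should
# reproduce the original text if the format string is right
round_trip_ok = explicit[0].strftime("%d/%m/%Y") == ambiguous[0]
print(round_trip_ok)
```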

It's useful to start with the default setting, errors='raise', so that you're alerted to any values that deviate from the expected format. If there are a few rogue values that can't be fixed, errors='coerce' will convert those entries to NaT (pandas' missing value for datetimes) instead of raising an error.
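A small sketch of the difference between the two settings, using a hypothetical list with one rogue value. With the default errors='raise', pandas raises a ValueError at the value that doesn't match the format. With errors='coerce', that value becomes NaT, and counting the NaTs afterwards makes sure the bad rows don't vanish silently.

```python
import pandas as pd

# Hypothetical input: the last value doesn't match the format
raw = ["2018-01-01", "2018-04-22", "sometime in May"]

# errors='raise' (the default) raises on the first non-matching value
raised = False
try:
    pd.to_datetime(raw, format="%Y-%m-%d")
except ValueError as exc:
    raised = True
    print("errors='raise':", exc)

# errors='coerce' replaces the non-matching value with NaT
coerced = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")
print(coerced)

# Count coerced entries so they can be reviewed later
n_bad = int(coerced.isna().sum())
print("values coerced to NaT:", n_bad)
```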

pd.to_datetime(datetimes, format='%Y-%m-%d %H:%M:%S.%f', errors='coerce')
pd.to_datetime(dates, format='%Y-%m-%d', errors='coerce')
pd.to_datetime(more_dates, format='%B %d, %Y', errors='coerce')

DatetimeIndex(['2018-10-19 23:28:40.798061', '2018-10-18 11:10:44.453098', '2018-10-20 01:10:01.759478'], dtype='datetime64[ns]', freq=None)

DatetimeIndex(['2018-01-01', '2018-04-22', '2018-09-09'], dtype='datetime64[ns]', freq=None)

DatetimeIndex(['2018-08-01', '2016-01-10', '1988-09-26'], dtype='datetime64[ns]', freq=None)
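Once the strings are parsed, the result supports date operations directly. The sketch below reuses more_dates from above. It pulls out the year, sorts chronologically, takes the span between the earliest and latest date, and formats everything back into one uniform string layout. Note that %B matches full month names, which in Python's strptime depends on the current locale, so these English names assume an English locale.

```python
import pandas as pd

more_dates = ['August 1, 2018', 'January 10, 2016', 'September 26, 1988']
parsed = pd.to_datetime(more_dates, format='%B %d, %Y')

# Date components are available as attributes
years = list(parsed.year)
print(years)

# Values sort chronologically and can be subtracted (giving a Timedelta)
ordered = parsed.sort_values()
span = parsed.max() - parsed.min()
print(ordered)
print(span)

# Format back to strings in a single, consistent layout
iso_strings = list(parsed.strftime('%Y-%m-%d'))
print(iso_strings)
```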