How can I get pandas.read_csv() to infer datetime and timedelta types from CSV file columns?

Question

Asked by Sean Hammond

pandas.read_csv() infers the types of columns, but I can't get it to infer any datetime or timedelta type (e.g. datetime64, timedelta64) for columns whose values look like obvious datetimes and time deltas.

Here's an example CSV file:

datetime,timedelta,integer,number,boolean,string
20111230 00:00:00,one hour,10,1.6,True,Foobar

And some code to read it with pandas:

dataframe = pandas.read_csv(path)

The types of the columns on that dataframe come out as object, object, int, float, bool, object. They're all as I would expect except the first two columns, which I want to be datetime and timedelta.

Is it possible to get pandas to automatically detect datetime and timedelta columns?

(I don't want to have to tell pandas which columns are datetimes and timedeltas, or tell it the formats. I want it to try to detect them automatically, as it does for int, float and bool columns.)

Answer 1

Answered by EdChum

One thing you can do is define your own date parser using strptime. This will handle your date format, although it isn't automatic:

In [59]:

import pandas as pd
import datetime as dt

def parse_dates(x):
    # parse strings such as '20111230 00:00:00'
    return dt.datetime.strptime(x, '%Y%m%d %H:%M:%S')


# dict for word lookup, conversion
word_to_int = {'zero': 0,
               'one': 1,
               'two': 2,
               'three': 3,
               'four': 4,
               'five': 5,
               'six': 6,
               'seven': 7,
               'eight': 8,
               'nine': 9}


def str_to_time_delta(x):
    # convert strings such as 'one hour'; anything without 'hour' becomes 0 hours
    hours = 0
    if 'hour' in x.lower():
        # look up the leading word ('one') to get the number of hours
        hours = word_to_int[x[0:x.find(' ')].lower()]
    return dt.timedelta(hours=hours)


df = pd.read_csv(r'c:\temp1.txt', parse_dates=[0], date_parser=parse_dates)
df.dtypes
Out[59]:
datetime     datetime64[ns]
timedelta            object
integer               int64
number              float64
boolean                bool
string               object
dtype: object
In [60]:

Then, to convert the timedelta column, use the dict and function above to parse each string into a timedelta:

df['timedelta'] = df['timedelta'].map(str_to_time_delta)

In [61]:

df.dtypes
Out[61]:
datetime      datetime64[ns]
timedelta    timedelta64[ns]
integer                int64
number               float64
boolean                 bool
string                object
dtype: object
In [62]:

df
Out[62]:
             datetime  timedelta  integer  number boolean  string
0 2011-12-30 00:00:00   01:00:00       10     1.6    True  Foobar

[1 rows x 6 columns]
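
For anyone who wants to try the steps above without a local file, here is a minimal self-contained sketch. It assumes the sample row from the question, held in memory with io.StringIO. The names CSV_TEXT, WORD_TO_HOURS and words_to_hours are made up for illustration. Unlike the session above, it converts both columns after reading instead of passing a parser to read_csv, so treat it as one possible variant rather than the same method.

```python
import io

import pandas as pd

# Sample data copied from the question, held in memory instead of a file.
CSV_TEXT = (
    "datetime,timedelta,integer,number,boolean,string\n"
    "20111230 00:00:00,one hour,10,1.6,True,Foobar\n"
)

# Hypothetical lookup table: it only covers the words zero to nine.
WORD_TO_HOURS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}


def words_to_hours(text):
    """Return the number of hours in strings such as 'one hour'."""
    word, _, unit = text.lower().partition(" ")
    if not unit.startswith("hour"):
        raise ValueError(f"unsupported time delta: {text!r}")
    return WORD_TO_HOURS[word]


df = pd.read_csv(io.StringIO(CSV_TEXT))

# An explicit format means nothing is guessed about the date layout.
df["datetime"] = pd.to_datetime(df["datetime"], format="%Y%m%d %H:%M:%S")

# Map the words to integer hours, then let pandas build the timedeltas.
df["timedelta"] = pd.to_timedelta(df["timedelta"].map(words_to_hours), unit="h")

print(df.dtypes)
```

Raising on unknown units (rather than returning zero hours) is a design choice of this sketch, so that unexpected values are noticed early.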

To answer your principal question: I don't know of a way to do this automatically.

EDIT

Instead of my convoluted mapping function you can do just this:

df['timedelta'] = pd.to_timedelta(df['timedelta'])
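
A small sketch of pd.to_timedelta on strings, in case it helps. It assumes the durations are written with digits (for example "1 hour" or "00:30:00"); I would not count on word forms such as "one hour" being recognised, so check that against your own data. The variable names are only for illustration.

```python
import pandas as pd

# Assumed inputs: numeric durations, not the word form used in the question.
raw = pd.Series(["1 hour", "2 hours", "00:30:00"])
deltas = pd.to_timedelta(raw)
print(deltas)

# errors="coerce" replaces anything unparseable with NaT instead of raising.
messy = pd.Series(["1 hour", "Foobar"])
coerced = pd.to_timedelta(messy, errors="coerce")
print(coerced)
```

With errors="coerce" you can count the NaT values afterwards to see how many entries pandas could not interpret.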

Further edit

As noted by @Jeff, you can do this instead of using strptime when reading the CSV (in version 0.13.1 and above, though):

df = pd.read_csv(r'c:\temp1.txt', parse_dates=[0], infer_datetime_format=True)
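
A note for newer pandas: infer_datetime_format is deprecated since pandas 2.0, where a strict form of format inference is the default, and read_csv gained a date_format parameter for stating the format explicitly. The sketch below therefore passes only parse_dates. It assumes the sample row from the question, read from memory through io.StringIO (CSV_TEXT is a made-up name), and that pandas can work out this particular date layout by itself. You still have to name the column, so this is not the fully automatic detection the question asks for.

```python
import io

import pandas as pd

# Sample data copied from the question, held in memory instead of a file.
CSV_TEXT = (
    "datetime,timedelta,integer,number,boolean,string\n"
    "20111230 00:00:00,one hour,10,1.6,True,Foobar\n"
)

# parse_dates=[0] asks pandas to parse the first column as datetimes.
df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=[0])

print(df.dtypes)
```

If the first column still comes out as object, the values could not be parsed. In that case, converting afterwards with pd.to_datetime and an explicit format is the safer route.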
