Parsing datetime from csv in pandas does not yield DateTimeIndex

Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must likewise follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/19590659/

Date: 2020-09-13 21:16:39  Source: igfitidea


Tags: python, csv, pandas

Asked by EmEs

I'm exploring pandas, trying to learn and apply it. I currently have a CSV file containing financial time-series data with the following structure:


date, time, open, high, low, close, volume
2003.04.08,12:00,1.06830,1.06960,1.06670,1.06690,446
2003.04.08,13:00,1.06700,1.06810,1.06570,1.06630,433
2003.04.08,14:00,1.06650,1.06810,1.06510,1.06670,473
2003.04.08,15:00,1.06670,1.06890,1.06630,1.06850,556
2003.04.08,16:00,1.06840,1.07050,1.06610,1.06680,615


Now I want to load the CSV data into a pandas DataFrame so that the date and time fields are merged and become the DataFrame's DatetimeIndex, like this:


df = pa.read_csv(path,
                 names = ['date', 'time', 'open', 'high', 'low', 'close', 'vol'],
                 parse_dates = {'dateTime': ['date', 'time']},  
                 index_col = 'dateTime')
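A version-independent sketch of the same goal, assuming the file has no header row (so names supplies the column names) and that every date/time pair follows the YYYY.MM.DD HH:MM pattern in the sample. It reads both fields as plain strings, combines them with pd.to_datetime and an explicit format, then promotes the result to the index. The sample rows are inlined with io.StringIO in place of the real path; newer pandas releases deprecate combining columns through the dict form of parse_dates, which is why this sketch does the merge itself.

```python
import io

import pandas as pd

# Sample rows from the question, without a header line (assumption: the real
# file has no header, since names= is passed explicitly).
SAMPLE = """\
2003.04.08,12:00,1.06830,1.06960,1.06670,1.06690,446
2003.04.08,13:00,1.06700,1.06810,1.06570,1.06630,433
2003.04.08,14:00,1.06650,1.06810,1.06510,1.06670,473
2003.04.08,15:00,1.06670,1.06890,1.06630,1.06850,556
2003.04.08,16:00,1.06840,1.07050,1.06610,1.06680,615
"""

df = pd.read_csv(
    io.StringIO(SAMPLE),
    names=["date", "time", "open", "high", "low", "close", "vol"],
    dtype={"date": str, "time": str},  # keep both fields as plain strings
)

# Join the two string columns and parse them with an explicit format,
# so pandas never has to guess the "YYYY.MM.DD HH:MM" layout.
df["dateTime"] = pd.to_datetime(df["date"] + " " + df["time"], format="%Y.%m.%d %H:%M")
df = df.drop(columns=["date", "time"]).set_index("dateTime")

print(type(df.index).__name__)  # DatetimeIndex
```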

This works, yielding a nice DataFrame object:


<class 'pandas.core.frame.DataFrame'>
Index: 8676 entries, 2003.04.08 12:00 to nan nan
Data columns (total 5 columns):
open     8675  non-null values
high     8675  non-null values
low      8675  non-null values
close    8675  non-null values
vol      8675  non-null values
dtypes: float64(5)

But on inspection it turns out that the index is not a DatetimeIndex; it holds unicode strings instead:


type(df.index)
>>> pandas.core.index.Index
df.index
>>> Index([u'2003.04.08 12:00', u'2003.04.08 13:00', u'2003.04.08 14:00', ....

So read_csv parsed the date and time fields and merged them, but did not create a DatetimeIndex. As far as I understood from the documentation, a new data structure supplied with a list of datetime objects should automatically create a DatetimeIndex. Am I wrong? Is the DataFrame object an exception?
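The automatic behaviour described in the documentation does apply to DataFrame, but only when the index values really are datetime objects; strings that merely look like dates stay a plain Index. A quick check with two made-up rows taken from the sample data:

```python
from datetime import datetime

import pandas as pd

stamps = [datetime(2003, 4, 8, 12, 0), datetime(2003, 4, 8, 13, 0)]

# datetime objects: pandas infers a DatetimeIndex automatically
df_dt = pd.DataFrame({"open": [1.06830, 1.06700]}, index=stamps)
print(type(df_dt.index).__name__)  # DatetimeIndex

# the same moments as strings: a plain Index, no date semantics
df_str = pd.DataFrame(
    {"open": [1.06830, 1.06700]},
    index=["2003.04.08 12:00", "2003.04.08 13:00"],
)
print(type(df_str.index).__name__)  # Index
```

So the question becomes why the parsed values ended up as strings in the first place.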


I also tried to convert the current index like this:


df.index = pa.to_datetime(df.index)

but the index is unchanged: it still holds unicode strings. I'm beginning to suspect the default parsing functions aren't doing their job, but they don't produce any error messages.
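One plausible cause, offered as an assumption rather than something confirmed in the thread: the info() output above ends with a "nan nan" entry (an all-empty last row), and a single unparseable value can stop the whole index from converting. Old pandas releases defaulted to errors='ignore' in to_datetime, returning the input unchanged, which would match the silent no-op described here; current releases raise instead. An explicit format plus errors='coerce' turns such entries into NaT so everything else converts:

```python
import pandas as pd

# Hypothetical stand-in for df.index from the question, including the
# trailing "nan nan" entry shown by df.info().
idx = pd.Index(["2003.04.08 12:00", "2003.04.08 13:00", "nan nan"])

# An explicit format avoids guessing; errors="coerce" maps anything that does
# not match it (here "nan nan") to NaT instead of aborting the conversion.
parsed = pd.to_datetime(idx, format="%Y.%m.%d %H:%M", errors="coerce")

print(type(parsed).__name__)  # DatetimeIndex
print(parsed.isna().sum())    # 1
```

Applied to the real frame, the same call can be assigned back to df.index, after which the empty row can be dropped with df[df.index.notna()].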


How can I get a working DatetimeIndex on the DataFrame in this situation?


Solution:


df = pa.read_csv(path,
                 names = ['date', 'time', 'open', 'high', 'low', 'close', 'vol'],
                 parse_dates={'datetime':['date','time']},
                 keep_date_col = True, 
                 index_col='datetime'
             )

Now apply a lambda function that does what the parser should have done:


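import datetime

# Join "YYYY.MM.DD" and "HH:MM" with ':' and parse each row with a matching format
df['datetime'] = df.apply(lambda row: datetime.datetime.strptime(row['date'] + ':' + row['time'], '%Y.%m.%d:%H:%M'), axis=1)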
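The format string in the lambda mirrors how the two fields are joined: date and time are concatenated with ':', so '%Y.%m.%d:%H:%M' must contain that same separator. A standalone check using only the standard library and values from the sample data (parse_row is a hypothetical helper name, not part of the original code):

```python
from datetime import datetime


def parse_row(date: str, time: str) -> datetime:
    """Hypothetical helper: same join and format as the lambda above."""
    return datetime.strptime(date + ":" + time, "%Y.%m.%d:%H:%M")


print(parse_row("2003.04.08", "12:00"))  # 2003-04-08 12:00:00

# strptime is strict: input that does not match the format raises ValueError.
try:
    parse_row("nan", "nan")
except ValueError as exc:
    print("unparseable:", exc)
```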

Answer by EdChum

Dateutil is unable to parse your data correctly, but you can parse it after loading by using strptime, like so:


import datetime
df['DateTime'] = df.apply(lambda row: datetime.datetime.strptime(row['date']+ ':' + row['time'], '%Y.%m.%d:%H:%M'), axis=1)

This gives you a 'DateTime' column of dtype datetime64[ns], which you can use as your index.
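A small sketch of that last step, assuming the date and time columns are still plain strings (as in the question's Solution, which keeps them with keep_date_col=True): build the column with the same strptime lambda, then promote it with set_index. The two-row frame is made up from the sample data.

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    {
        "date": ["2003.04.08", "2003.04.08"],
        "time": ["12:00", "13:00"],
        "open": [1.06830, 1.06700],
    }
)

# Row-wise parse as in the answer; pandas infers a datetime64 dtype
# from the resulting datetime.datetime objects.
df["DateTime"] = df.apply(
    lambda row: datetime.datetime.strptime(row["date"] + ":" + row["time"], "%Y.%m.%d:%H:%M"),
    axis=1,
)

# Promote the parsed column to the index and drop the raw string columns.
df = df.drop(columns=["date", "time"]).set_index("DateTime")
print(type(df.index).__name__)  # DatetimeIndex
```

Row-wise apply calls strptime once per row in Python, so on large files a vectorized pd.to_datetime over the joined string columns is usually the faster choice.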


EDIT


Hmm, interestingly, when I do this it works:


df = pd.read_csv(r'c:\data\temp.txt', parse_dates={'datetime':['date','time']}, index_col='datetime')

Could you check what happens when you drop the column names (the names argument) from the call to read_csv?
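One pandas behaviour this check would expose, offered as a possible explanation rather than a confirmed diagnosis: when names is passed without header, read_csv behaves as if header=None, so a header line in the file becomes an ordinary data row whose 'date' and 'time' values cannot be parsed as dates. Passing header=0 alongside names discards the file's own header instead. A sketch with the sample's header line and first row inlined:

```python
import io

import pandas as pd

# Header line plus one data row, copied from the sample in the question.
SAMPLE = """\
date, time, open, high, low, close, volume
2003.04.08,12:00,1.06830,1.06960,1.06670,1.06690,446
"""
NAMES = ["date", "time", "open", "high", "low", "close", "vol"]

# names= without header=: the header line is read as a data row.
with_names = pd.read_csv(io.StringIO(SAMPLE), names=NAMES)
print(len(with_names), repr(with_names.loc[0, "date"]))  # 2 'date'

# header=0 replaces the file's own header with names instead.
skip_header = pd.read_csv(io.StringIO(SAMPLE), names=NAMES, header=0)
print(len(skip_header))  # 1
```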
