Cannot index date in Pandas Data Frame from read_csv
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/21066464/
Asked by prre72
I came across a problem today that I am unable to solve. I read a CSV file using:
mydata = pd.read_csv(file_name, header=0, sep=",", index_col=[0], parse_dates=True)
The CSV looks like:
2009-12-10,5,6,7,8,9
2009-12-11,7,6,6,7,9
Instead of getting an indexed DataFrame, I get the following output:
print(mydata)
Empty DataFrame
Columns: []
Index: [2009-12-10,5,6,7,8,9 2009-12-11,7,6,6,7,9]
Please help! I have been trying for 2 hours now.
Many thanks
Answered by hernamesbarbara
I think your code works. Here's what I see:
The data:
import pandas as pd
data = """2009-12-10,5,6,7,8,9
2009-12-11,7,6,6,7,9"""
Read the data from the CSV:
from io import StringIO  # standard-library StringIO; recent pandas no longer exposes pd.io.parsers.StringIO
ts = pd.read_csv(StringIO(data),
                 names=['timepoint', 'a', 'b', 'c', 'd', 'e'],
                 parse_dates=True,
                 index_col=0)
That looks like this:
In [59]: ts
Out[59]:
a b c d e
timepoint
2009-12-10 5 6 7 8 9
2009-12-11 7 6 6 7 9
And the index is a time series:
In [60]: ts.index
Out[60]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2009-12-10 00:00:00, 2009-12-11 00:00:00]
Length: 2, Freq: None, Timezone: None
Can you give this a try and post an update if you get different results?
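For anyone repeating this check on a current pandas install, here is a minimal sketch of the same steps. It assumes the two-line sample is held in an in-memory string and uses the standard-library io.StringIO; the variable names is_datetime_index and row are only illustrative and do not appear in the original answer.

```python
from io import StringIO

import pandas as pd

# Same two-line sample as in the question, kept in memory instead of a file.
data = """2009-12-10,5,6,7,8,9
2009-12-11,7,6,6,7,9"""

# The file has no header row, so column names are supplied explicitly.
# index_col=0 makes 'timepoint' the index; parse_dates=True parses that index.
ts = pd.read_csv(StringIO(data),
                 names=["timepoint", "a", "b", "c", "d", "e"],
                 parse_dates=True,
                 index_col=0)

# Check the index type directly rather than relying on the printed repr.
is_datetime_index = isinstance(ts.index, pd.DatetimeIndex)

# With a DatetimeIndex in place, rows can be looked up by date.
row = ts.loc[pd.Timestamp("2009-12-11")]
```

If is_datetime_index comes out False on your real file, the file contents probably differ from the sample shown in the question (for example quoting or a different delimiter), which is worth inspecting before changing the read_csv arguments.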
UPDATE: In response to @prre72's comment regarding column headers in the CSV file:
If the CSV has 5 column headers and the index column is unlabeled, you can do this:
In [17]:
data = """"a","b","c","d","e"
2009-12-10,5,6,7,8,9
2009-12-11,7,6,6,7,9"""
from io import StringIO  # standard-library StringIO; recent pandas no longer exposes pd.io.parsers.StringIO
ts = pd.read_csv(StringIO(data),
                 parse_dates=True,
                 index_col=0)
In [18]: ts
Out[18]:
a b c d e
2009-12-10 5 6 7 8 9
2009-12-11 7 6 6 7 9
In [19]: ts.index
Out[19]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2009-12-10 00:00:00, 2009-12-11 00:00:00]
Length: 2, Freq: None, Timezone: None
Answered by Yeqing Zhang
You need to use parse_dates=[0] to specify the date columns you want to parse. You don't have to specify header=0. Use header=None instead, which does not require the file to have a header row. Try this:
mydata = pd.read_csv(file_name, header=None, sep=",", index_col=[0],
                     parse_dates=[0])
print(mydata)
1 2 3 4 5
0
2009-12-10 5 6 7 8 9
2009-12-11 7 6 6 7 9
If you want to specify column names, just use this:
mydata.columns = list("abcde") # list of column names
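Put together, the two steps from this answer might look like the sketch below. It assumes the same sample data read from an in-memory string rather than file_name, and it passes index_col as a plain 0 for simplicity; treat it as an illustration, not the answerer's exact code.

```python
from io import StringIO

import pandas as pd

# Header-less sample data, as in the question.
data = """2009-12-10,5,6,7,8,9
2009-12-11,7,6,6,7,9"""

# header=None: the first line is data, not column names.
# parse_dates=[0]: parse column 0 as dates; index_col=0: use it as the index.
mydata = pd.read_csv(StringIO(data), header=None, sep=",",
                     index_col=0, parse_dates=[0])

# The remaining columns are labeled 1..5 by default; rename them afterwards.
mydata.columns = list("abcde")
```

An alternative with the same effect would be to pass the names up front through the names= argument of read_csv, as the first answer does.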
Answered by Vaibhav Taneja
import pandas as pd
# read_csv has no import_dates argument; the date-parsing option is parse_dates
raw_dt = pd.read_csv("fileName.csv", parse_dates=True, index_col=0)
raw_dt
Now, when you execute this code, index_col=0 will treat the first column of your file as the index column, and parse_dates=True will try to parse that index as dates.
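To make the scope of parse_dates=True concrete, here is a small sketch. The CSV content and the column names day, shipped, and qty are hypothetical, invented only for this illustration. It contrasts the boolean form, which only tries to parse the index, with the list form, which names the columns to parse.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical file: a date index plus a second column that also holds dates.
csv_text = """day,shipped,qty
2009-12-10,2009-12-12,5
2009-12-11,2009-12-15,7"""

# parse_dates=True only attempts to parse the index column ('day').
by_index = pd.read_csv(StringIO(csv_text), index_col=0, parse_dates=True)

# A list selects specific columns, so 'shipped' is parsed as well.
both = pd.read_csv(StringIO(csv_text), index_col=0,
                   parse_dates=["day", "shipped"])

shipped_parsed_with_true = is_datetime64_any_dtype(by_index["shipped"])
shipped_parsed_with_list = is_datetime64_any_dtype(both["shipped"])
```

In this sketch the 'shipped' column stays as text under parse_dates=True and becomes a datetime column only when it is listed explicitly.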

