Original question: http://stackoverflow.com/questions/17492923/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Reading data from csv into pandas when date and time are in separate columns
Asked by seaotternerd
I looked at the answer to this question: Parse dates when YYYYMMDD and HH are in separate columns using pandas in Python, but it doesn't seem to work for me, which makes me think I'm doing something subtly wrong.
I've got data in .csv files, which I'm trying to read using the pandas read_csv function. Date and time are in two separate columns, but I want to merge them into one column, "Datetime", containing datetime objects. The csv looks like this:
Note about the data
blank line
Site Id,Date,Time,WTEQ.I-1...
2069, 2008-01-19, 06:00, -99.9...
2069, 2008-01-19, 07:00, -99.9...
...
I'm trying to read it using this line of code:
read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, date_parser=True, na_values=["-99.9"])
However, when I write it back out to a csv, it looks exactly the same (except that the -99.9s are changed to NA, like I specified with the na_values argument). Date and time are in two separate columns. As I understand it, this should be creating a new column called Datetime that is composed of columns 1 and 2, parsed using the date_parser. I have also tried using parse_dates={"Datetime" : ["Date","Time"]}, parse_dates=[[1,2]], and parse_dates=[["Date", "Time"]]. I have also tried using date_parser=parse, where parse is defined as:
parse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M')
None of these has made the least bit of difference, which makes me suspect that there's some deeper problem. Any insight into what it might be?
Accepted answer by Andy Hayden
You should update your pandas; I recommend the latest stable version for the latest features and bug fixes.
This specific feature was introduced in 0.8.0, and works on pandas version 0.11:
In [11]: read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, na_values=["-99.9"])
Out[11]:
              Datetime  Site Id  WTEQ.I-1
0  2008-01-19 06:00:00     2069       NaN
1  2008-01-19 07:00:00     2069       NaN
That is, call it without date_parser=True (date_parser should be a parsing function, not a boolean; see the docstring).
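For readers on newer pandas, where the dict form of parse_dates and the date_parser argument are deprecated (pandas 2.x), the same result can be reached by reading the two columns as strings and combining them afterwards with pd.to_datetime. This is a swapped-in technique, not the call from the answer above. The sketch below is a minimal illustration under assumptions: the in-memory sample stands in for the real file, the "..." elisions in the excerpt are dropped, and the second header line is kept as the literal placeholder text shown in the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for "2069_ALL_YEAR=2008.csv", mirroring the excerpt
# in the question (the real file has more columns and rows).
raw = (
    "Note about the data\n"
    "blank line\n"
    "Site Id,Date,Time,WTEQ.I-1\n"
    "2069, 2008-01-19, 06:00, -99.9\n"
    "2069, 2008-01-19, 07:00, -99.9\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=2,                        # skip the two note lines above the header
    skipinitialspace=True,             # drop the space that follows each comma
    na_values=["-99.9"],               # treat the sentinel value as missing
    dtype={"Date": str, "Time": str},  # keep both parts as plain strings
)

# Join the two string columns and parse them with an explicit format.
df["Datetime"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%Y-%m-%d %H:%M"
)
df = df.drop(columns=["Date", "Time"])

print(df)
```

Passing an explicit format keeps the parsing strict, so a malformed row raises an error instead of being silently misread.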
Note that in the example above, the resulting "Datetime" column is an ordinary column (a Series), not the index of the DataFrame. If you would rather have the datetime values as the index instead of the default integer index, pass the index_col argument specifying the desired column; here that is 0, since the resulting "Datetime" column is the first one.
In [11]: read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, index_col=0, na_values=["-99.9"])
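To show what a datetime index buys you, here is a small sketch that builds the combined column with pd.to_datetime and then promotes it to the index with set_index, which has the same effect as index_col=0 in the call above. The sample data and the variable names (raw, indexed, first_day) are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical sample mirroring the excerpt in the question.
raw = (
    "Note about the data\n"
    "blank line\n"
    "Site Id,Date,Time,WTEQ.I-1\n"
    "2069, 2008-01-19, 06:00, -99.9\n"
    "2069, 2008-01-19, 07:00, -99.9\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=2,
    skipinitialspace=True,
    na_values=["-99.9"],
    dtype={"Date": str, "Time": str},
)
df["Datetime"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%Y-%m-%d %H:%M"
)

# Use the combined column as the index instead of the default integers.
indexed = df.drop(columns=["Date", "Time"]).set_index("Datetime")

# A DatetimeIndex allows selecting rows by a date string.
first_day = indexed.loc["2008-01-19"]
print(first_day)
```

With the datetimes in the index, time-based operations such as date-string selection or resampling become available directly on the DataFrame.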

