Reading data from csv into Pandas when date and time are in separate columns

Question

Asked by seaotternerd

I looked at the answer to this question: Parse dates when YYYYMMDD and HH are in separate columns using pandas in Python, but it doesn't seem to work for me, which makes me think I'm doing something subtly wrong.

I've got data in .csv files, which I'm trying to read using the pandas read_csv function. Date and time are in two separate columns, but I want to merge them into one column, "Datetime", containing datetime objects. The csv looks like this:

    Note about the data
    blank line
    Site Id,Date,Time,WTEQ.I-1...
    2069, 2008-01-19, 06:00, -99.9...
    2069, 2008-01-19, 07:00, -99.9...
    ...

I'm trying to read it using this line of code:

    read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, date_parser=True, na_values=["-99.9"])

However, when I write it back out to a csv, it looks exactly the same (except that the -99.9s are changed to NA, like I specified with the na_values argument). Date and time are in two separate columns. As I understand it, this should be creating a new column called Datetime that is composed of columns 1 and 2, parsed using the date_parser. I have also tried using parse_dates={"Datetime" : ["Date","Time"]}, parse_dates=[[1,2]], and parse_dates=[["Date", "Time"]]. I have also tried using date_parser=parse, where parse is defined as:

    parse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M')

None of these has made the least bit of difference, which makes me suspect that there's some deeper problem. Any insight into what it might be?

Answer 1

Accepted answer by Andy Hayden

You should update your pandas. I recommend the latest stable version for the latest features and bug fixes.

This specific feature was introduced in 0.8.0, and works on pandas version 0.11:

    In [11]: read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, na_values=["-99.9"])
    Out[11]:
                 Datetime  Site Id  WTEQ.I-1
    0 2008-01-19 06:00:00     2069       NaN
    1 2008-01-19 07:00:00     2069       NaN

Note that this call omits date_parser=True, since date_parser should be a parsing function rather than a boolean (see the docstring).
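To make the "parsing function" point concrete, here is a minimal sketch using only the standard library. It assumes the date and time values get joined with a single space before the parser sees them, which I believe is what pandas does but have not verified for every version. The name `combined` is just for illustration, and the values are written without the leading spaces that appear after the commas in the sample file.

```python
from datetime import datetime

# The parser from the question, written as a named function.
def parse(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M")

# Assumption: the Date and Time values are joined with one space.
combined = "2008-01-19" + " " + "06:00"
parsed = parse(combined)
print(parsed)  # 2008-01-19 06:00:00

# A function is callable, the boolean True is not.
print(callable(parse), callable(True))  # True False
```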

Note that in the example above the resulting "Datetime" column is an ordinary column (a Series), not the index of the DataFrame. If you would rather have the datetime values as the index instead of the default integer index, pass the index_col argument specifying the desired column, in this case 0, since the resulting "Datetime" column is the first one.

    In [11]: read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, index_col=0, na_values=["-99.9"])
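For readers on a recent pandas: as far as I know, combining columns through a nested parse_dates is deprecated in newer releases, so an alternative is to read the columns as plain strings and combine them afterwards with pd.to_datetime. The sketch below is not the approach from the answer above. It uses an in-memory stand-in for the file (`sample` is hypothetical, shortened to one data column) and assumes the second header line is non-empty text, as shown in the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for "2069_ALL_YEAR=2008.csv", same layout as the question's sample.
sample = io.StringIO(
    "Note about the data\n"
    "blank line\n"
    "Site Id,Date,Time,WTEQ.I-1\n"
    "2069, 2008-01-19, 06:00, -99.9\n"
    "2069, 2008-01-19, 07:00, -99.9\n"
)

# skipinitialspace drops the space after each comma so the values parse cleanly.
df = pd.read_csv(sample, skiprows=2, skipinitialspace=True, na_values=["-99.9"])

# Join the two string columns and parse them with an explicit format.
df["Datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"], format="%Y-%m-%d %H:%M")

# Drop the original columns and use the combined column as the index.
df = df.drop(columns=["Date", "Time"]).set_index("Datetime")
print(df)
```

With a real file, pass the file name to read_csv instead of `sample`. Leave out the set_index call if you want "Datetime" to stay a regular column.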
