pandas: Parsing date strings in pandas

Question

Asked by Zilore Mumba

My sincere apologies for asking a question that has already been comprehensively resolved more than once. My problem is that I have tried to copy-paste, as is, examples from SO posts, but none work for me. I have data in the form "dd mm YYYY HH mm ss value value". I want to read this as a pandas DataFrame with the first column as a datestring. Here are the examples I tried from SO posts. First, the sample data:

01 05 2013 00 00 00     26.4    16.5
02 05 2013 00 00 00     25.9    17.7
03 05 2013 00 00 00     26.6    17.4
......

The zeroes for HH mm ss were added just to conform exactly to the examples.

Secondly, the packages imported:

from datetime import datetime
import pandas as pd
import os
from cStringIO import StringIO

fname = os.path.expanduser('./temps0.txt')

Now the three examples I tried.

a) From SO post:
Parse dates when year month day and hour are in separate columns using pandas in python

def dt_parse(date_string):
    dt = pd.datetime.strptime(date_string, '%d %m %Y')
    return dt

df = pd.read_csv(fname, header=None, index_col='datetime',
                 parse_dates={'datetime': [0,1,2,4,5,6]},
                 date_parser=lambda x: pd.datetime.strptime(x, '%d %m %Y'))
date_parser=dt_pasre)

and here is part of the error

File "/home/zmumba/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py",
 line 1763, in _try_convert_dates
    colnames.append(str(columns[c]))
IndexError: list index out of range

b) Another try, from SO post:

https://stackoverflow.com/questions/11615504/parse-dates-when-yyyymmdd-and-hh-are-in-separate-columns-using-pandas-in-python

parse = lambda x: datetime.strptime(x, '%d%m%Y')
pd.read_table("./temps0.txt",  parse_dates = [['DD MM YYYY HH mm ss']],
                               index_col = 0,
                               date_parser=parse)

and here is part of the error

File "/home/zmumba/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py",
 line 1009, in _set
    self._reader.set_noconvert(names.index(x))
ValueError: 'DD MM YYYY HH mm ss' is not in list

c) Another try, from SO post:

https://stackoverflow.com/questions/17301589/parsing-dd-mm-yy-hh-mm-ss-columns-from-txt-file-using-pythons-pandas?lq=1
def date_parser(ss):
    day, month, year, hour, min, sec = ss.split()
    return pd.Timestamp('20%s-%s-%s %s:%s:%s' % (year, month, day, hour, min, sec))
df = pd.read_csv('temps0.txt', header=None, sep='\s+\s', parse_dates=[[0]], date_parser=date_parser)

and here is part of the error

File "<stdin>", line 2, in date_parser
ValueError: too many values to unpack

Please excuse my ignorance, as I am trying to learn mostly through SO posts (the official documentation is completely lacking in examples for clarity).

Answer 1

Answered by Andy Hayden

In Chang's answer he used a parser, which is the bit you need to tweak to match the format of your date strings:

from datetime import datetime
parse = lambda x: datetime.strptime(x, '%d %m %Y %H %M %S')
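
To see why the format string has to match the data exactly, here is a small sketch using only the standard library. The helper name try_parse and the variable names are made up for illustration; the sample string is the first date from the question's data.

```python
from datetime import datetime

sample = "01 05 2013 00 00 00"

# The format matches the string token for token, including the spaces.
parsed = datetime.strptime(sample, "%d %m %Y %H %M %S")
print(parsed)  # 2013-05-01 00:00:00


def try_parse(text, fmt):
    """Return the parsed datetime, or None if the format does not fit."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


# Both formats tried in the question fail on this string:
no_spaces = try_parse(sample, "%d%m%Y")    # format has no spaces, data does
date_only = try_parse(sample, "%d %m %Y")  # trailing "00 00 00" is left over
print(no_spaces, date_only)
```

Both failing formats raise ValueError inside strptime, so the helper returns None for them.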

You also need to tweak the column names, which in that example looked similar to the format string, which was a bit confusing. Here we have no column names, so we can use numbers for the position of the column, and tell read_csv that by using header=None:

from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO
csv = '''01 05 2013 00 00 00,26.4,16.5
02 05 2013 00 00 00,25.9,17.7
03 05 2013 00 00 00,26.6,17.4'''

Note: we use header=None, since there are no column names.

df = pd.read_csv(StringIO(csv), parse_dates=[0],
                                index_col=0,
                                date_parser=parse,
                                header=None)
In [11]: df
Out[11]:
               1     2
0                     
2013-05-01  26.4  16.5
2013-05-02  25.9  17.7
2013-05-03  26.6  17.4
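
For readers on Python 3 and a recent pandas: as far as I know, the date_parser argument is deprecated from pandas 2.0 onward. A version-independent alternative is to read the first column as plain text and convert it afterwards with pd.to_datetime. This is a minimal sketch of that approach on the same comma-separated sample, not the code from the answer above.

```python
from io import StringIO

import pandas as pd

csv = """01 05 2013 00 00 00,26.4,16.5
02 05 2013 00 00 00,25.9,17.7
03 05 2013 00 00 00,26.6,17.4"""

# Read without parsing dates; column 0 comes in as strings.
df = pd.read_csv(StringIO(csv), header=None)

# Convert with an explicit format, then use the result as the index.
df[0] = pd.to_datetime(df[0], format="%d %m %Y %H %M %S")
df = df.set_index(0)

print(df)
```

Giving an explicit format avoids any day/month guessing, which matters for dates such as 01 05 2013.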

If your data is separated by multiple spaces (two or more) rather than commas, then use the sep argument:

csv = '''01 05 2013 00 00 00     26.4    16.5
02 05 2013 00 00 00     25.9    17.7
03 05 2013 00 00 00     26.6    17.4'''

In [21]: pd.read_csv(StringIO(csv), parse_dates=[0],
                     index_col=0,
                     date_parser=parse,
                     header=None, sep=r'\s\s+')
Out[21]: 
               1     2
0                     
2013-05-01  26.4  16.5
2013-05-02  25.9  17.7
2013-05-03  26.6  17.4
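
Another way to handle the space-separated file, sketched here under some assumptions, is to split on any whitespace so that each date part lands in its own column, and then assemble the timestamp with pd.to_datetime, which accepts a DataFrame with columns named year, month, day, hour, minute and second. The value column names tmax and tmin are hypothetical; the question does not say what the two values are.

```python
from io import StringIO

import pandas as pd

raw = """01 05 2013 00 00 00     26.4    16.5
02 05 2013 00 00 00     25.9    17.7
03 05 2013 00 00 00     26.6    17.4"""

# Hypothetical names: the last two are placeholders for the value columns.
names = ["day", "month", "year", "hour", "minute", "second", "tmax", "tmin"]
df = pd.read_csv(StringIO(raw), sep=r"\s+", header=None, names=names)

# Build the timestamps from the six date-part columns, then drop them.
parts = ["year", "month", "day", "hour", "minute", "second"]
df.index = pd.to_datetime(df[parts])
df = df.drop(columns=parts)

print(df)
```

This avoids a custom parser entirely and does not depend on the width of the gap between the date and the values.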

Note: the io docs go into a lot of detail, with many examples.