Notice: this page is based on a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/21414885/

Date: 2020-09-13 21:38:14  Source: igfitidea

parsing datestring in pandas

python pandas

Asked by Zilore Mumba

My sincere apologies for asking a question that has already been comprehensively resolved more than once. My problem is that I have tried to copy-paste, as is, examples from SO posts, but none work for me. I have data in the form "dd mm YYYY HH mm ss value value". I want to read this into a pandas DataFrame with the first column as a date string. Here are the examples I tried from SO posts:

  1. First, the sample data

01 05 2013 00 00 00     26.4    16.5
02 05 2013 00 00 00     25.9    17.7
03 05 2013 00 00 00     26.6    17.4
......

The zeroes for HH mm ss were added just to conform exactly to the examples.

  2. Second, the packages imported

    from datetime import datetime
    import pandas as pd
    import os
    from cStringIO import StringIO

    fname = os.path.expanduser('./temps0.txt')

  3. Now the three examples I tried

    a) From SO post:

    Parse dates when year month day and hour are in separate columns using pandas in python

    def dt_parse(date_string):
        dt = pd.datetime.strptime(date_string, '%d %m %Y')
        return dt

    df = pd.read_csv(fname, header=None, index_col='datetime',
                     parse_dates={'datetime': [0,1,2,4,5,6]},
                     date_parser=lambda x: pd.datetime.strptime(x, '%d %m %Y'))
    date_parser=dt_pasre)

and here is part of the error

File "/home/zmumba/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py",
 line 1763, in _try_convert_dates
    colnames.append(str(columns[c]))
IndexError: list index out of range

b) Another try, from SO post:

https://stackoverflow.com/questions/11615504/parse-dates-when-yyyymmdd-and-hh-are-in-separate-columns-using-pandas-in-python

parse = lambda x: datetime.strptime(x, '%d%m%Y')
pd.read_table("./temps0.txt",  parse_dates = [['DD MM YYYY HH mm ss']],
                               index_col = 0,
                               date_parser=parse)

and here is part of the error

File "/home/zmumba/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py",
 line 1009, in _set
    self._reader.set_noconvert(names.index(x))
ValueError: 'DD MM YYYY HH mm ss' is not in list

c) Another try, from SO post:

https://stackoverflow.com/questions/17301589/parsing-dd-mm-yy-hh-mm-ss-columns-from-txt-file-using-pythons-pandas?lq=1
def date_parser(ss):
    day, month, year, hour, min, sec = ss.split()
    return pd.Timestamp('20%s-%s-%s %s:%s:%s' % (year, month, day, hour, min, sec))
df = pd.read_csv('temps0.txt', header=None, sep='\s+\s', parse_dates=[[0]], date_parser=date_parser)

and here is part of the error

File "<stdin>", line 2, in date_parser
ValueError: too many values to unpack

Please excuse my ignorance, as I am trying to learn mostly through SO posts (the official documentation is completely lacking in examples for clarity).

Answered by Andy Hayden

In Chang's answer he used a parser, which is the bit you need to tweak to match the format of your date strings:

from datetime import datetime
parse = lambda x: datetime.strptime(x, '%d %m %Y %H %M %S')
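
To check the format string on its own before handing it to read_csv, something like the following minimal sketch may help. The helper name parse_stamp and the constant DATE_FORMAT are made up for illustration and are not part of pandas or of the original answer:

```python
from datetime import datetime

# Format matching "dd mm YYYY HH MM SS", e.g. "01 05 2013 00 00 00".
DATE_FORMAT = '%d %m %Y %H %M %S'


def parse_stamp(text):
    """Parse one date string (hypothetical helper, not part of pandas)."""
    return datetime.strptime(text, DATE_FORMAT)


first = parse_stamp('01 05 2013 00 00 00')
print(first.isoformat())  # 2013-05-01T00:00:00

# A string that does not match the full format raises ValueError.
try:
    parse_stamp('01 05 2013')
except ValueError as exc:
    print('rejected:', exc)
```

If the format and the data disagree, strptime fails loudly, which is usually easier to debug here than an error raised from inside read_csv.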

You also need to tweak the column names, which in that example were similar to the format, which was a bit confusing. Here we have no column names, so we can use numbers for the column positions and tell read_csv that there is no header row by passing header=None:

import pandas as pd
from io import StringIO  # Python 3; on Python 2 this was `from StringIO import StringIO`
csv = '''01 05 2013 00 00 00,26.4,16.5
02 05 2013 00 00 00,25.9,17.7
03 05 2013 00 00 00,26.6,17.4'''

Note: we use header=None, since there are no column names.

df = pd.read_csv(StringIO(csv), parse_dates=[0],
                                index_col=0,
                                date_parser=parse,
                                header=None)
In [11]: df
Out[11]:
               1     2
0                     
2013-05-01  26.4  16.5
2013-05-02  25.9  17.7
2013-05-03  26.6  17.4
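
Recent pandas releases deprecate the date_parser argument, so an alternative worth knowing is to read the first column as plain text and convert it afterwards with pd.to_datetime. The sketch below shows that alternative, not the approach used above, and assumes the same comma-separated sample data:

```python
from io import StringIO

import pandas as pd

csv = '''01 05 2013 00 00 00,26.4,16.5
02 05 2013 00 00 00,25.9,17.7
03 05 2013 00 00 00,26.6,17.4'''

# Read everything as plain columns first; column 0 stays a string.
df = pd.read_csv(StringIO(csv), header=None)

# Convert column 0 with an explicit format, then use it as the index.
df[0] = pd.to_datetime(df[0], format='%d %m %Y %H %M %S')
df = df.set_index(0)

print(df)
```

Converting after the read keeps the parsing step separate from file loading, so a format mismatch shows up in the to_datetime call rather than deep inside the CSV parser.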

If your data is separated by multiple spaces (two or more) rather than commas, then use the sep argument with a regular expression, so that the single spaces inside the date are not treated as separators:

csv = '''01 05 2013 00 00 00     26.4    16.5
02 05 2013 00 00 00     25.9    17.7
03 05 2013 00 00 00     26.6    17.4'''

In [21]: pd.read_csv(StringIO(csv), parse_dates = [0],
                           index_col = 0,
                           date_parser=parse,
                           header=None, sep=r'\s\s+', engine='python')
Out[21]: 
               1     2
0                     
2013-05-01  26.4  16.5
2013-05-02  25.9  17.7
2013-05-03  26.6  17.4
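
To see why the two-or-more-whitespace pattern keeps the date in one piece, the same regular expression can be tried on a single line with the standard re module. This is only an illustration of the pattern, not of how pandas splits lines internally:

```python
import re

line = '01 05 2013 00 00 00     26.4    16.5'

# Two or more whitespace characters act as the separator, so the single
# spaces inside the date do not split it.
fields = re.split(r'\s\s+', line)
print(fields)  # ['01 05 2013 00 00 00', '26.4', '16.5']

# Splitting on any run of whitespace breaks the date into six pieces.
print(len(re.split(r'\s+', line)))  # 8
```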

Note: the io docs go into a lot of detail, with many examples.