Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/36871188/

How to access pandas DataFrame datetime index using strings

python pandas

Asked by Pedro Braz

This is a very simple and practical question. I have the feeling that it must be a silly detail and that there should be similar questions. I wasn't able to find them tho. If someone does I'll happily delete this one.

The closest I found were these: pandas: iterating over DataFrame index with loc

How to select rows within a pandas dataframe based on time only when index is date and time

Anyway, the thing is, I have a datetime-indexed pandas DataFrame as follows:

In[81]: y
Out[81]: 
            PETR4  CSNA3  VALE5
2008-01-01    0.0    0.0    0.0
2008-01-02    1.0    1.0    1.0
2008-01-03    7.0    7.0    7.0

In[82]: y.index
Out[82]: DatetimeIndex(['2008-01-01', '2008-01-02', '2008-01-03'], dtype='datetime64[ns]', freq=None)

Oddly enough, I can't access its values using any of the following methods:

In[83]: y[datetime.datetime(2008,1,1)]
In[84]: y['2008-1-1']
In[85]: y['1/1/2008']

I get a KeyError.

Even weirder is that the following methods DO work:

In[86]: y['2008']
Out[86]: 
            PETR4  CSNA3  VALE5
2008-01-01    0.0    0.0    0.0
2008-01-02    1.0    1.0    1.0
2008-01-03    7.0    7.0    7.0
In[87]: y['2008-1']
Out[87]: 
            PETR4  CSNA3  VALE5
2008-01-01    0.0    0.0    0.0
2008-01-02    1.0    1.0    1.0
2008-01-03    7.0    7.0    7.0

I'm fairly new to pandas so maybe I'm missing something here?

Answered by piRSquared

pandas takes whatever is inside the [] and decides what it should do with it. If it's a subset of column names, it'll return a DataFrame with those columns. If it's a range of index values, it'll return a subset of those rows. What it does not handle is a single index value.

Solution

Two workarounds:

1. Turn the argument into something pandas interprets as a range.

df['2008-01-01':'2008-01-01']

2. Use the accessor designed to give you this result: loc[]

df.loc['2008-01-01']

Link to the documentation
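
Both workarounds can be checked on a small frame. The sketch below is a minimal reconstruction, assuming a frame shaped like the one in the question; the name `df` and the way it is built here are illustrative and not taken from the original post.

```python
import pandas as pd

# Rebuild a frame like the one in the question (this construction is an assumption).
idx = pd.to_datetime(["2008-01-01", "2008-01-02", "2008-01-03"])
df = pd.DataFrame(
    {"PETR4": [0.0, 1.0, 7.0], "CSNA3": [0.0, 1.0, 7.0], "VALE5": [0.0, 1.0, 7.0]},
    index=idx,
)

# Workaround 1: a slice inside [] is read as a row range, giving a one-row DataFrame.
one_row_frame = df["2008-01-01":"2008-01-01"]

# Workaround 2: .loc looks the label up in the index and returns that row as a Series.
one_row_series = df.loc["2008-01-01"]

# A bare day-level label inside [] is looked up among the columns, hence the KeyError.
try:
    df["2008-01-01"]
    raised = False
except KeyError:
    raised = True
```

The practical difference is the return type: the slice keeps the DataFrame shape, while `.loc` with a single label collapses the row into a Series indexed by column name.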

Answered by Scratch'N'Purr

You can use the to_pydatetime method on your index, like so:

y[y.index.to_pydatetime() == datetime.datetime(2008,1,1)]
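
A boolean mask like this can also be built without converting to Python datetime objects, since a DatetimeIndex can be compared against a pd.Timestamp directly. The following is a sketch on a reconstructed frame; the frame construction and the names `mask_py`, `mask_ts`, `row_py` and `row_ts` are assumptions for illustration.

```python
import datetime

import pandas as pd

# Rebuild a frame like the one in the question (this construction is an assumption).
y = pd.DataFrame(
    {"PETR4": [0.0, 1.0, 7.0], "CSNA3": [0.0, 1.0, 7.0], "VALE5": [0.0, 1.0, 7.0]},
    index=pd.to_datetime(["2008-01-01", "2008-01-02", "2008-01-03"]),
)

# Mask from the answer: compare an array of datetime.datetime objects element-wise.
mask_py = y.index.to_pydatetime() == datetime.datetime(2008, 1, 1)

# Alternative mask: compare the index itself with a Timestamp.
mask_ts = y.index == pd.Timestamp("2008-01-01")

# Boolean indexing keeps only the rows where the mask is True.
row_py = y[mask_py]
row_ts = y[mask_ts]
```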

Answered by bob_monsen

Reversing your dataframe allows the indexing to work:

Here is your .csv datafile:

Date,PETR4,CSNA3,VALE5
2008-01-01,0.0,0.0,0.0
2008-01-02,1.0,1.0,1.0
2008-01-03,7.0,7.0,7.0

Use the following incantation to read it into a DataFrame:

>>> a = pd.read_csv('your.csv', index_col=0, parse_dates=True, infer_datetime_format=True)
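
To try this without creating a file, the same data can be read from an in-memory string. This is only a sketch: `csv_text` and `io.StringIO` stand in for 'your.csv', and `infer_datetime_format` is left out because, as far as I know, recent pandas versions deprecate it.

```python
import io

import pandas as pd

# Same content as the .csv file above, held in a string (an assumption for the demo).
csv_text = """Date,PETR4,CSNA3,VALE5
2008-01-01,0.0,0.0,0.0
2008-01-02,1.0,1.0,1.0
2008-01-03,7.0,7.0,7.0
"""

# index_col=0 makes the Date column the index; parse_dates=True parses that index as dates.
a = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
```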

Then, try to index a row:

>>> a['2008-01-01']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1969, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1976, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1091, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 3211, in get
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 1759, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)
  File "pandas/index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)
  File "pandas/hashtable.pyx", line 668, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)
  File "pandas/hashtable.pyx", line 676, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)
KeyError: '2008-01-01'

You end up with a KeyError traceback.

However, if you reverse it, like this:

>>> b = a[::-1]

Then try the same index, and you get the proper result:

>>> b['2008-01-01']
            PETR4  CSNA3  VALE5
Date                           
2008-01-01      0      0      0

I do NOT know why this is the case. Chances are it has something to do with being a time series one way but not the other? Someone more knowledgeable should answer that.

Update: By RTFM, I discovered this page:

https://pandas.pydata.org/pandas-docs/stable/timeseries.html

If you find the section titled "Slice vs. Exact Match", there is a warning that explains this behavior. The problem seems to be that for a TimeSeries, an exact match is interpreted as a column name. For unsorted dataframes, this doesn't happen. See the warning box in the section referenced above. I still find this terribly confusing, but there you go...
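
The slice versus exact match distinction can be seen with `.loc`, which gives the same answer on sorted and reversed frames. This is a minimal sketch on a reconstructed frame: the names `a` and `b` follow the answer, but the construction is illustrative. Also note that, as far as I know, pandas 2.0 and later no longer select rows from a bare date string inside `[]` at all, so the `b['2008-01-01']` result shown earlier depends on the pandas version.

```python
import pandas as pd

# Rebuild a frame like the one read from the .csv (this construction is an assumption).
a = pd.DataFrame(
    {"PETR4": [0.0, 1.0, 7.0], "CSNA3": [0.0, 1.0, 7.0], "VALE5": [0.0, 1.0, 7.0]},
    index=pd.DatetimeIndex(["2008-01-01", "2008-01-02", "2008-01-03"], name="Date"),
)
b = a[::-1]  # same rows, index now in descending order

# A string coarser than the index resolution (month vs. day) acts as a slice.
january = a.loc["2008-01"]

# A string at the index resolution (day) is an exact match and returns one row.
exact_sorted = a.loc["2008-01-01"]
exact_reversed = b.loc["2008-01-01"]
```

Here `january` is a DataFrame with every row of the month, while both exact lookups return the single matching row as a Series, regardless of the index order.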

Edit: Changed the printout of b, which was wrong in the original.

Edit1: Updated with the explanation from the pandas documentation.
