Disclaimer: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must also follow the CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/23554045/

Time: 2020-09-13 22:01:45  Source: igfitidea

Why does vector lookup in pandas DataFrame not work but it does work with a Series/lookup on date

Tags: python, python-2.7, pandas

Asked by user3047520

For:

import numpy as np
import pandas as pd
from datetime import datetime

x = pd.DataFrame(np.random.randn(6), index=pd.date_range('2015-01-15', '2015-01-20'))

In [37]: x[datetime(2015,1,15)]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-37-0ce45ca5a858> in <module>()
----> 1 x[datetime(2015,1,15)]

/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   1656             return self._getitem_multilevel(key)
   1657         else:
-> 1658             return self._getitem_column(key)
   1659 
   1660     def _getitem_column(self, key):

/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
   1663         # get column
   1664         if self.columns.is_unique:
-> 1665             return self._get_item_cache(key)
   1666 
   1667         # duplicate columns & possible reduce dimensionaility

/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
   1003         res = cache.get(item)
   1004         if res is None:
-> 1005             values = self._data.get(item)
   1006             res = self._box_item_values(item, values)
   1007             cache[item] = res

/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item)
   2871                 return self.get_for_nan_indexer(indexer)
   2872 
-> 2873             _, block = self._find_block(item)
   2874             return block.get(item)
   2875         else:

/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.pyc in _find_block(self, item)
   3183 
   3184     def _find_block(self, item):
-> 3185         self._check_have(item)
   3186         for i, block in enumerate(self.blocks):
   3187             if item in block:

/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.pyc in _check_have(self, item)
   3190     def _check_have(self, item):
   3191         if item not in self.items:
-> 3192             raise KeyError('no item named %s' % com.pprint_thing(item))
   3193 
   3194     def reindex_axis(self, new_axis, indexer=None, method=None, axis=0,

KeyError: u'no item named 2015-01-15 00:00:00'

BUT,

In [39]: x = pd.Series(np.random.randn(6),index=pd.date_range('2015-01-15','2015-01-20'))

The lookup works correctly:

In [40]: x[datetime(2015,1,15)]

Out[40]: -2.0727569075280319

Could someone please explain why the lookup works on a Series but not on a DataFrame?

Here is x:

In [41]: x
Out[41]: 
2015-01-15   -2.072757
2015-01-16   -0.682232
2015-01-17    1.681293
2015-01-18    2.151027
2015-01-19    0.493222
2015-01-20    0.538554
Freq: D, dtype: float64

Answer by Jeff

The short answer is that you are selecting from different axes. See the pandas indexing docs.
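
A minimal sketch of the two axes involved. It assumes deterministic data (np.arange in place of np.random.randn) so the checks are repeatable. The datetime key is a row label in both objects, but a DataFrame built from a 1-D array gets a single integer column label, 0, and that column axis is what df[...] searches for a single label.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Same shape as in the question; np.arange replaces random data (assumption).
idx = pd.date_range('2015-01-15', '2015-01-20')
df = pd.DataFrame(np.arange(6.0), index=idx)
s = pd.Series(np.arange(6.0), index=idx)
key = datetime(2015, 1, 15)

# The key is a row label in both objects...
print(key in df.index)    # True
print(key in s.index)     # True

# ...but the DataFrame's only column label is the integer 0.
print(list(df.columns))   # [0]
print(key in df.columns)  # False
```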

In [1]: df = pd.DataFrame(np.random.randn(6),index=pd.date_range('2015-01-15','2015-01-20'))

In [2]: s = pd.Series(np.random.randn(6),index=pd.date_range('2015-01-15','2015-01-20'))

In [3]: key = datetime.datetime(2015,1,15)

This selects from the index axis

In [4]: df.loc[key]
Out[4]: 
0    0.562973
Name: 2015-01-15 00:00:00, dtype: float64

So does this

In [5]: s.loc[key]
Out[5]: 1.1151835852265839

As does this (because a Series has only one axis!)

In [6]: s[key]
Out[6]: 1.1151835852265839
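
A short sketch of what .loc returns in each case, again assuming deterministic np.arange data instead of random values. On a DataFrame, .loc with a single row label returns the whole row as a Series indexed by the column labels; adding the column label gives a scalar, which is what the Series lookup returns directly.

```python
from datetime import datetime

import numpy as np
import pandas as pd

idx = pd.date_range('2015-01-15', '2015-01-20')
df = pd.DataFrame(np.arange(6.0), index=idx)
s = pd.Series(np.arange(6.0), index=idx)
key = datetime(2015, 1, 15)

# .loc indexes rows first, so on a DataFrame it returns the whole row
# as a Series whose index holds the column labels.
row = df.loc[key]
print(type(row).__name__, list(row.index))  # Series [0]

# A row label plus a column label gives the scalar directly.
print(df.loc[key, 0])  # 0.0

# A Series has one axis, so [] and .loc both look up the row label.
print(s[key], s.loc[key])  # 0.0 0.0
```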

Here are the columns of the DataFrame

In [8]: df.columns
Out[8]: Int64Index([0], dtype='int64')

__getitem__ ([]) on a DataFrame selects on the columns by default!

In [9]: df[0]
Out[9]: 
2015-01-15    0.562973
2015-01-16   -1.112382
2015-01-17    0.279265
2015-01-18   -0.919848
2015-01-19   -1.156900
2015-01-20   -0.887971
Freq: D, Name: 0, dtype: float64
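
A sketch reproducing the question's failure on the same toy data (np.arange is an assumption in place of random values): a single label inside [] is looked up among the columns, so the datetime raises KeyError while the column label 0 succeeds. The flag name lookup_failed is purely illustrative.

```python
from datetime import datetime

import numpy as np
import pandas as pd

idx = pd.date_range('2015-01-15', '2015-01-20')
df = pd.DataFrame(np.arange(6.0), index=idx)
key = datetime(2015, 1, 15)

# df[label] searches the column labels, not the row index.
try:
    df[key]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)  # True

# The only column label is 0, so this returns the whole column as a Series.
col = df[0]
print(col.name, len(col))  # 0 6
```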

Not to confuse matters, but when you select a partial slice, the DataFrame does allow this convenience (the start of the slice could also be datetime(2015,1,15), as in df[datetime(2015,1,15):]); it HAS to be a slice, though. The idea is that this is a common operation on time series, so it works (IMHO this is a bit confusing, but it has been established since pandas started).
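
A minimal sketch of that slice exception, using the same assumed np.arange data: a slice inside [] on a DataFrame is applied to the rows, so a datetime slice works where a single datetime does not. Label-based slices include both endpoints.

```python
from datetime import datetime

import numpy as np
import pandas as pd

idx = pd.date_range('2015-01-15', '2015-01-20')
df = pd.DataFrame(np.arange(6.0), index=idx)

# A slice inside [] is treated as a row slice on the index.
tail = df[datetime(2015, 1, 17):]
print(len(tail))             # 4
print(tail.index[0].date())  # 2015-01-17

# Both endpoints of a label slice are included.
middle = df[datetime(2015, 1, 16):datetime(2015, 1, 18)]
print(len(middle))           # 3
```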

See partial string indexing

In [13]: df['20150115':]
Out[13]: 
                   0
2015-01-15  0.562973
2015-01-16 -1.112382
2015-01-17  0.279265
2015-01-18 -0.919848
2015-01-19 -1.156900
2015-01-20 -0.887971

[6 rows x 1 columns]

It works the same way on a Series:

In [15]: s['20150115':]
Out[15]: 
2015-01-15    1.115184
2015-01-16    0.604819
2015-01-17   -0.112881
2015-01-18   -1.234023
2015-01-19    1.264301
2015-01-20   -0.873921
Freq: D, dtype: float64
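
A sketch of partial string indexing written with explicit .loc, which selects on the rows of both a Series and a DataFrame, again on assumed np.arange data. As far as I know, recent pandas versions deprecated selecting DataFrame rows with a single date string inside [] (as opposed to a slice) in favour of .loc, so .loc is the safer spelling.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2015-01-15', '2015-01-20')
df = pd.DataFrame(np.arange(6.0), index=idx)
s = pd.Series(np.arange(6.0), index=idx)

# Date-string label slices include both endpoints.
window = df.loc['2015-01-16':'2015-01-18']
print(len(window))  # 3

# A coarser string (a whole month here) selects every row in that period.
print(len(s.loc['2015-01']))   # 6
print(len(df.loc['2015-01']))  # 6
```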