Slicing a pandas DataFrame with a negative index using the ix() method
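Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, distributed under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, credit the original URL and author information, and attribute it to the original authors (not me): StackOverflow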
Original question: http://stackoverflow.com/questions/14035817/
Question by Julia He
DataFrame.ix does not seem to return the slice I want when I use a negative index.
I have a DataFrame object and want to slice the last 2 rows.
In [90]: df = pd.DataFrame(np.random.randn(10, 4))
In [91]: df
Out[91]:
          0         1         2         3
0  1.985922  0.664665 -2.800102  1.695480
1  0.580509  0.782473  1.032970  1.559917
2  0.584387  1.798743  0.095950  0.071999
3  1.956221  0.075530 -0.391008  1.692585
4 -0.644979 -1.959265  0.749394 -0.437995
5 -1.204964  0.653912 -1.426602  2.409855
6  1.178886  2.177259 -0.165106  1.145952
7  1.410595 -0.761426 -1.280866  0.609122
8  0.110534 -0.234781 -0.819976  0.252080
9  1.798894  0.553394 -1.358335  1.278704
One way to do it:
In [92]: df[-2:]
Out[92]:
          0         1         2         3
8  0.110534 -0.234781 -0.819976  0.252080
9  1.798894  0.553394 -1.358335  1.278704
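This first approach uses plain [] slicing, which treats an integer slice as row positions. A minimal sketch, assuming a recent pandas (1.0 or later, where .ix has been removed), comparing it with the explicit positional indexer .iloc and with DataFrame.tail:

```python
import numpy as np
import pandas as pd

# Same shape as the question's frame: 10 rows, 4 columns, default RangeIndex 0..9.
df = pd.DataFrame(np.random.randn(10, 4))

# Plain [] with an integer slice selects rows by position, so -2: means "last two rows".
last_two_getitem = df[-2:]

# The explicit positional indexer and the tail() helper select the same rows.
last_two_iloc = df.iloc[-2:]
last_two_tail = df.tail(2)

print(last_two_iloc.index.tolist())  # [8, 9]
```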
Another way to do it:
In [93]: df.ix[len(df)-2:, :]
Out[93]:
          0         1         2         3
8  0.110534 -0.234781 -0.819976  0.252080
9  1.798894  0.553394 -1.358335  1.278704
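The len(df)-2 version works here because the default row labels 0..9 happen to equal the row positions. The sketch below assumes a recent pandas (1.0 or later), where .ix is replaced by .loc for labels and .iloc for positions, and uses a hypothetical index 100..109 to show where the two readings diverge:

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's frame: row labels 100..109
# instead of the default 0..9, so labels no longer coincide with positions.
df = pd.DataFrame(np.random.randn(10, 4), index=range(100, 110))

# Positional: len(df) - 2 == 8 is a position, so this is always the last two rows.
by_position = df.iloc[len(df) - 2:]

# Label-based: 8 is treated as a label; every label (100..109) is >= 8,
# so the slice starts at the first row and returns the whole frame.
by_label = df.loc[len(df) - 2:]

print(by_position.index.tolist())  # [108, 109]
print(len(by_label))               # 10
```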
Now I want to use negative indexing, but I run into a problem:
In [94]: df.ix[-2:, :]
Out[94]:
          0         1         2         3
0  1.985922  0.664665 -2.800102  1.695480
1  0.580509  0.782473  1.032970  1.559917
2  0.584387  1.798743  0.095950  0.071999
3  1.956221  0.075530 -0.391008  1.692585
4 -0.644979 -1.959265  0.749394 -0.437995
5 -1.204964  0.653912 -1.426602  2.409855
6  1.178886  2.177259 -0.165106  1.145952
7  1.410595 -0.761426 -1.280866  0.609122
8  0.110534 -0.234781 -0.819976  0.252080
9  1.798894  0.553394 -1.358335  1.278704
How do I use negative indexing with DataFrame.ix() correctly? Thanks.
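In recent pandas (1.0 or later) .ix no longer exists; its label-based and position-based roles are split between .loc and .iloc. A sketch of the question's call rewritten both ways on a 10-row frame: the label version appears to reproduce the full-frame result of Out[94], while the positional version returns the last two rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 4))

# Label-based slice: -2 is read as a row *label*. On the sorted integer index 0..9,
# label -2 would sort before label 0, so the slice starts at row 0 -> all 10 rows.
label_slice = df.loc[-2:, :]

# Position-based slice: -2 counts from the end, as with Python lists.
position_slice = df.iloc[-2:, :]

print(len(label_slice))               # 10
print(position_slice.index.tolist())  # [8, 9]
```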
Answer by Wes McKinney
This is a bug:
In [1]: df = pd.DataFrame(np.random.randn(10, 4))
In [2]: df
Out[2]:
          0         1         2         3
0 -3.100926 -0.580586 -1.216032  0.425951
1 -0.264271 -1.091915 -0.602675  0.099971
2 -0.846290  1.363663 -0.382874  0.065783
3 -0.099879 -0.679027 -0.708940  0.138728
4 -0.302597  0.753350 -0.112674 -1.253316
5 -0.213237 -0.467802  0.037350  0.369167
6  0.754915 -0.569134 -0.297824 -0.600527
7  0.644742  0.038862  0.216869  0.294149
8  0.101684  0.784329  0.218221  0.965897
9 -1.482837 -1.325625  1.008795 -0.150439
In [3]: df.ix[-2:]
Out[3]:
          0         1         2         3
0 -3.100926 -0.580586 -1.216032  0.425951
1 -0.264271 -1.091915 -0.602675  0.099971
2 -0.846290  1.363663 -0.382874  0.065783
3 -0.099879 -0.679027 -0.708940  0.138728
4 -0.302597  0.753350 -0.112674 -1.253316
5 -0.213237 -0.467802  0.037350  0.369167
6  0.754915 -0.569134 -0.297824 -0.600527
7  0.644742  0.038862  0.216869  0.294149
8  0.101684  0.784329  0.218221  0.965897
9 -1.482837 -1.325625  1.008795 -0.150439
https://github.com/pydata/pandas/issues/2600
Note that df[-2:] will work:
In [4]: df[-2:]
Out[4]:
          0         1         2         3
8  0.101684  0.784329  0.218221  0.965897
9 -1.482837 -1.325625  1.008795 -0.150439
Answer by Zelazny7
The main purpose of ix is to allow NumPy-like indexing with support for row and column labels, so I'm not sure your use case is what it was intended for. Here are a couple of ways I can think of, mostly trivial:
In [142]: df.ix[:][-2:]
Out[142]:
          0         1         2         3
8  0.386882 -0.836112 -0.108250 -0.433797
9  0.642468 -0.399255 -0.911456 -0.497720
In [161]: df.ix[df.index[-2:],:]
Out[161]:
          0         1         2         3
8  0.386882 -0.836112 -0.108250 -0.433797
9  0.642468 -0.399255 -0.911456 -0.497720
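Both workarounds can be written without .ix in recent pandas (1.0 or later). A sketch, assuming a hypothetical frame with string row labels to show that neither approach depends on an integer index:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string row labels 'a'..'j'.
df = pd.DataFrame(np.random.randn(10, 4), index=list("abcdefghij"))

# Counterpart of df.ix[df.index[-2:], :]:
# df.index[-2:] is ordinary positional slicing of the Index object,
# and the resulting labels are then passed to the label-based .loc.
via_labels = df.loc[df.index[-2:], :]

# Counterpart of df.ix[:][-2:]: a direct positional slice.
via_positions = df.iloc[-2:]

print(via_labels.index.tolist())  # ['i', 'j']
```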
I don't think ix supports negative indexing at all. It seems to ignore it altogether:
In [181]: df.ix[-100:,:]
Out[181]:
          0         1         2         3
0 -1.144137 -1.042034 -2.158838  0.674055
1 -0.424184  1.237318 -1.846130  0.575357
2 -0.844974 -0.541060  2.197364 -0.031898
3  0.846263  1.244450 -1.570566 -0.477919
4 -0.193445  0.171045 -0.235587 -1.185583
5  1.361539 -1.107389 -1.321081 -0.776407
6  0.505907 -1.364414 -2.093770  0.144016
7 -0.888465 -0.329153  0.491264 -0.363472
8  0.386882 -0.836112 -0.108250 -0.433797
9  0.642468 -0.399255 -0.911456 -0.497720
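Why would a start of -100 return every row? On a sorted index, a label slice whose bound is not present is resolved by finding where that label would be inserted. The sketch below illustrates the idea with Index.searchsorted; it is a simplified model rather than pandas' exact internal code path, and it assumes a recent pandas where .loc plays the label-based role of .ix:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 4))

# searchsorted returns the position at which a label would be inserted
# into the sorted index 0..9. Any label below 0 (-2, -100, ...) maps to 0.
start_for_minus_2 = df.index.searchsorted(-2)
start_for_minus_100 = df.index.searchsorted(-100)

print(start_for_minus_2, start_for_minus_100)  # 0 0

# Consistent with that: a label slice starting at -100 begins at the first row.
print(len(df.loc[-100:, :]))  # 10
```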
Edit: from the pandas documentation:
Label-based indexing with integer axis labels is a thorny topic. It has been discussed heavily on mailing lists and among various members of the scientific Python community. In pandas, our general viewpoint is that labels matter more than integer locations. Therefore, with an integer axis index only label-based indexing is possible with the standard tools like .ix. The following code will generate exceptions:
s = Series(range(5))
s[-1]
df = DataFrame(np.random.randn(5, 4))
df
df.ix[-2:]
This deliberate decision was made to prevent ambiguities and subtle bugs (many users reported finding bugs when the API change was made to stop “falling back” on position-based indexing).
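The documentation's point about labels versus positions also shows up on a Series in recent pandas (1.0 or later): with an integer index, a label lookup of -1 fails instead of falling back to the last element, and .iloc is the explicit way to count from the end. A minimal sketch:

```python
import pandas as pd

s = pd.Series(range(5))  # integer labels 0..4, values 0..4

# Label lookup: there is no row labelled -1, so this raises KeyError
# rather than silently meaning "last element".
try:
    s.loc[-1]
except KeyError:
    print("KeyError: no label -1")

# Positional lookup: -1 counts from the end.
print(s.iloc[-1])  # 4
```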

