Original question: http://stackoverflow.com/questions/14192741/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Understanding pandas dataframe indexing
Asked by K.-Michael Aye
Summary: This doesn't work:
df[df.key==1]['D'] = 1
but this does:
df.D[df.key==1] = 1
Why?
Reproduction:
In [1]: import pandas as pd
In [2]: from numpy.random import randn
In [4]: df = pd.DataFrame(randn(6,3),columns=list('ABC'))
In [5]: df
Out[5]:
          A         B         C
0  1.438161 -0.210454 -1.983704
1 -0.283780 -0.371773  0.017580
2  0.552564 -0.610548  0.257276
3  1.931332  0.649179 -1.349062
4  1.656010 -1.373263  1.333079
5  0.944862 -0.657849  1.526811
In [6]: df['D']=0.0
In [7]: df['key']=3*[1]+3*[2]
In [8]: df
Out[8]:
          A         B         C  D  key
0  1.438161 -0.210454 -1.983704  0    1
1 -0.283780 -0.371773  0.017580  0    1
2  0.552564 -0.610548  0.257276  0    1
3  1.931332  0.649179 -1.349062  0    2
4  1.656010 -1.373263  1.333079  0    2
5  0.944862 -0.657849  1.526811  0    2
This doesn't work:
In [9]: df[df.key==1]['D'] = 1
In [10]: df
Out[10]:
          A         B         C  D  key
0  1.438161 -0.210454 -1.983704  0    1
1 -0.283780 -0.371773  0.017580  0    1
2  0.552564 -0.610548  0.257276  0    1
3  1.931332  0.649179 -1.349062  0    2
4  1.656010 -1.373263  1.333079  0    2
5  0.944862 -0.657849  1.526811  0    2
but this does:
In [11]: df.D[df.key==1] = 3.4
In [12]: df
Out[12]:
          A         B         C    D  key
0  1.438161 -0.210454 -1.983704  3.4    1
1 -0.283780 -0.371773  0.017580  3.4    1
2  0.552564 -0.610548  0.257276  3.4    1
3  1.931332  0.649179 -1.349062  0.0    2
4  1.656010 -1.373263  1.333079  0.0    2
5  0.944862 -0.657849  1.526811  0.0    2
My question is:
Why does only the 2nd way work? I can't seem to see a difference in selection/indexing logic.
Version is 0.10.0
Edit: This should no longer be done this way. Since version 0.11 there is .loc; see http://pandas.pydata.org/pandas-docs/stable/indexing.html
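A minimal sketch of the .loc form mentioned in the edit, assuming pandas 0.11 or later. The frame is a cut-down stand-in for the one in the question (only the D and key columns), not the original data.

```python
import pandas as pd

# Cut-down stand-in for the question's frame: only the columns that matter here.
df = pd.DataFrame({"D": [0.0] * 6, "key": 3 * [1] + 3 * [2]})

# The row mask and the column label go into a single .loc call, so pandas
# performs one assignment on df itself rather than on an intermediate object.
df.loc[df.key == 1, "D"] = 3.4

print(df)
```

Because the selection and the assignment happen in one indexing step, there is no intermediate result that could turn out to be a copy.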
Answered by Thorsten Kranz
The pandas documentation says:
Returning a view versus a copy
The rules about when a view on the data is returned are entirely dependent on NumPy. Whenever an array of labels or a boolean vector are involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned.
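The NumPy rule that the quoted passage relies on can be checked directly. This is a small sketch on a plain array (the array and variable names are made up for illustration); it shows only the NumPy behaviour, not what any particular pandas version does internally.

```python
import numpy as np

a = np.arange(6)

sliced = a[1:4]      # basic slicing
masked = a[a > 2]    # boolean (advanced) indexing

# A slice is a view: it shares memory with the original array.
print(np.shares_memory(a, sliced))
# A boolean selection is a copy with its own memory.
print(np.shares_memory(a, masked))

sliced[0] = 100      # writes through to a
masked[0] = -1       # only changes the copy
print(a)
```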
In df[df.key==1]['D'] you first do boolean slicing (leading to a copy of the DataFrame), then you choose a column ['D'].
In df.D[df.key==1] = 3.4, you first choose a column, then do boolean slicing on the resulting Series.
This seems to make the difference, although I must admit that it is a little counterintuitive.
Edit: The difference was identified by Dougal, see his comment: With version 1, the boolean slicing goes through the __getitem__ method, which makes the copy, and the assignment then lands on that copy. With version 2, the boolean mask is handed to the __setitem__ method of the selected column, so no copy is returned and the values are simply assigned.
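The dispatch rule behind that comment is plain Python: in a chained subscript assignment, only the last subscript becomes a __setitem__ call, and everything before it is a __getitem__ call. The sketch below uses a toy class (Table is hypothetical, not pandas code) whose __getitem__ always returns a new object, to mimic a selection that produces a copy.

```python
class Table:
    """Toy stand-in for a DataFrame; hypothetical, not pandas code."""

    def __init__(self, columns, calls):
        self.columns = dict(columns)
        self.calls = calls  # shared log of special-method calls

    def __getitem__(self, key):
        self.calls.append(("__getitem__", key))
        # Like a boolean selection, hand back a new object, not self.
        return Table(self.columns, self.calls)

    def __setitem__(self, key, value):
        self.calls.append(("__setitem__", key))
        self.columns[key] = value


calls = []
t = Table({"D": 0}, calls)

# Same shape as df[df.key==1]['D'] = 1: __getitem__ first, then __setitem__
# on the temporary object that __getitem__ returned.
t["mask"]["D"] = 1
chained_calls = list(calls)
after_chained = dict(t.columns)
print(chained_calls, after_chained)

# A single subscript on the left-hand side: __setitem__ on t itself.
calls.clear()
t["D"] = 1
direct_calls = list(calls)
print(direct_calls, t.columns)
```

In the chained form the write reaches only the temporary object, so t is left as it was; in the direct form t itself is modified.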
Answered by cxrodgers
I am pretty sure that your 1st way is returning a copy, instead of a view, and so assigning to it does not change the original data. I am not sure why this is happening though.
It seems to be related to the order in which you select rows and columns, NOT the syntax for getting columns. These both work:
df.D[df.key == 1] = 1
df['D'][df.key == 1] = 1
And neither of these works:
df[df.key == 1]['D'] = 1
df[df.key == 1].D = 1
From this evidence, I would assume that the slice df[df.key == 1] is returning a copy. But this is not the case! df[df.key == 1] = 0 will actually change the original data, as if it were a view.
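A hedged sketch of the two cases side by side, on a cut-down stand-in frame (only the D and key columns). It assigns -1 rather than 0 so the change is visible against the initial zeros, and it silences the chained-assignment warning that recent pandas versions emit. The contrast is that the second statement is a single __setitem__ call on df, with no intermediate selection.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"D": [0.0] * 6, "key": 3 * [1] + 3 * [2]})
mask = df.key == 1

with warnings.catch_warnings():
    # Recent pandas versions warn about chained assignment; silence that here.
    warnings.simplefilter("ignore")
    df[mask]["D"] = 1          # assigns into a temporary object
after_chained = df["D"].tolist()

df[mask] = -1                  # one __setitem__ call on df itself
after_direct = df["D"].tolist()

print(after_chained)
print(after_direct)
print(df)
```

Note that the direct form overwrites every column in the selected rows, including key, which is why it is not a drop-in replacement for setting a single column.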
So, I'm not sure. My sense is that this behavior has changed with the version of pandas. I seem to remember that df.D used to return a copy and df['D'] used to return a view, but this doesn't appear to be true anymore (pandas 0.10.0).
If you want a more complete answer, you should post in the pystatsmodels forum: https://groups.google.com/forum/?fromgroups#!forum/pystatsmodels

