Original URL: http://stackoverflow.com/questions/23296282/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
What rules does Pandas use to generate a view vs a copy?
Asked by orome
I'm confused about the rules Pandas uses when deciding that a selection from a dataframe is a copy of the original dataframe, or a view on the original.
If I have, for example,
df = pd.DataFrame(np.random.randn(8,8), columns=list('ABCDEFGH'), index=range(1,9))
I understand that a query returns a copy, so that something like
foo = df.query('2 < index <= 5')
foo.loc[:,'E'] = 40
will have no effect on the original dataframe, df. I also understand that scalar or named slices return a view, so that assignments to these, such as
df.iloc[3] = 70
or
df.loc[1,'B':'E'] = 222
will change df. But I'm lost when it comes to more complicated cases. For example,
df[df.C <= df.B] = 7654321
changes df, but
df[df.C <= df.B].loc[:,'B':'E']
does not.
Is there a simple rule that Pandas is using that I'm just missing? What's going on in these specific cases; and in particular, how do I change all values (or a subset of values) in a dataframe that satisfy a particular query (as I'm attempting to do in the last example above)?
Note: This is not the same as this question; and I have read the documentation, but am not enlightened by it. I've also read through the "Related" questions on this topic, but I'm still missing the simple rule Pandas is using, and how I'd apply it to, for example, modify the values (or a subset of values) in a dataframe that satisfy a particular query.
Accepted answer by Jeff
Here are the rules; each subsequent rule overrides the ones before it:
- All operations generate a copy.
- If inplace=True is provided, the operation will modify in place; only some operations support this.
- An indexer that sets, e.g. .loc/.iloc/.iat/.at, will set in place.
- An indexer that gets on a single-dtyped object is almost always a view (depending on the memory layout it may not be, which is why this is not reliable). This is mainly for efficiency. (The example from above is for .query; this will always return a copy, as it is evaluated by numexpr.)
- An indexer that gets on a multiple-dtyped object is always a copy.
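As a rough illustration of the rules above, here is a minimal sketch. The all-zeros frame and the names foo and renamed are made up for the example, and the warning suppression is only there because some pandas versions emit a SettingWithCopyWarning when writing to the result of a query.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical frame with the same shape and labels as in the question.
df = pd.DataFrame(np.zeros((8, 8)), columns=list("ABCDEFGH"), index=range(1, 9))

# An indexer that sets writes into df itself.
df.loc[3, "B":"E"] = 222.0
df.iloc[0] = 70.0

# .query returns a new object, so writing to the result leaves df alone.
foo = df.query("2 < index <= 5")
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # some versions warn about setting on a copy
    foo.loc[:, "E"] = 40.0

# Without inplace=True a method returns a new object; with it, df is modified.
renamed = df.rename(columns={"H": "Z"})
df.rename(columns={"A": "a"}, inplace=True)
```

After this runs, df should carry the values written through .loc and .iloc and the renamed column a, while the 40.0 written to foo should appear only in foo.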
Your example of chained indexing
df[df.C <= df.B].loc[:,'B':'E']
is not guaranteed to work (and thus you should never do this).
Instead do:
df.loc[df.C <= df.B, 'B':'E']
as this is faster and will always work.
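To make the difference concrete, here is a small sketch that tries both forms as assignments. The data is constructed by hand (not taken from the question) so that exactly two rows satisfy C <= B, and the names mask, before and chained_changed_df are illustrative only.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical data: rows 2 and 5 are edited so that C <= B holds only there.
df = pd.DataFrame(
    np.arange(64, dtype=float).reshape(8, 8),
    columns=list("ABCDEFGH"),
    index=range(1, 9),
)
df.loc[[2, 5], "B"] = 100.0
mask = df.C <= df.B
before = df.copy()

# Chained form: df[mask] builds a temporary object and the assignment lands
# on that temporary, so df is not expected to change.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas usually warns about this pattern
    df[mask].loc[:, "B":"E"] = -1.0
chained_changed_df = not df.equals(before)

# Single .loc call: row selection and column slice in one setting operation.
df.loc[mask, "B":"E"] = -1.0
```

The single .loc call selects rows and columns in one step, so pandas sees one setting operation on df instead of a get followed by a set on a temporary.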
The chained indexing is 2 separate Python operations and thus cannot be reliably intercepted by pandas (you will oftentimes get a SettingWithCopyWarning, but that is not 100% detectable either). The dev docs, which you pointed to, offer a much fuller explanation.