Correct way to set new column in pandas DataFrame to avoid SettingWithCopyWarning

Note: this page is based on a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) and keep the same license. Original question on Stack Overflow: http://stackoverflow.com/questions/42379818/

Time: 2020-09-14 03:02:07  Source: igfitidea

python pandas

Asked by djj

I'm trying to create a new column in the netc DataFrame, but I get a warning:

netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

C:\Anaconda\lib\site-packages\ipykernel\__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

What's the proper way to create a field in newer versions of pandas without getting the warning?

pd.__version__
Out[45]:
u'0.19.2+0.g825876c.dirty'

Accepted answer by Filip Kilibarda

As the warning says, try using .loc[row_indexer,col_indexer] to create the new column.

netc.loc[:, "DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

Notes

According to the pandas indexing docs, your code should work.

netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

gets translated to

netc.__setitem__('DeltaAMPP', netc.LOAD_AM - netc.VPP12_AM)

Which should have predictable behaviour. The SettingWithCopyWarning is only there to warn users of unexpected behaviour during chained assignment (which is not what you're doing). However, as mentioned in the docs,

Sometimes a SettingWithCopy warning will arise at times when there's no obvious chained indexing going on. These are the bugs that SettingWithCopy is designed to catch! Pandas is probably trying to warn you that you've done this:

The docs then go on to give an example of when one might get that warning even when it's not expected. Without more context, I can't tell why that's happening here.
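To illustrate the point about __setitem__, here is a minimal sketch on a freshly constructed DataFrame. The sample values and the make_frame helper are assumptions made up for illustration; only the column names come from the question. It compares the bracket form, the explicit __setitem__ call, and the .loc form the warning suggests.

```python
import pandas as pd

def make_frame():
    # Hypothetical sample data; only the column names come from the question.
    return pd.DataFrame({"LOAD_AM": [10.0, 20.0, 30.0],
                         "VPP12_AM": [1.0, 2.0, 3.0]})

a = make_frame()
a["DeltaAMPP"] = a.LOAD_AM - a.VPP12_AM              # bracket assignment

b = make_frame()
b.__setitem__("DeltaAMPP", b.LOAD_AM - b.VPP12_AM)   # what the brackets call

c = make_frame()
c.loc[:, "DeltaAMPP"] = c.LOAD_AM - c.VPP12_AM       # the form the warning suggests

# All three frames should end up with the same new column.
same_result = a.equals(b) and a.equals(c)
```

Because these frames are built directly rather than derived from another frame, there is no hidden parent for the assignment to be ambiguous about.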

Answered by Ronan Paixão

Your example is incomplete, as it doesn't show where netc comes from. It is likely that netc itself is the product of slicing, in which case pandas cannot guarantee whether it is a view or a copy.

For example, if you're doing this:

netc = netb[netb["DeltaAMPP"] == 0]
netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

then pandas wouldn't know whether netc is a view or a copy. Written as a one-liner, it would effectively be this:

netb[netb["DeltaAMPP"] == 0]["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

where you can see the double indexing more clearly.
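A small sketch of why the double indexing matters, using made-up sample values (only the column names come from the question). The boolean filter builds a temporary object, so the assignment never reaches netb. The warning is silenced here because its exact class depends on the pandas version.

```python
import warnings
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
netb = pd.DataFrame({
    "LOAD_AM": [10.0, 20.0, 30.0],
    "VPP12_AM": [1.0, 2.0, 3.0],
    "DeltaAMPP": [0.0, 5.0, 0.0],
})

with warnings.catch_warnings():
    # Silence the chained-assignment warning for this demonstration only.
    warnings.simplefilter("ignore")
    # Chained indexing: the filter returns a temporary copy,
    # and the assignment lands on that temporary object.
    netb[netb["DeltaAMPP"] == 0]["DeltaAMPP"] = netb["LOAD_AM"] - netb["VPP12_AM"]

# netb itself still holds its original values.
unchanged = netb["DeltaAMPP"].tolist()
```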

If you want to make netc separate from netb, one possible remedy is to force a copy in the first line (the loc is there to make sure we're not copying twice), like:

netc = netb.loc[netb["DeltaAMPP"] == 0].copy()
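As a rough check of the copy approach, the sketch below uses made-up sample values (only the column names come from the question). It records any warnings raised during the assignment and matches them by class name, which is an assumption chosen so the snippet does not depend on where a given pandas version exposes that class.

```python
import warnings
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
netb = pd.DataFrame({
    "LOAD_AM": [10.0, 20.0, 30.0],
    "VPP12_AM": [1.0, 2.0, 3.0],
    "DeltaAMPP": [0.0, 5.0, 0.0],
})

# Explicit copy: netc is now an independent DataFrame.
netc = netb.loc[netb["DeltaAMPP"] == 0].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    netc["DeltaAMPP"] = netc["LOAD_AM"] - netc["VPP12_AM"]

# Collect only warnings of the kind discussed in the question.
copy_warnings = [w for w in caught
                 if w.category.__name__ == "SettingWithCopyWarning"]
```

After this runs, netc carries the computed differences for the filtered rows, and netb keeps its original DeltaAMPP values.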

If, on the other hand, you want netb itself modified with the new values, you may do:

netb.loc[netb["DeltaAMPP"] == 0, "DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM
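A minimal sketch of the in-place variant, again with made-up sample values (only the column names come from the question). As an assumption for self-containment, the right-hand side is computed from netb with the same mask rather than from a separate netc; pandas aligns it to the selected rows by index.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
netb = pd.DataFrame({
    "LOAD_AM": [10.0, 20.0, 30.0],
    "VPP12_AM": [1.0, 2.0, 3.0],
    "DeltaAMPP": [0.0, 5.0, 0.0],
})

mask = netb["DeltaAMPP"] == 0

# A single .loc call selects rows and column together, so the assignment
# goes straight to netb rather than to an intermediate object.
netb.loc[mask, "DeltaAMPP"] = netb.loc[mask, "LOAD_AM"] - netb.loc[mask, "VPP12_AM"]

updated = netb["DeltaAMPP"].tolist()
```

Only the rows matching the mask are overwritten; the remaining row keeps its previous value.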