Correct way to set new column in pandas DataFrame to avoid SettingWithCopyWarning
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors on StackOverflow, citing the original URL and author information.
Original: http://stackoverflow.com/questions/42379818/
Asked by djj
I'm trying to create a new column in the netc DataFrame, but I get this warning:
netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM
C:\Anaconda\lib\site-packages\ipykernel\__main__.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
What's the proper way to create a column in newer versions of pandas without getting the warning?
pd.__version__
Out[45]:
u'0.19.2+0.g825876c.dirty'
Accepted answer by Filip Kilibarda
As the warning says, try using .loc[row_indexer,col_indexer] to create the new column.
netc.loc[:, "DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM
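As a minimal sketch of the .loc form, assuming a small made-up frame (the values are hypothetical; only the column names come from the question):

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
netc = pd.DataFrame({"LOAD_AM": [10.0, 20.0, 30.0],
                     "VPP12_AM": [4.0, 5.0, 6.0]})

# Assigning through .loc with a column label that does not exist yet
# adds that column to the frame.
netc.loc[:, "DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

print(netc)
```

Here netc is built from scratch. If netc was itself produced by slicing another frame, this form can still trigger the warning in older pandas versions.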
Notes
According to the pandas indexing docs, your code should work.
netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM
gets translated to
netc.__setitem__('DeltaAMPP', netc.LOAD_AM - netc.VPP12_AM)
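A small sketch of that equivalence, assuming two identical made-up frames (the names a, b and data are hypothetical, used only for illustration):

```python
import pandas as pd

# Hypothetical data; a and b start out identical.
data = {"LOAD_AM": [10.0, 20.0], "VPP12_AM": [4.0, 5.0]}
a = pd.DataFrame(data)
b = pd.DataFrame(data)

# Bracket assignment ...
a["DeltaAMPP"] = a.LOAD_AM - a.VPP12_AM
# ... and the explicit method call that Python dispatches it to.
b.__setitem__("DeltaAMPP", b.LOAD_AM - b.VPP12_AM)

print(a.equals(b))
```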
which should have predictable behaviour. The SettingWithCopyWarning is only there to warn users of unexpected behaviour during chained assignment (which is not what you're doing). However, as mentioned in the docs:
"Sometimes a SettingWithCopy warning will arise at times when there's no obvious chained indexing going on. These are the bugs that SettingWithCopy is designed to catch! Pandas is probably trying to warn you that you've done this:"
The docs then go on to give an example of when one might get that warning even when it's not expected. Without more context, I can't tell why it's happening in your case.
Answer by Ronan Paixão
Your example is incomplete, as it doesn't show where netc comes from. It is likely that netc itself is the product of slicing, in which case pandas cannot guarantee whether it is a view or a copy.
For example, if you're doing this:
netc = netb[netb["DeltaAMPP"] == 0]
netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM
then pandas wouldn't know whether netc is a view or a copy. Written as a one-liner, it would effectively be:
netb[netb["DeltaAMPP"] == 0]["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM
where you can see the double indexing more clearly.
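The two-step pattern can be sketched as follows, assuming made-up values for netb (only the column names come from the answer). Whether the warning is actually emitted depends on the pandas version and its settings, so the sketch records it rather than assuming an outcome:

```python
import warnings
import pandas as pd

# Hypothetical data; only the column names come from the answer.
netb = pd.DataFrame({"LOAD_AM": [10.0, 20.0, 30.0],
                     "VPP12_AM": [4.0, 5.0, 6.0],
                     "DeltaAMPP": [0.0, 7.0, 0.0]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    netc = netb[netb["DeltaAMPP"] == 0]  # netc is derived from netb
    netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

# Match by class name so the check does not depend on where a given
# pandas version exposes the warning class.
warned = any(w.category.__name__ == "SettingWithCopyWarning" for w in caught)
print("SettingWithCopyWarning recorded:", warned)

# A boolean-mask selection returns new data, so netb itself is not
# expected to change here.
print(netb["DeltaAMPP"].tolist())
```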
If you want to make netc separate from netb, one possible remedy is to force a copy in the first line (the loc is there to make sure we're not copying twice), like:
netc = netb.loc[netb["DeltaAMPP"] == 0].copy()
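A short sketch of the explicit-copy approach, again assuming made-up values for netb. It records warnings to check that none of type SettingWithCopyWarning is raised and that netb stays untouched:

```python
import warnings
import pandas as pd

# Hypothetical data; only the column names come from the answer.
netb = pd.DataFrame({"LOAD_AM": [10.0, 20.0, 30.0],
                     "VPP12_AM": [4.0, 5.0, 6.0],
                     "DeltaAMPP": [0.0, 7.0, 0.0]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # .copy() makes netc an independent frame.
    netc = netb.loc[netb["DeltaAMPP"] == 0].copy()
    netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

copy_warnings = [w for w in caught
                 if w.category.__name__ == "SettingWithCopyWarning"]
print(len(copy_warnings))
print(netc)
print(netb)
```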
If, on the other hand, you want netb itself to be modified with the new values, you may do:
netb.loc[netb["DeltaAMPP"] == 0, "DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM
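A self-contained sketch of the in-place variant, assuming made-up values. Unlike the line above, it computes the difference from netb's own columns rather than from a separate netc, which is an assumption made here so the example runs on its own:

```python
import pandas as pd

# Hypothetical data; only the column names come from the answer.
netb = pd.DataFrame({"LOAD_AM": [10.0, 20.0, 30.0],
                     "VPP12_AM": [4.0, 5.0, 6.0],
                     "DeltaAMPP": [0.0, 7.0, 0.0]})

# Select the rows to update once, then write to them with a single .loc call.
mask = netb["DeltaAMPP"] == 0
netb.loc[mask, "DeltaAMPP"] = (netb.loc[mask, "LOAD_AM"]
                               - netb.loc[mask, "VPP12_AM"])

print(netb)
```

Because the row mask and the column label go into one .loc call, there is no intermediate object for pandas to be unsure about.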