Python: How to deal with SettingWithCopyWarning in Pandas?

Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, distributed under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/20625582/

Time: 2020-08-18 20:52:24  Source: igfitidea

How to deal with SettingWithCopyWarning in Pandas?

Tags: python, pandas, dataframe, chained-assignment

Asked by bigbug

Background

I just upgraded my Pandas from 0.11 to 0.13.0rc1. Now the application is popping out many new warnings. One of them looks like this:

E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE

What exactly does it mean? Do I need to change something?

How can I suppress the warning if I insist on using quote_df['TVol'] = quote_df['TVol']/TVOL_SCALE?

The function that gives errors

def _decode_stock_quote(list_of_150_stk_str):
    """decode the webpage and return dataframe"""

    from cStringIO import StringIO

    str_of_all = "".join(list_of_150_stk_str)

    quote_df = pd.read_csv(StringIO(str_of_all), sep=',', names=list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefg')) #dtype={'A': object, 'B': object, 'C': np.float64}
    quote_df.rename(columns={'A':'STK', 'B':'TOpen', 'C':'TPCLOSE', 'D':'TPrice', 'E':'THigh', 'F':'TLow', 'I':'TVol', 'J':'TAmt', 'e':'TDate', 'f':'TTime'}, inplace=True)
    quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]
    quote_df['TClose'] = quote_df['TPrice']
    quote_df['RT']     = 100 * (quote_df['TPrice']/quote_df['TPCLOSE'] - 1)
    quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE
    quote_df['TAmt']   = quote_df['TAmt']/TAMT_SCALE
    quote_df['STK_ID'] = quote_df['STK'].str.slice(13,19)
    quote_df['STK_Name'] = quote_df['STK'].str.slice(21,30)#.decode('gb2312')
    quote_df['TDate']  = quote_df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10])

    return quote_df

More error messages

E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE
E:\FinReporter\FM_EXT.py:450: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TAmt']   = quote_df['TAmt']/TAMT_SCALE
E:\FinReporter\FM_EXT.py:453: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TDate']  = quote_df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10])

Accepted answer by Garrett

The SettingWithCopyWarning was created to flag potentially confusing "chained" assignments, such as the following, which does not always work as expected, particularly when the first selection returns a copy. [See GH5390 and GH5597 for background discussion.]

df[df['A'] > 2]['B'] = new_val  # new_val not set in df

The warning offers a suggestion to rewrite as follows:

df.loc[df['A'] > 2, 'B'] = new_val

However, this doesn't fit your usage, which is equivalent to:

df = df[df['A'] > 2]
df['B'] = new_val
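If you do want the filtered frame to be independent, one way to make that intent explicit (a sketch with made-up data, not part of the original answer) is to call `.copy()` on the selection before assigning. The names `filtered` and `new_val` are placeholders.

```python
import pandas as pd

# Hypothetical example data; new_val is a placeholder
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})
new_val = 0

# Filter, then copy explicitly so the result no longer refers back to df
filtered = df[df["A"] > 2].copy()
filtered["B"] = new_val  # writes to the independent copy only

print(filtered)
print(df)  # the original frame keeps its old values
```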

While it's clear that you don't care about writes making it back to the original frame (since you are overwriting the reference to it), unfortunately this pattern cannot be differentiated from the first chained assignment example. Hence the (false positive) warning. The potential for false positives is addressed in the docs on indexing, if you'd like to read further. You can safely disable this new warning with the following assignment.

import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'

Answer by Jeff

In general, the point of the SettingWithCopyWarning is to show users (and especially new users) that they may be operating on a copy and not the original, as they think. There are false positives (in other words, if you know what you are doing, it could be OK). One possibility is simply to turn off the warning (which is set to warn by default), as @Garrett suggests.

Here is another option:

In [1]: df = DataFrame(np.random.randn(5, 2), columns=list('AB'))

In [2]: dfa = df.ix[:, [1, 0]]

In [3]: dfa.is_copy
Out[3]: True

In [4]: dfa['A'] /= 2
/usr/local/bin/ipython:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  #!/usr/local/bin/python

You can set the is_copy flag to False, which will effectively turn off the check for that object:

In [5]: dfa.is_copy = False

In [6]: dfa['A'] /= 2

If you copy explicitly, no further warning will be raised:

In [7]: dfa = df.ix[:, [1, 0]].copy()

In [8]: dfa['A'] /= 2

The code the OP is showing above, while legitimate (and probably something I do as well), is technically a case for this warning, not a false positive. Another way to avoid the warning is to do the selection operation via reindex, e.g.

quote_df = quote_df.reindex(columns=['STK', ...])

Or,

quote_df = quote_df.reindex(['STK', ...], axis=1)  # v.0.21
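A small sketch of the reindex approach on hypothetical data (the column names and values below are placeholders, since the answer elides the full list). reindex builds a new frame containing the requested columns in the requested order, and later assignments act on that new frame. One design caveat: a label that does not exist in the original is not an error for reindex; it simply produces a column of NaN.

```python
import pandas as pd

# Hypothetical stand-in for quote_df
quote_df = pd.DataFrame({"STK": ["x", "y"], "TOpen": [1.0, 2.0], "TPrice": [1.5, 2.5]})

# Select and reorder columns with reindex instead of .ix
selected = quote_df.reindex(columns=["TPrice", "STK"])
selected["TPrice"] = selected["TPrice"] / 10  # modifies only the new frame

print(selected.columns.tolist())  # ['TPrice', 'STK']
```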

Answer by firelynx

Pandas dataframe copy warning

When you go and do something like this:

quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]

pandas.ix in this case returns a new, standalone dataframe.

Any values you change in this dataframe will not change the original dataframe.

This is what pandas tries to warn you about.



Why .ix is a bad idea

The .ix object tries to do more than one thing, and for anyone who has read anything about clean code, this is a strong code smell.

Given this dataframe:

df = pd.DataFrame({"a": [1,2,3,4], "b": [1,1,2,2]})

Two behaviors:

dfcopy = df.ix[:,["a"]]
dfcopy.a.ix[0] = 2

Behavior one: dfcopy is now a standalone dataframe. Changing it will not change df.

df.ix[0, "a"] = 3

Behavior two: This changes the original dataframe.
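.ix was deprecated in pandas 0.20 and removed in 1.0, so the two behaviors above have to be written differently today. A sketch of the explicit equivalents on the same frame: `.loc[...].copy()` for an independent selection, and a single `.loc` assignment to change the original.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 1, 2, 2]})

# Behavior one, made explicit: an independent one-column DataFrame
dfcopy = df.loc[:, ["a"]].copy()
dfcopy.loc[0, "a"] = 2  # changes dfcopy only

# Behavior two, made explicit: label-based assignment on the original
df.loc[0, "a"] = 3  # changes df itself

print(dfcopy["a"].tolist())  # [2, 2, 3, 4]
print(df["a"].tolist())      # [3, 2, 3, 4]
```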



Use .loc instead

The pandas developers recognized that the .ix object was quite smelly [speculatively] and thus created two new objects that help with accessing and assigning data (the other being .iloc).

.loc is faster, because it does not try to create a copy of the data.

.loc is meant to modify your existing dataframe in place, which is more memory efficient.

.loc is predictable; it has one behavior.



The solution

What you are doing in your code example is loading a big file with lots of columns, then modifying it to be smaller.

The pd.read_csv function can help you out with a lot of this and also make loading the file a lot faster.

So instead of doing this:

quote_df = pd.read_csv(StringIO(str_of_all), sep=',', names=list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefg')) #dtype={'A': object, 'B': object, 'C': np.float64}
quote_df.rename(columns={'A':'STK', 'B':'TOpen', 'C':'TPCLOSE', 'D':'TPrice', 'E':'THigh', 'F':'TLow', 'I':'TVol', 'J':'TAmt', 'e':'TDate', 'f':'TTime'}, inplace=True)
quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]

Do this:

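columns = ['STK', 'TPrice', 'TPCLOSE', 'TOpen', 'THigh', 'TLow', 'TVol', 'TAmt', 'TDate', 'TTime']
df = pd.read_csv(StringIO(str_of_all), sep=',', header=None, usecols=[0,3,2,1,4,5,8,9,30,31])
df = df[[0,3,2,1,4,5,8,9,30,31]]  # usecols keeps file order, so reorder by original position
df.columns = columns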

This will only read the columns you are interested in, and name them properly. No need to use the evil .ix object to do magical stuff.
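One caveat worth checking when combining usecols with a list of names: read_csv ignores the order of the entries in usecols and returns the selected columns in file order. A sketch on a made-up, headerless CSV (header=None, because the question passes names= rather than reading a header row) showing the effect and an explicit reorder; the column names are placeholders.

```python
import pandas as pd
from io import StringIO

# Hypothetical headerless CSV with four columns
raw = "s1,10,20,30\ns2,11,21,31\n"

# With header=None the columns are labeled by their position in the file;
# usecols picks them, but in file order regardless of how they are listed
df = pd.read_csv(StringIO(raw), header=None, usecols=[0, 3, 1])
print(df.columns.tolist())  # [0, 1, 3]

# Reorder explicitly by position, then assign the names in that same order
df = df[[0, 3, 1]]
df.columns = ["STK", "col3", "col1"]
print(df)
```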

Answer by Steohan

If you have assigned the slice to a variable and want to set using the variable as in the following:

df2 = df[df['A'] > 2]
df2['B'] = value

and you do not want to use Jeff's solution because the condition computing df2 is too long, or for some other reason, then you can use the following:

df.loc[df2.index.tolist(), 'B'] = value

df2.index.tolist() returns the index labels of all entries in df2, which are then used to set column B in the original dataframe.
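A minimal sketch of that approach on made-up data (`value` is a placeholder). An Index can also be passed to .loc directly, so the .tolist() call is optional:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5], "B": [0, 0, 0]})
value = 99  # placeholder

# Compute the subset once, only to collect its row labels
df2 = df[df["A"] > 2]

# Write through the original frame, addressing those rows by label
df.loc[df2.index, "B"] = value

print(df["B"].tolist())  # [0, 99, 99]
```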

Answer by Raphvanns

To remove any doubt, my solution was to make a deep copy of the slice instead of a regular copy. This may not be applicable depending on your context (memory constraints / size of the slice, potential for performance degradation, especially if the copy occurs in a loop like it did for me, etc.)

To be clear, here is the warning I received:

/opt/anaconda3/lib/python3.6/site-packages/ipykernel/__main__.py:54:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Illustration

I suspected that the warning was thrown because of a column I was dropping on a copy of the slice. While I was not technically setting a value in the copy of the slice, dropping a column was still a modification of it. Below are the (simplified) steps I took to confirm the suspicion; I hope they help those of us who are trying to understand the warning.

Example 1: dropping a column on the original affects the copy

We knew that already, but this is a healthy reminder. This is NOT what the warning is about.

>> data1 = {'A': [111, 112, 113], 'B':[121, 122, 123]}
>> df1 = pd.DataFrame(data1)
>> df1

    A   B
0   111 121
1   112 122
2   113 123


>> df2 = df1
>> df2

    A   B
0   111 121
1   112 122
2   113 123

# Dropping a column on df1 affects df2
>> df1.drop('A', axis=1, inplace=True)
>> df2
    B
0   121
1   122
2   123

It is possible to prevent changes made to df1 from affecting df2:

>> data1 = {'A': [111, 112, 113], 'B':[121, 122, 123]}
>> df1 = pd.DataFrame(data1)
>> df1

    A   B
0   111 121
1   112 122
2   113 123

>> import copy
>> df2 = copy.deepcopy(df1)
>> df2
    A   B
0   111 121
1   112 122
2   113 123

# Dropping a column on df1 does not affect df2
>> df1.drop('A', axis=1, inplace=True)
>> df2
    A   B
0   111 121
1   112 122
2   113 123

Example 2: dropping a column on the copy may affect the original

This actually illustrates the warning.

>> data1 = {'A': [111, 112, 113], 'B':[121, 122, 123]}
>> df1 = pd.DataFrame(data1)
>> df1

    A   B
0   111 121
1   112 122
2   113 123

>> df2 = df1
>> df2

    A   B
0   111 121
1   112 122
2   113 123

# Dropping a column on df2 can affect df1
# No slice involved here, but I believe the principle remains the same?
# Let me know if not
>> df2.drop('A', axis=1, inplace=True)
>> df1

    B
0   121
1   122
2   123

It is possible to prevent changes made to df2 from affecting df1:

>> data1 = {'A': [111, 112, 113], 'B':[121, 122, 123]}
>> df1 = pd.DataFrame(data1)
>> df1

    A   B
0   111 121
1   112 122
2   113 123

>> import copy
>> df2 = copy.deepcopy(df1)
>> df2

    A   B
0   111 121
1   112 122
2   113 123

>> df2.drop('A', axis=1, inplace=True)
>> df1

    A   B
0   111 121
1   112 122
2   113 123
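Note that in these examples `df2 = df1` does not copy anything. Both names refer to the same DataFrame object, which is why dropping a column through either name shows up in both. A sketch using the answer's sample data: pandas' own DataFrame.copy(), which defaults to deep=True, gives the same isolation as copy.deepcopy here.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [111, 112, 113], "B": [121, 122, 123]})

alias = df1               # another name for the same object, not a copy
independent = df1.copy()  # DataFrame.copy() defaults to deep=True

alias.drop("A", axis=1, inplace=True)  # df1 is modified too, since alias is df1

print(df1.columns.tolist())          # ['B']
print(independent.columns.tolist())  # ['A', 'B']
```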

Cheers!

Answer by hughdbrown

You could avoid the whole problem like this, I believe:

return (
    pd.read_csv(StringIO(str_of_all), sep=',', names=list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefg'))  # dtype={'A': object, 'B': object, 'C': np.float64}
    # rename without inplace=True: inplace returns None, which would break the chain
    .rename(columns={'A':'STK', 'B':'TOpen', 'C':'TPCLOSE', 'D':'TPrice', 'E':'THigh', 'F':'TLow', 'I':'TVol', 'J':'TAmt', 'e':'TDate', 'f':'TTime'})
    .iloc[:, [0,3,2,1,4,5,8,9,30,31]]  # positional selection; .ix no longer exists in current pandas
    .assign(
        TClose=lambda df: df['TPrice'],
        RT=lambda df: 100 * (df['TPrice']/df['TPCLOSE'] - 1),  # use the lambda's df, not quote_df
        TVol=lambda df: df['TVol']/TVOL_SCALE,
        TAmt=lambda df: df['TAmt']/TAMT_SCALE,
        STK_ID=lambda df: df['STK'].str.slice(13,19),
        STK_Name=lambda df: df['STK'].str.slice(21,30),  # .decode('gb2312')
        TDate=lambda df: df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10]),
    )
)

This uses assign. From the documentation: "Assign new columns to a DataFrame, returning a new object (a copy) with all the original columns in addition to the new ones."

See Tom Augspurger's article on method chaining in pandas: https://tomaugspurger.github.io/method-chaining
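A self-contained sketch of the assign pattern on hypothetical data (TVOL_SCALE and the values are placeholders). Each lambda receives the intermediate frame, so it should refer to its own argument rather than to an outer variable such as quote_df, and the original frame is left unchanged:

```python
import pandas as pd

TVOL_SCALE = 100  # hypothetical scale factor

raw = pd.DataFrame({"TPrice": [11.0, 9.0], "TPCLOSE": [10.0, 10.0], "TVol": [500, 300]})

result = raw.assign(
    RT=lambda d: 100 * (d["TPrice"] / d["TPCLOSE"] - 1),  # percent change vs previous close
    TVol=lambda d: d["TVol"] / TVOL_SCALE,                # rescaled volume
)

print(result)
print("RT" in raw.columns)  # False: assign returned a new object
```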

Answer by Petr Szturc

For me, this issue occurred in the following (simplified) example. I was also able to solve it (hopefully with a correct solution):

Old code with the warning:

def update_old_dataframe(old_dataframe, new_dataframe):
    for new_index, new_row in new_dataframe.iterrows():
        old_dataframe.loc[new_index] = update_row(old_dataframe.loc[new_index], new_row)

def update_row(old_row, new_row):
    for field in [list_of_columns]:  # [list_of_columns]: placeholder for the columns to update
        # line with warning because of chained indexing old_dataframe[new_index][field]
        old_row[field] = new_row[field]
    return old_row

This printed the warning for the line old_row[field] = new_row[field]

Since the rows in the update_row method are actually of type Series, I replaced the line with:

old_row.at[field] = new_row.at[field]

i.e. the method for accessing/looking up values in a Series. Even though both work just fine and give the same result, this way I don't have to disable the warnings (so they stay active for other chained-indexing issues elsewhere).
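If the goal is simply to copy some columns from new_dataframe into old_dataframe for matching row labels, a single label-based assignment may replace the row-by-row loop entirely. This is a swapped-in technique rather than the answerer's code, sketched on made-up frames; `list_of_columns` here is a hypothetical list of column names. .loc aligns the right-hand DataFrame on both index and columns:

```python
import pandas as pd

old_dataframe = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]}, index=["a", "b", "c"])
new_dataframe = pd.DataFrame({"x": [7, 8], "y": [70, 80]}, index=["b", "c"])
list_of_columns = ["x", "y"]  # hypothetical columns to update

# One assignment on the original frame, addressed by row and column labels
old_dataframe.loc[new_dataframe.index, list_of_columns] = new_dataframe[list_of_columns]

print(old_dataframe)
```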

I hope this may help someone.

Answer by jrouquie

This should work:

quote_df.loc[:,'TVol'] = quote_df['TVol']/TVOL_SCALE

Answer by cs95

How to deal with SettingWithCopyWarning in Pandas?

This post is meant for readers who,

  1. Would like to understand what this warning means
  2. Would like to understand different ways of suppressing this warning
  3. Would like to understand how to improve their code and follow good practices to avoid this warning in the future.

Setup

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (3, 5)), columns=list('ABCDE'))
df
   A  B  C  D  E
0  5  0  3  3  7
1  9  3  5  2  4
2  7  6  8  8  1


What is the SettingWithCopyWarning?

To know how to deal with this warning, it is important to understand what it means and why it is raised in the first place.

When filtering DataFrames, it is possible to slice/index a frame and get back either a view or a copy, depending on the internal layout and various implementation details. A "view" is, as the term suggests, a view into the original data, so modifying the view may modify the original object. On the other hand, a "copy" is a replication of data from the original, and modifying the copy has no effect on the original.

As mentioned by other answers, the SettingWithCopyWarning was created to flag "chained assignment" operations. Consider df in the setup above. Suppose you would like to select all values in column "B" where the values in column "A" are > 5. Pandas allows you to do this in different ways, some more correct than others. For example,

df[df.A > 5]['B']

1    3
2    6
Name: B, dtype: int64

And,

df.loc[df.A > 5, 'B']

1    3
2    6
Name: B, dtype: int64

These return the same result, so if you are only reading these values, it makes no difference. So, what is the issue? The problem with chained assignment is that it is generally difficult to predict whether a view or a copy is returned, so this largely becomes an issue when you are attempting to assign values back. To build on the earlier example, consider how this code is executed by the interpreter:

df.loc[df.A > 5, 'B'] = 4
# becomes
df.loc.__setitem__((df.A > 5, 'B'), 4)

This is a single __setitem__ call on df.loc, which writes into df directly. On the other hand, consider this code:

df[df.A > 5]['B'] = 4
# becomes
df.__getitem__(df.A > 5).__setitem__('B', 4)

Now, depending on whether __getitem__ returned a view or a copy, the __setitem__ operation may not work.
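To see this concretely, the two calls can be split apart. A sketch using the same values as df in the setup, written out literally instead of relying on the random seed. Boolean indexing produces a new object, so the write lands in that temporary and df keeps its old values, whereas the single .loc call changes df itself. The middle step may print the SettingWithCopyWarning, depending on the pandas version.

```python
import pandas as pd

# Same values as the setup frame
df = pd.DataFrame([[5, 0, 3, 3, 7], [9, 3, 5, 2, 4], [7, 6, 8, 8, 1]], columns=list("ABCDE"))

# Chained assignment, split into its two steps
temp = df.__getitem__(df.A > 5)     # step 1: boolean indexing returns a new object
temp.__setitem__("B", 4)            # step 2: writes into temp (may emit the warning)
b_after_chained = df["B"].tolist()  # df is unchanged: [0, 3, 6]

# Single call: df.loc.__setitem__ writes into df itself
df.loc[df.A > 5, "B"] = 4
print(df["B"].tolist())  # [0, 4, 4]
```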

In general, you should use loc for label-based assignment, and iloc for integer/positional based assignment, as the spec guarantees that they always operate on the original. Additionally, for setting a single cell, you should use at and iat.

More can be found in the documentation.

Note
All boolean indexing operations done with loc can also be done with iloc. The only difference is that iloc expects either integers/positions for the index or a numpy array of boolean values, and integer/position indexes for the columns.

For example,

df.loc[df.A > 5, 'B'] = 4

Can be written as

df.iloc[(df.A > 5).values, 1] = 4

And,

df.loc[1, 'A'] = 100

Can be written as

df.iloc[1, 0] = 100

And so on.
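A quick sketch checking that the two forms agree, on a small frame holding the setup's "A" and "B" columns. The .values call matters: .iloc generally refuses a boolean Series as a mask, so it is converted to a plain NumPy array first.

```python
import pandas as pd

base = pd.DataFrame({"A": [5, 9, 7], "B": [0, 3, 6]})

by_label = base.copy()
by_label.loc[by_label.A > 5, "B"] = 4  # label-based: column "B"

by_position = base.copy()
by_position.iloc[(by_position.A > 5).values, 1] = 4  # positional: column 1 is "B"

print(by_label.equals(by_position))  # True
```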



Just tell me how to suppress the warning!

Consider a simple operation on the "A" column of df. Selecting "A" and dividing by 2 will raise the warning, but the operation will work.

df2 = df[['A']]
df2['A'] /= 2
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

df2
     A
0  2.5
1  4.5
2  3.5

There are a couple ways of directly silencing this warning:

  1. Make a deepcopy

    df2 = df[['A']].copy(deep=True)
    df2['A'] /= 2
    
  2. Change pd.options.mode.chained_assignment
    Can be set to None, "warn", or "raise". "warn" is the default. None will suppress the warning entirely, and "raise" will throw a SettingWithCopyError, preventing the operation from going through.

    pd.options.mode.chained_assignment = None
    df2['A'] /= 2
    

@Peter Cotton, in the comments, came up with a nice way of non-intrusively changing the mode (modified from this gist) using a context manager, which sets the mode only as long as it is required and then resets it to the original state when finished.

import pandas as pd

class ChainedAssignent:
    """Temporarily set pd.options.mode.chained_assignment inside a with-block."""
    def __init__(self, chained=None):
        acceptable = [None, 'warn', 'raise']
        assert chained in acceptable, "chained must be in " + str(acceptable)
        self.swcw = chained

    def __enter__(self):
        # remember the current mode, then switch to the requested one
        self.saved_swcw = pd.options.mode.chained_assignment
        pd.options.mode.chained_assignment = self.swcw
        return self

    def __exit__(self, *args):
        # restore the mode that was active before the with-block
        pd.options.mode.chained_assignment = self.saved_swcw

The usage is as follows:

# some code here
with ChainedAssignent():
    df2['A'] /= 2
# more code follows

Or, to raise the exception

with ChainedAssignent(chained='raise'):
    df2['A'] /= 2

SettingWithCopyError: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead


The "XY Problem": What am I doing wrong?

A lot of the time, users look for ways of suppressing this warning without fully understanding why it was raised in the first place. This is a good example of an XY problem, where users attempt to solve a problem "Y" that is actually a symptom of a deeper-rooted problem "X". Below are some common problems that lead to this warning, each followed by a solution.

Question 1
I have a DataFrame

df
   A  B  C  D  E
0  5  0  3  3  7
1  9  3  5  2  4
2  7  6  8  8  1

I want to set the values in column "A" that are > 5 to 1000. My expected output is

      A  B  C  D  E
0     5  0  3  3  7
1  1000  3  5  2  4
2  1000  6  8  8  1

Wrong way to do this:

df.A[df.A > 5] = 1000         # works, because df.A returns a view
df[df.A > 5]['A'] = 1000      # does not work
df.loc[df.A > 5]['A'] = 1000  # does not work

Right way using loc:

df.loc[df.A > 5, 'A'] = 1000
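A runnable sketch of the loc fix on the frame from the question, plus, for comparison, a non-mutating alternative that is not part of the original answer: Series.mask combined with DataFrame.assign returns a new frame instead of modifying one.

```python
import pandas as pd

base = pd.DataFrame({"A": [5, 9, 7], "B": [0, 3, 6], "C": [3, 5, 8],
                     "D": [3, 2, 8], "E": [7, 4, 1]})

# In-place form recommended above: one .loc assignment on the frame itself
df = base.copy()
df.loc[df.A > 5, "A"] = 1000

# Non-mutating alternative: Series.mask replaces values where the condition is True
alt = base.assign(A=base["A"].mask(base["A"] > 5, 1000))

print(df["A"].tolist())   # [5, 1000, 1000]
print(alt["A"].tolist())  # [5, 1000, 1000]
```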



Question 2 [1]
I am trying to set the value in cell (1, 'D') to 12345. My expected output is

   A  B  C      D  E
0  5  0  3      3  7
1  9  3  5  12345  4
2  7  6  8      8  1

I have tried different ways of accessing this cell, such as df['D'][1]. What is the best way to do this?

[1] This question isn't specifically related to the warning, but it is good to understand how to do this particular operation correctly, so as to avoid situations where the warning could arise in the future.

You can use any of the following methods to do this.

df.loc[1, 'D'] = 12345
df.iloc[1, 3] = 12345
df.at[1, 'D'] = 12345
df.iat[1, 3] = 12345
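A sketch confirming the four forms give the same result, each applied to a fresh copy of the question's frame. loc and at take labels, while iloc and iat take positions (column "D" is position 3), and at/iat address only a single cell. By contrast, df['D'][1] = 12345 is a __getitem__ followed by a __setitem__, i.e. chained assignment.

```python
import pandas as pd

base = pd.DataFrame({"A": [5, 9, 7], "B": [0, 3, 6], "C": [3, 5, 8],
                     "D": [3, 2, 8], "E": [7, 4, 1]})

by_loc = base.copy()
by_loc.loc[1, "D"] = 12345   # row label 1, column label "D"

by_iloc = base.copy()
by_iloc.iloc[1, 3] = 12345   # row position 1, column position 3

by_at = base.copy()
by_at.at[1, "D"] = 12345     # single-cell, label-based

by_iat = base.copy()
by_iat.iat[1, 3] = 12345     # single-cell, position-based

print(by_loc["D"].tolist())  # [3, 12345, 8]
print(all(f.equals(by_loc) for f in (by_iloc, by_at, by_iat)))  # True
```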



Question 3
I am trying to subset values based on some condition. I have a DataFrame

   A  B  C  D  E
1  9  3  5  2  4
2  7  6  8  8  1

I would like to set the values in "D" to 123 where "C" == 5. I tried

df2.loc[df2.C == 5, 'D'] = 123

Which seems fine, but I am still getting the SettingWithCopyWarning! How do I fix this?

This is probably because of code higher up in your pipeline. Did you create df2 from something larger, like

df2 = df[df.A > 5]

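? In this case, boolean indexing returns a new object, but pandas records that df2 was derived from df, so a later assignment to df2 triggers the warning: pandas cannot tell whether you also meant to modify df. What you'd need to do is explicitly make df2 a copy: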

df2 = df[df.A > 5].copy()
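# df.loc[df.A > 5, :] is sometimes suggested instead, but depending on the
# pandas version the result may still be flagged, so .copy() is the explicit option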
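Putting it together, a sketch using the setup values (written out literally): with an explicit copy, the .loc assignment on df2 is unambiguous and df is not modified.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 9, 7], "B": [0, 3, 6], "C": [3, 5, 8],
                   "D": [3, 2, 8], "E": [7, 4, 1]})

df2 = df[df.A > 5].copy()       # independent subset (rows 1 and 2)
df2.loc[df2.C == 5, "D"] = 123  # changes df2 only

print(df2)
print(df["D"].tolist())  # [3, 2, 8]: the original is untouched
```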



Question 4
I'm trying to drop column "C" in-place from

   A  B  C  D  E
1  9  3  5  2  4
2  7  6  8  8  1

But using

df2.drop('C', axis=1, inplace=True)

Throws SettingWithCopyWarning. Why is this happening?

This is because df2 must have been created from some other slicing operation, such as

df2 = df[df.A > 5]

The solution here is to either make an explicit copy() of the slice, or use loc, as before.
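Another option, sketched here and not part of the original answer, is to skip inplace=True and let drop return a new frame, which sidesteps the question of whether df2 is a copy:

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 9, 7], "B": [0, 3, 6], "C": [3, 5, 8],
                   "D": [3, 2, 8], "E": [7, 4, 1]})

# drop without inplace=True returns a new DataFrame
df2 = df[df.A > 5].drop(columns="C")

print(df2.columns.tolist())  # ['A', 'B', 'D', 'E']
print("C" in df.columns)     # True: df still has column C
```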

Answer by musbur

Followup beginner question / remark

Maybe a clarification for other beginners like me (I come from R, which seems to work a bit differently under the hood). The following harmless-looking and functional code kept producing the SettingWithCopyWarning, and I couldn't figure out why. I had read and understood the issue with "chained indexing", but my code doesn't contain any:

def plot(pdb, df, title, **kw):
    df['target'] = (df['ogg'] + df['ugg']) / 2
    # ...

But then, later, much too late, I looked at where the plot() function is called:

    df = data[data['anz_emw'] > 0]
    pixbuf = plot(pdb, df, title)

So "df" isn't a data frame but an object that somehow remembers that it was created by indexing a data frame (so is that a view?) which would make the line in plot()

 df['target'] = ...

equivalent to

 data[data['anz_emw'] > 0]['target'] = ...

which is chained indexing. Did I get that right?

Anyway,

def plot(pdb, df, title, **kw):
    df.loc[:,'target'] = (df['ogg'] + df['ugg']) / 2

fixed it.
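An alternative that avoids mutating the caller's frame at all (a sketch, not the answerer's fix): have the function return a new frame via assign. `add_target` is a hypothetical helper standing in for the column computation inside plot(), and the data values are made up.

```python
import pandas as pd

def add_target(df):
    """Hypothetical helper: return a new frame with a 'target' column added."""
    return df.assign(target=(df["ogg"] + df["ugg"]) / 2)

data = pd.DataFrame({"anz_emw": [0, 1, 2], "ogg": [2.0, 4.0, 6.0], "ugg": [0.0, 2.0, 4.0]})
subset = data[data["anz_emw"] > 0]

out = add_target(subset)  # the caller's frames are not modified
print(out["target"].tolist())  # [3.0, 5.0]
```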
