Pandas warning when using map: A value is trying to be set on a copy of a slice from a DataFrame

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/33215630/

Time: 2020-08-19 13:01:17  Source: igfitidea

python pandas

Asked by Mike

I've got the following code and it works. This basically renames values in columns so that they can be later merged.

import pandas as pd

pop = pd.read_csv('population.csv')
pop_recent = pop[pop['Year'] == 2014]

mapping = {
        'Korea, Rep.': 'South Korea',
        'Taiwan, China': 'Taiwan'
}
f = lambda x: mapping.get(x, x)
# This assignment is the line that triggers the warning.
pop_recent['Country Name'] = pop_recent['Country Name'].map(f)

Warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  pop_recent['Country Name'] = pop_recent['Country Name'].map(f)

I did google this! But no examples seem to be using map, so I'm at a loss...

Accepted answer by Anand S Kumar

The issue is chained indexing. What you are actually trying to do is set values on pop[pop['Year'] == 2014]['Country Name']. This does not work most of the time (as the linked documentation explains very well), because it is two separate indexing calls and one of them may return a copy of the dataframe (I believe the boolean indexing is the call that returns the copy).

Hence, when you set values on that copy, the change is not reflected in the original dataframe. Example -

In [6]: df
Out[6]:
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

In [7]: df[df['A']==1]['B'] = 10
/path/to/ipython-script.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':

In [8]: df
Out[8]:
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9
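
The session above can also be reproduced as a standalone script. This is only a minimal sketch: the toy frame is rebuilt by hand, the names `before` and `unchanged` are introduced here for illustration, and the warning is silenced so that the script output stays clean.

```python
import warnings

import pandas as pd

# Hypothetical toy frame matching the one shown in the session above.
df = pd.DataFrame({"A": [1, 3, 4, 6, 8], "B": [2, 4, 5, 7, 9]})
before = df.copy()

with warnings.catch_warnings():
    # Silence the chained-assignment warning for this demonstration only.
    warnings.simplefilter("ignore")
    # Two indexing calls: the boolean filter builds a new object,
    # and the assignment lands on that temporary object, not on df.
    df[df["A"] == 1]["B"] = 10

# The original frame still holds its old values.
unchanged = df.equals(before)
print(unchanged)
```

The comparison against `before` is just a convenient way to confirm that the chained assignment never reached `df`.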


As noted, instead of chained indexing you should use DataFrame.loc to index both the rows and the column to update in a single call, which avoids this problem. Example -

pop.loc[(pop['Year'] == 2014), 'Country Name'] = pop.loc[(pop['Year'] == 2014), 'Country Name'].map(f)

Or, if this seems too long, you can create the mask (a boolean Series) beforehand, assign it to a variable, and use that in the statement above. Example -

mask = pop['Year'] == 2014
pop.loc[mask, 'Country Name'] = pop.loc[mask, 'Country Name'].map(f)
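
For reference, here is a self-contained sketch of the mask approach applied to data shaped like the question's. The contents of population.csv are not shown in the question, so the small frame below is a hypothetical stand-in, and the helper name `rename` is introduced here in place of the lambda `f`.

```python
import pandas as pd

# Hypothetical stand-in for population.csv; the real file's contents are assumed.
pop = pd.DataFrame({
    "Country Name": ["Korea, Rep.", "Taiwan, China", "Japan", "Korea, Rep."],
    "Year": [2014, 2014, 2014, 2013],
})

mapping = {"Korea, Rep.": "South Korea", "Taiwan, China": "Taiwan"}


def rename(name):
    # Fall back to the original name when no mapping exists.
    return mapping.get(name, name)


mask = pop["Year"] == 2014
# A single .loc call selects rows and column together,
# so the write reaches pop itself rather than a temporary copy.
pop.loc[mask, "Country Name"] = pop.loc[mask, "Country Name"].map(rename)

result = pop["Country Name"].tolist()
print(result)  # ['South Korea', 'Taiwan', 'Japan', 'Korea, Rep.']
```

Only the 2014 rows are renamed; the 2013 row keeps 'Korea, Rep.' because the mask excludes it.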

Demo -

In [9]: df
Out[9]:
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

In [10]: mapping = { 1:2 , 3:4}

In [11]: f= lambda x: mapping.get(x, x)

In [12]: df.loc[(df['B']==2),'A'] = df.loc[(df['B']==2),'A'].map(f)

In [13]: df
Out[13]:
   A  B
0  2  2
1  3  4
2  4  5
3  6  7
4  8  9

Demo with the mask method -

In [18]: df
Out[18]:
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

In [19]: mask = df['B']==2

In [20]: df.loc[mask,'A'] = df.loc[mask,'A'].map(f)

In [21]: df
Out[21]:
   A  B
0  2  2
1  3  4
2  4  5
3  6  7
4  8  9

Answer by Gregg

I recommend resetting the index in pop_recent = pop[pop['Year'] == 2014].

If you want to apply a function to a column of a dataframe, try the apply function of the DataFrame API. Simple demo:

import pandas

mapping = {
        'Korea, Rep.': 'South Korea',
        'Taiwan, China': 'Taiwan'
}
df = pandas.DataFrame({'Country': ['Korea, Rep.', 'Taiwan, China', 'Japan', 'USA'], 'date': [2014, 2014, 2015, 2014]})
# reset_index() returns a new dataframe with the old index kept as a column
df_recent = df[df['date'] == 2014].reset_index()
df_recent['Country'] = df_recent['Country'].apply(lambda x: mapping.get(x, x))

Output:

>>> df_recent
   index      Country  date
0      0  South Korea  2014
1      1       Taiwan  2014
2      3          USA  2014
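
A related sketch, not taken from either answer: if the goal is a separate filtered frame (as with pop_recent in the question), taking an explicit .copy() of the selection is a common way to make that intent clear to pandas. The names `recent_names` and `original_names` are introduced here only to inspect the result.

```python
import pandas as pd

mapping = {"Korea, Rep.": "South Korea", "Taiwan, China": "Taiwan"}
df = pd.DataFrame({
    "Country": ["Korea, Rep.", "Taiwan, China", "Japan", "USA"],
    "date": [2014, 2014, 2015, 2014],
})

# .copy() makes df_recent an independent frame rather than a slice of df.
df_recent = df[df["date"] == 2014].copy()
df_recent["Country"] = df_recent["Country"].map(lambda x: mapping.get(x, x))

recent_names = df_recent["Country"].tolist()
original_names = df["Country"].tolist()
print(recent_names)    # ['South Korea', 'Taiwan', 'USA']
print(original_names)  # ['Korea, Rep.', 'Taiwan, China', 'Japan', 'USA']
```

Unlike the .loc approach, this leaves the original frame untouched, so it fits only when the renamed values are needed in the filtered frame alone.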