Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/38835483/

Time: 2020-09-14 01:45:22  Source: igfitidea

Confusion re: pandas copy of slice of dataframe warning

python pandas chained-assignment

Asked by Sam Lilienfeld

I've looked through a bunch of questions and answers related to this issue, but I'm still finding that I'm getting this copy of slice warning in places where I don't expect it. Also, it's cropping up in code that was running fine for me previously, leading me to wonder if some sort of update may be the culprit.

For example, this is a set of code where all I'm doing is reading an Excel file into a pandas DataFrame, and cutting down the set of columns included with the df[[]] syntax.

 izmir = pd.read_excel(filepath)
 izmir_lim = izmir[['Gender','Age','MC_OLD_M>=60','MC_OLD_F>=60','MC_OLD_M>18','MC_OLD_F>18','MC_OLD_18>M>5','MC_OLD_18>F>5',
               'MC_OLD_M_Child<5','MC_OLD_F_Child<5','MC_OLD_M>0<=1','MC_OLD_F>0<=1','Date to Delivery','Date to insert','Date of Entery']]

Now, any further changes I make to this izmir_lim DataFrame raise the copy of slice warning.

izmir_lim['Age'] = izmir_lim.Age.fillna(0)
izmir_lim['Age'] = izmir_lim.Age.astype(int)

/Users/samlilienfeld/anaconda/lib/python3.5/site-packages/ipykernel/main.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

I'm confused because I thought the df[[]] column subsetting returned a copy by default. The only way I've found to suppress the warnings is by explicitly adding df[[]].copy(). I could have sworn that in the past I did not have to do that and did not get the copy of slice warning.
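A minimal sketch of the situation described above, assuming a small hypothetical stand-in for the Excel data (the column values and the helper `warns_on_assignment` are illustrative, not from the original post). Whether the plain subset actually warns depends on the installed pandas version, so the script prints the result rather than assuming it:

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the frame read from Excel.
izmir = pd.DataFrame(
    {"Gender": ["F", "M", "F"], "Age": [34.0, None, 51.0], "Other": [1, 2, 3]}
)


def warns_on_assignment(frame):
    """Return True if assigning to a column of `frame` emits SettingWithCopyWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        frame["Age"] = frame["Age"].fillna(0)
    # Compare by class name so the check does not depend on where
    # (or whether) the installed pandas exposes the warning class.
    return any(w.category.__name__ == "SettingWithCopyWarning" for w in caught)


subset = izmir[["Gender", "Age"]]                # plain column subset
explicit_copy = izmir[["Gender", "Age"]].copy()  # subset followed by .copy()

print("plain subset warns: ", warns_on_assignment(subset))
print("explicit copy warns:", warns_on_assignment(explicit_copy))
```

In both cases the original `izmir` frame keeps its missing value; the warning is about ambiguity, not about the assignment failing.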

Similarly, I have some other code that runs a function on a dataframe to filter it in certain ways:

def lim(df):
    # geography, start_date and end_date are defined elsewhere in the script
    if geography == "All":
        df_geo = df
    else:
        df_geo = df[df.center_JO == geography]

    df_date = df_geo[(df_geo.date_survey >= start_date) & (df_geo.date_survey <= end_date)]

    return df_date

df_lim = lim(df)

From this point forward, any changes I make to any of the values of df_lim raise the copy of slice warning. The only way around it that I've found is to change the function call to:

df_lim = lim(df).copy()

This just seems wrong to me. What am I missing? It seems like these use cases should return copies by default, and I could have sworn that the last time I ran these scripts I was not running into these warnings.
Do I just need to start adding .copy() all over the place? Seems like there should be a cleaner way to do this. Any insight or help is much appreciated.

Answered by piRSquared

 izmir = pd.read_excel(filepath)
 izmir_lim = izmir[['Gender','Age','MC_OLD_M>=60','MC_OLD_F>=60',
                    'MC_OLD_M>18','MC_OLD_F>18','MC_OLD_18>M>5',
                    'MC_OLD_18>F>5','MC_OLD_M_Child<5','MC_OLD_F_Child<5',
                    'MC_OLD_M>0<=1','MC_OLD_F>0<=1','Date to Delivery',
                    'Date to insert','Date of Entery']]

izmir_lim is a view/copy of izmir. You subsequently attempt to assign to it. This is what triggers the warning. Use this instead:

 izmir_lim = izmir[['Gender','Age','MC_OLD_M>=60','MC_OLD_F>=60',
                    'MC_OLD_M>18','MC_OLD_F>18','MC_OLD_18>M>5',
                    'MC_OLD_18>F>5','MC_OLD_M_Child<5','MC_OLD_F_Child<5',
                    'MC_OLD_M>0<=1','MC_OLD_F>0<=1','Date to Delivery',
                    'Date to insert','Date of Entery']].copy()

Whenever you 'create' a new dataframe from another in the following fashion:

new_df = old_df[list_of_columns_names]

new_df will have a truthy value in its is_copy attribute. When you attempt to assign to it, pandas issues the SettingWithCopyWarning.

new_df.iloc[0, 0] = 1  # Should trigger the SettingWithCopyWarning

You can overcome this in several ways.

Option #1

new_df = old_df[list_of_columns_names].copy()
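Option #1 can also be applied inside the filtering function from the question, so callers do not have to remember the `.copy()`. This is only a sketch under assumptions: the filter settings are passed as parameters instead of being read from globals, and the sample frame `surveys` with its center names and dates is hypothetical.

```python
import pandas as pd


def lim(df, geography, start_date, end_date):
    """Filter by center and survey date, returning an independent copy."""
    if geography == "All":
        df_geo = df
    else:
        df_geo = df[df["center_JO"] == geography]

    in_range = (df_geo["date_survey"] >= start_date) & (
        df_geo["date_survey"] <= end_date
    )
    # Copy once at the end so the caller owns the result outright.
    return df_geo[in_range].copy()


# Hypothetical sample data, only for illustration.
surveys = pd.DataFrame(
    {
        "center_JO": ["Amman", "Irbid", "Amman"],
        "date_survey": pd.to_datetime(["2016-01-10", "2016-02-15", "2016-03-20"]),
        "score": [1, 2, 3],
    }
)

df_lim = lim(surveys, "Amman", pd.Timestamp("2016-01-01"), pd.Timestamp("2016-02-28"))
df_lim["score"] = df_lim["score"] * 10  # modifies df_lim only, not surveys
```

Copying once at the function boundary keeps the intermediate filters cheap and makes it explicit that the returned frame is meant to be modified independently of the input.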

Option #2 (as @ayhan suggested in comments)

new_df = old_df[list_of_columns_names]
new_df.is_copy = None

Option #3

new_df = old_df.loc[:, list_of_columns_names]
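A further variant, not part of the original answer, is to avoid assigning into the subset at all and build the cleaned frame in one expression with `DataFrame.assign`, which returns a new DataFrame. The sketch below assumes a small hypothetical stand-in for the Excel data and reuses the `Age` cleanup from the question:

```python
import pandas as pd

# Hypothetical stand-in for the frame read from Excel.
izmir = pd.DataFrame(
    {"Gender": ["F", "M", "F"], "Age": [34.0, None, 51.0], "Other": [1, 2, 3]}
)

# Select the columns and derive the cleaned Age column in one step;
# assign() returns a new frame instead of writing into the subset.
izmir_lim = izmir[["Gender", "Age"]].assign(
    Age=lambda d: d["Age"].fillna(0).astype(int)
)

print(izmir_lim)
```

Because nothing is written into the intermediate subset, there is no assignment for pandas to flag, and `izmir` itself is left unchanged.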