python Pandas DataFrame copy(deep=False) vs copy(deep=True) vs '='

Disclaimer: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/46327494/

Time: 2020-09-14 04:29:40  Source: igfitidea

Tags: python, pandas, dataframe, deep-copy

Asked by dkrynicki

Could somebody explain the difference between

df2 = df1

df2 = df1.copy()

df3 = df1.copy(deep=False)

I tried all three options as follows:

df1 = pd.DataFrame([1,2,3,4,5])
df2 = df1
df3 = df1.copy()
df4 = df1.copy(deep=False)
df1 = pd.DataFrame([9,9,9])

and got the following results:

df1: [9,9,9]
df2: [1,2,3,4,5]
df3: [1,2,3,4,5]
df4: [1,2,3,4,5]

So I observe no difference in the output between .copy() and .copy(deep=False). Why?

I would expect one of the options '=', copy(), copy(deep=False) to return [9,9,9]

What am I missing please?

Accepted answer by Karthik V

If you look at the object IDs of the various DataFrames you create, you can clearly see what is happening.

When you write df2 = df1, you create a variable named df2 and bind it to the object with id 4541269200, the same object that df1 refers to. When you write df1 = pd.DataFrame([9,9,9]), you create a new object with id 4541271120 and bind it to the variable df1, but the object with id 4541269200, which was previously bound to df1, continues to live because df2 still refers to it. If no variables were bound to that object, Python would garbage collect it.

In[33]: import pandas as pd
In[34]: df1 = pd.DataFrame([1,2,3,4,5])
In[35]: id(df1)
Out[35]: 4541269200

In[36]: df2 = df1
In[37]: id(df2)
Out[37]: 4541269200  # Same id as df1

In[38]: df3 = df1.copy()
In[39]: id(df3)
Out[39]: 4541269584  # New object, new id.

In[40]: df4 = df1.copy(deep=False)
In[41]: id(df4)
Out[41]: 4541269072  # New object, new id.

In[42]: df1 = pd.DataFrame([9, 9, 9])
In[43]: id(df1)
Out[43]: 4541271120  # New object created and bound to name 'df1'.

In[44]: id(df2)
Out[44]: 4541269200  # Old object's id not impacted.
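
The session above can be condensed into a short script that checks identity with `is` instead of comparing raw id values. This is only a sketch of the same idea; the name `original` is a hypothetical extra reference added here to keep track of the first object and is not part of the question.

```python
import pandas as pd

df1 = pd.DataFrame([1, 2, 3, 4, 5])
df2 = df1                    # a second name for the same object
df3 = df1.copy()             # deep copy: a new object with its own data
df4 = df1.copy(deep=False)   # shallow copy: also a new object

print(df2 is df1)  # True
print(df3 is df1)  # False
print(df4 is df1)  # False

original = df1                 # hypothetical extra name for the first object
df1 = pd.DataFrame([9, 9, 9])  # rebinds the name df1; nothing is mutated

print(df2 is original)   # True
print(df2[0].tolist())   # [1, 2, 3, 4, 5]
print(df1[0].tolist())   # [9, 9, 9]
```

Rebinding a name never changes the object the name used to point to, which is why none of df2, df3 or df4 can show [9,9,9] in the question.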

Edit: Added on 7/30/2018

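Deep copying does not fully work in pandas: copy(deep=True) copies the data, but it does not recursively copy Python objects stored inside the DataFrame, and the devs consider putting mutable objects inside a DataFrame an antipattern. Consider the following: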

In[10]: arr1 = [1, 2, 3]
In[11]: arr2 = [1, 2, 3, 4]
In[12]: df1 = pd.DataFrame([[arr1], [arr2]], columns=['A'])
In[13]: df1.applymap(id)
Out[13]: 
            A
0  4515714832
1  4515734952

In[14]: df2 = df1.copy(deep=True)
In[15]: df2.applymap(id)
Out[15]: 
            A
0  4515714832
1  4515734952

In[16]: df2.loc[0, 'A'].append(55)
In[17]: df2
Out[17]: 
               A
0  [1, 2, 3, 55]
1   [1, 2, 3, 4]
In[18]: df1
Out[18]: 
               A
0  [1, 2, 3, 55]
1   [1, 2, 3, 4]

If df2 were a true deep copy, the lists contained within it would have new ids. Because they do not, modifying a list inside df2 also affects the list inside df1: they are the same objects.
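
If independent copies of the nested lists are actually needed, one possible workaround (not something the answer above proposes) is to deep-copy each element with the standard library's `copy.deepcopy` after copying the frame. A minimal sketch, assuming a single object column named A as in the example; the name df3 is introduced here purely for illustration:

```python
import copy

import pandas as pd

df1 = pd.DataFrame({"A": [[1, 2, 3], [1, 2, 3, 4]]})

# copy(deep=True) copies the column, but the cells still reference the same lists.
df2 = df1.copy(deep=True)
print(df2.loc[0, "A"] is df1.loc[0, "A"])  # True

# Workaround sketch: deep-copy every cell of the object column explicitly.
df3 = df1.copy(deep=True)
df3["A"] = df3["A"].map(copy.deepcopy)
print(df3.loc[0, "A"] is df1.loc[0, "A"])  # False

df3.loc[0, "A"].append(55)
print(df1.loc[0, "A"])  # [1, 2, 3]
print(df3.loc[0, "A"])  # [1, 2, 3, 55]
```

This copies element by element, so it can be slow on large frames; avoiding mutable objects inside a DataFrame, as the answer notes, sidesteps the problem entirely.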

Answer by Aman Agrawal

A deep copy creates a new object with its own copy of the data, while a shallow copy creates a new object that still refers to the original's data. Either way, the copy is a new object with a new id, distinct from the original's.

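The reason none of df2, df3 and df4 displays [9,9,9] is that df1 = pd.DataFrame([9,9,9]) binds the name df1 to a new object and leaves the existing objects untouched: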

In[33]: import pandas as pd
In[34]: df1 = pd.DataFrame([1,2,3,4,5])
In[35]: id(df1)
Out[35]: 4541269200

In[36]: df2 = df1
In[37]: id(df2)
Out[37]: 4541269200  # Same id as df1

In[38]: df3 = df1.copy()
In[39]: id(df3)
Out[39]: 4541269584  # New object, new id.

In[40]: df4 = df1.copy(deep=False)
In[41]: id(df4)
Out[41]: 4541269072  # New object, new id.

In[42]: df1 = pd.DataFrame([9, 9, 9])
In[43]: id(df1)
Out[43]: 4541271120  # New object created and bound to name 'df1'.

Answer by flysoon

To see the difference, you need to modify the DataFrame's elements in place rather than reassign df1. Try the following:

import pandas as pd

df1 = pd.DataFrame([1,2,3,4,5])
df2 = df1
df3 = df1.copy()
df4 = df1.copy(deep=False)

df1.iloc[0,0] = 6
df2.iloc[1,0] = 7
df4.iloc[2,0] = 8

print(df1)
print(df2)
print(df3)
print(df4)
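
What the experiment prints depends on the pandas version, so the following is a hedged reading rather than a definitive result. The parts that should hold everywhere are that df2 is the same object as df1 (so both see the writes 6 and 7) and that the deep copy df3 is never affected. Whether writes cross between df1 and the shallow copy df4 is an assumption about the environment: older pandas shares the data in both directions, while pandas with Copy-on-Write enabled keeps them separate after the first write.

```python
import pandas as pd

df1 = pd.DataFrame([1, 2, 3, 4, 5])
df2 = df1                    # same object as df1
df3 = df1.copy()             # deep copy
df4 = df1.copy(deep=False)   # shallow copy

df1.iloc[0, 0] = 6
df2.iloc[1, 0] = 7
df4.iloc[2, 0] = 8

print(df2 is df1)        # True: a write through either name hits the same object
print(df1[0].tolist())   # begins with 6, 7; the third value depends on the pandas version
print(df3[0].tolist())   # [1, 2, 3, 4, 5]: the deep copy sees none of the writes
print(df4[0].tolist())   # third value is 8; the first two depend on the pandas version
```

Comparing the printed lists for df1 and df4 on your own installation shows directly whether the shallow copy shares data with the original there.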