python Pandas DataFrame copy(deep=False) vs copy(deep=True) vs '='
Original URL: http://stackoverflow.com/questions/46327494/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by dkrynicki
Could somebody explain the difference between
df2 = df1
df2 = df1.copy()
df3 = df1.copy(deep=False)
I tried all three options as follows:
df1 = pd.DataFrame([1,2,3,4,5])
df2 = df1
df3 = df1.copy()
df4 = df1.copy(deep=False)
df1 = pd.DataFrame([9,9,9])
and got the following output:
df1: [9,9,9]
df2: [1,2,3,4,5]
df3: [1,2,3,4,5]
df4: [1,2,3,4,5]
So I observe no difference in the output between .copy() and .copy(deep=False). Why?
I would expect one of the options '=', copy(), copy(deep=False) to return [9,9,9].
What am I missing?
Accepted answer by Karthik V
If you look at the object IDs of the various DataFrames you create, you can clearly see what is happening.
When you write df2 = df1, you are creating a variable named df2 and binding it to the object with id 4541269200. When you write df1 = pd.DataFrame([9,9,9]), you are creating a new object with id 4541271120 and binding it to the variable df1, but the object with id 4541269200, which was previously bound to df1, continues to live because df2 is still bound to it. If no variables were bound to that object, Python would garbage collect it.
In[33]: import pandas as pd
In[34]: df1 = pd.DataFrame([1,2,3,4,5])
In[35]: id(df1)
Out[35]: 4541269200
In[36]: df2 = df1
In[37]: id(df2)
Out[37]: 4541269200 # Same id as df1
In[38]: df3 = df1.copy()
In[39]: id(df3)
Out[39]: 4541269584 # New object, new id.
In[40]: df4 = df1.copy(deep=False)
In[41]: id(df4)
Out[41]: 4541269072 # New object, new id.
In[42]: df1 = pd.DataFrame([9, 9, 9])
In[43]: id(df1)
Out[43]: 4541271120 # New object created and bound to name 'df1'.
In[44]: id(df2)
Out[44]: 4541269200 # Old object's id not impacted.
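The same point can be checked without reading raw ids, using the `is` operator. This is a minimal sketch that reuses the variable names from the question; the helper names `before` and `after` are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([1, 2, 3, 4, 5])
df2 = df1                      # no copy: two names, one object
df3 = df1.copy()               # new object
df4 = df1.copy(deep=False)     # new object as well

# Identity checks before df1 is rebound.
before = (df2 is df1, df3 is df1, df4 is df1)

df1 = pd.DataFrame([9, 9, 9])  # rebinds the name df1 only
after = df2 is df1

print(before)            # (True, False, False)
print(after)             # False
print(df2[0].tolist())   # [1, 2, 3, 4, 5]
```

Rebinding df1 never touches the old object, so none of df2, df3 or df4 can show [9,9,9], whichever copy option was used.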
Edit: Added on 7/30/2018
copy(deep=True) in pandas copies the data and the index, but it does not recursively copy Python objects stored inside the DataFrame, and the devs consider putting mutable objects inside a DataFrame an antipattern. Consider the following:
In[10]: arr1 = [1, 2, 3]
In[11]: arr2 = [1, 2, 3, 4]
In[12]: df1 = pd.DataFrame([[arr1], [arr2]], columns=['A'])
In[13]: df1.applymap(id)
Out[13]:
            A
0  4515714832
1  4515734952
In[14]: df2 = df1.copy(deep=True)
In[15]: df2.applymap(id)
Out[15]:
            A
0  4515714832
1  4515734952
In[16]: df2.loc[0, 'A'].append(55)
In[17]: df2
Out[17]:
               A
0  [1, 2, 3, 55]
1   [1, 2, 3, 4]
In[18]: df1
Out[18]:
               A
0  [1, 2, 3, 55]
1   [1, 2, 3, 4]
If df2 were a true deep copy, the lists contained within it would have new ids. Instead, when you modify a list inside df2, it affects the list inside df1 as well, because they are the same objects.
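If independent inner lists are really needed, one possible workaround is to copy each element explicitly. The sketch below assumes a single object column 'A' holding lists, and uses the standard library's `copy.deepcopy` on each element via `Series.map`. The names `df3` and `same_inner` are illustrative and not part of the original answer.

```python
import copy
import pandas as pd

df1 = pd.DataFrame({"A": [[1, 2, 3], [1, 2, 3, 4]]})

# deep=True copies the column of references, not the lists themselves.
df2 = df1.copy(deep=True)
same_inner = df2.loc[0, "A"] is df1.loc[0, "A"]
print(same_inner)  # True

# Copy every stored list explicitly to get independent elements.
df3 = df1.copy()
df3["A"] = df1["A"].map(copy.deepcopy)

df3.loc[0, "A"].append(55)
print(df1.loc[0, "A"])  # [1, 2, 3]
print(df3.loc[0, "A"])  # [1, 2, 3, 55]
```

This touches every element, so it is slow on large frames. Keeping mutable objects out of the DataFrame avoids the problem altogether.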
Answer by Aman Agrawal
A deep copy creates a new object with its own copy of the data, while a shallow copy (deep=False) only creates a new outer object, with a new id, that still refers to the data of the parent.
The reason none of df2, df3 and df4 displays [9,9,9] is:
In[33]: import pandas as pd
In[34]: df1 = pd.DataFrame([1,2,3,4,5])
In[35]: id(df1)
Out[35]: 4541269200
In[36]: df2 = df1
In[37]: id(df2)
Out[37]: 4541269200 # Same id as df1
In[38]: df3 = df1.copy()
In[39]: id(df3)
Out[39]: 4541269584 # New object, new id.
In[40]: df4 = df1.copy(deep=False)
In[41]: id(df4)
Out[41]: 4541269072 # New object, new id.
In[42]: df1 = pd.DataFrame([9, 9, 9])
In[43]: id(df1)
Out[43]: 4541271120 # New object created and bound to name 'df1'.
Answer by flysoon
To see a difference, you need to modify the DataFrame's elements in place instead of rebinding df1. Try the following:
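import pandas as pd

df1 = pd.DataFrame([1,2,3,4,5])
df2 = df1                    # second name for the same object
df3 = df1.copy()             # deep copy (the default)
df4 = df1.copy(deep=False)   # shallow copy
df1.iloc[0,0] = 6            # in-place write through df1
df2.iloc[1,0] = 7            # in-place write through df2, the same object as df1
df4.iloc[2,0] = 8            # in-place write through the shallow copy
print(df1)
print(df2)
print(df3)
print(df4)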
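Whether the write through df4 above shows up in df1 depends on the pandas version. With Copy-on-Write enabled (to my knowledge the default from pandas 3.0), a shallow copy stops sharing data as soon as either side is written to. A check that should not depend on that setting is to look at memory sharing before any write, using `np.shares_memory`. This is a sketch assuming a single integer column; the names `shares_shallow` and `shares_deep` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([1, 2, 3, 4, 5])
df2 = df1                    # same object
df3 = df1.copy()             # deep copy: own data buffer
df4 = df1.copy(deep=False)   # shallow copy: new object, shared buffer

# Compare the underlying NumPy buffers of column 0, before any write.
shares_shallow = np.shares_memory(df1[0].to_numpy(), df4[0].to_numpy())
shares_deep = np.shares_memory(df1[0].to_numpy(), df3[0].to_numpy())

print(df2 is df1)       # True
print(shares_shallow)   # True
print(shares_deep)      # False
```

So '=' gives the same object, copy(deep=False) gives a new object over the same data, and copy() gives a new object with its own data.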