Original question: http://stackoverflow.com/questions/44993846/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow, not to this page.
Pandas DataFrame mutability
Asked by user7786493
I am pretty new to pandas DataFrames, and I would highly appreciate it if someone could briefly explain the mutability of a DataFrame using the following example:
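import numpy as np
import pandas as pd

d1 = pd.date_range('1/1/2016', periods=10, freq='W')  # ten weekly dates
col1 = ['open', 'high', 'low', 'close']
list1 = np.random.rand(10, 4)  # 10x4 array of random floats
df1 = pd.DataFrame(list1, d1, col1)  # positional arguments: data, index, columns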
To my understanding, currently df1 is a reference to a df object.
If I pass df1 or a slice of df1 (e.g. df1.iloc[2:3,1:2]) as input to a new DataFrame (e.g. df2=pd.DataFrame(df1)), is df2 a new DataFrame instance, or does it still refer to df1, so that df1 is exposed to changes made through df2?
Also any other point that I should pay attention to regarding mutability of DataFrame will be very much appreciated.
Answered by John Zwinck
This:
df2 = pd.DataFrame(df1)
Constructs a new DataFrame. There is a copy parameter whose default argument is False (in recent pandas versions the default is None, which behaves like False for DataFrame input). According to the documentation, it means:
> Copy data from inputs. Only affects DataFrame / 2d ndarray input
So data will be shared between df2 and df1 by default. If you want no sharing, but rather a complete copy, do this:
df2 = pd.DataFrame(df1, copy=True)
Or more concisely and idiomatically:
df2 = df1.copy()
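As a minimal sketch of what "independent copy" means here, the following rebuilds the frame from the question, makes a copy both ways, overwrites values in the copies, and checks that df1 is untouched. The names snapshot, df3 and unchanged are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2016', periods=10, freq='W')
df1 = pd.DataFrame(np.random.rand(10, 4), index=dates,
                   columns=['open', 'high', 'low', 'close'])
snapshot = df1.to_numpy().copy()    # plain NumPy snapshot of the original values

df2 = pd.DataFrame(df1, copy=True)  # copy via the constructor
df3 = df1.copy()                    # copy via the method

# Overwrite values in both copies.
df2.iloc[0, 0] = -1.0
df3['close'] = 0.0

# df1 should still hold exactly the values captured in the snapshot.
unchanged = np.array_equal(df1.to_numpy(), snapshot)
print('df1 unchanged:', unchanged)
```

Both spellings give a frame whose data can be modified without affecting df1; .copy() is simply the shorter form.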
If you do this:
df2 = df1.iloc[2:3,1:2].copy()
You will again get an independent copy. But if you do this:
df2 = pd.DataFrame(df1.iloc[2:3,1:2])
It will probably share the data, but this style is pretty unclear if you intend to modify df, so I suggest not writing such code. Instead, if you want no copy, just say this:
df2 = df1.iloc[2:3,1:2]
In summary: if you want a reference to existing data, do not call pd.DataFrame() or any other method at all. If you want an independent copy, call .copy().
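One way to check the sharing for yourself, rather than rely on "probably", is to ask NumPy whether the underlying buffers overlap. This is a rough sketch: it assumes a single-dtype frame, for which to_numpy() normally exposes the underlying buffer without copying. The candidates and sharing dictionaries are illustrative names. What the first two entries report can depend on the pandas version and its internals; only the .copy() result is something to count on.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(10, 4),
                   columns=['open', 'high', 'low', 'close'])

# Three ways of deriving a second frame from df1.
candidates = {
    'pd.DataFrame(df1)': pd.DataFrame(df1),
    'df1.iloc[2:3, 1:2]': df1.iloc[2:3, 1:2],
    'df1.copy()': df1.copy(),
}

base = df1.to_numpy()
# True means the derived frame's values overlap df1's buffer in memory.
sharing = {label: bool(np.shares_memory(base, frame.to_numpy()))
           for label, frame in candidates.items()}

for label, shared in sharing.items():
    print(f'{label:20} shares memory with df1: {shared}')
```

Note that sharing memory and propagating writes are not the same thing: with copy-on-write enabled, pandas may share a buffer until one of the frames is modified.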
Answered by Saikat Kumar Dey
> It will probably share the data, but this style is pretty unclear if you intend to modify df, so I suggest not writing such code. Instead, if you want no copy, just say this:
> df2 = df1.iloc[2:3,1:2]
> In summary: if you want a reference to existing data, do not call pd.DataFrame() or any other method at all. If you want an independent copy, call .copy()
I do not agree. Doing the above would still return a reference to the sliced section of the original DataFrame. So, if you make any changes to df2, it will reflect in df1.
Rather, .copy() should be used:
df2 = df1.iloc[2:3,1:2].copy()
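A small sketch of the .copy() variant, using a made-up three-row frame instead of the random data from the question. Whether a write through an uncopied slice reaches df1 depends on the pandas version and its copy-on-write setting, so only the copied case is shown.

```python
import pandas as pd

# Hypothetical small frame standing in for df1.
df1 = pd.DataFrame({'open': [1.0, 2.0, 3.0], 'high': [4.0, 5.0, 6.0]})

df2 = df1.iloc[1:3, 0:1].copy()  # independent copy of rows 1-2, column 'open'
df2.iloc[0, 0] = 99.0            # write to the copy only

print(df1.loc[1, 'open'])        # the original still holds 2.0
print(df2.iloc[0, 0])            # the copy holds 99.0
```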
Answered by Markus Dutschke
Great question, thanks. I ended up playing around a bit after reading the other answers, so I want to share this with you.
Here is some code for playing around:
import pandas as pd
import numpy as np
df=pd.DataFrame([[1,2,3],[4,5,6]])
print('start',df,sep='\n',end='\n\n')
def testAddCol(df):
    df=pd.DataFrame(df, copy=True) #experiment in this line: df=df.copy(), df=df.iloc[:2,:2], df.iloc[:2,:2].copy(), nothing, ...
    df['newCol']=11
    df.iloc[0,0]=100
    return df
df2=testAddCol(df)
print('df',df,sep='\n',end='\n\n')
print('df2',df2,sep='\n',end='\n\n')
output:
start
   0  1  2
0  1  2  3
1  4  5  6

df
   0  1  2
0  1  2  3
1  4  5  6

df2
     0  1  2  newCol
0  100  2  3      11
1    4  5  6      11
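The experiment suggested in the code comment can also be run in one go. The sketch below is an assumption about how one might automate it, not part of the original answer: strategies, make_frame and results are hypothetical names. Each strategy prepares a working frame, the same two modifications are applied, and the caller's frame is compared against a snapshot.

```python
import pandas as pd

def make_frame():
    # Fresh frame for every run so the strategies cannot affect each other.
    return pd.DataFrame([[1, 2, 3], [4, 5, 6]])

# Ways of preparing the working frame inside the function.
strategies = {
    'pd.DataFrame(df, copy=True)': lambda df: pd.DataFrame(df, copy=True),
    'df.copy()': lambda df: df.copy(),
    'df.iloc[:2, :2].copy()': lambda df: df.iloc[:2, :2].copy(),
    'nothing': lambda df: df,
}

results = {}
for name, prepare in strategies.items():
    original = make_frame()
    snapshot = original.copy()
    working = prepare(original)
    working['newCol'] = 11       # same modifications as testAddCol
    working.iloc[0, 0] = 100
    # True means the caller's frame was left untouched.
    results[name] = bool(original.equals(snapshot))
    print(f'{name:30} original unchanged: {results[name]}')
```

The three copying strategies leave the original untouched, while "nothing" modifies the caller's frame because both names refer to the same object. An uncopied slice is left out on purpose, since its behaviour varies between pandas versions.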