Passing pandas DataFrame by reference
Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/39783570/
Asked by labrynth
My question is regarding the immutability of a pandas DataFrame when it is passed by reference. Consider the following code:
import sys

import pandas as pd

def foo(df1, df2):
    df1['B'] = 1
    df1 = df1.join(df2['C'], how='inner')
    return

def main(argv=None):
    # Create DataFrames.
    df1 = pd.DataFrame(range(0, 10, 2), columns=['A'])
    df2 = pd.DataFrame(range(1, 11, 2), columns=['C'])
    foo(df1, df2)  # Pass df1 and df2 by reference.
    print(df1)
    return 0

if __name__ == '__main__':
    status = main()
    sys.exit(status)
The output is
   A  B
0  0  1
1  2  1
2  4  1
3  6  1
4  8  1
and not
   A  B  C
0  0  1  1
1  2  1  3
2  4  1  5
3  6  1  7
4  8  1  9
In fact, if foo is defined as
def foo(df1, df2):
    df1 = df1.join(df2['C'], how='inner')
    df1['B'] = 1
    return
(i.e. with the "join" statement placed before the column assignment), then the output is simply
   A
0  0
1  2
2  4
3  6
4  8
I'm intrigued as to why this is the case. Any insights would be appreciated.
Answered by Jezzamon
The issue is because of this line:
df1 = df1.join(df2['C'], how='inner')
df1.join(df2['C'], how='inner') returns a new dataframe. After this line, df1 no longer refers to the same dataframe as the argument, but to a new one, because the name has been reassigned to the new result. The first dataframe continues to exist, unmodified. This isn't really a pandas issue; it is just how Python, and most other languages, work.
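The difference between modifying the passed-in object and reassigning the local name can be sketched as follows. The helper names mutate and rebind are hypothetical and used only for illustration; the data mirrors the question.

```python
import pandas as pd


def mutate(df):
    # Item assignment changes the very object the caller passed in.
    df['B'] = 1


def rebind(df, other):
    # join returns a new DataFrame; this assignment only rebinds the
    # local name df, leaving the caller's object as it was.
    df = df.join(other['C'], how='inner')
    return df


df1 = pd.DataFrame({'A': list(range(0, 10, 2))})
df2 = pd.DataFrame({'C': list(range(1, 11, 2))})

mutate(df1)
joined = rebind(df1, df2)

print(list(df1.columns))     # ['A', 'B']
print(list(joined.columns))  # ['A', 'B', 'C']
print(joined is df1)         # False
```

The caller's df1 picks up column B because it was modified in place, but never sees column C, which only exists on the new object returned by join.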
Some pandas functions have an inplace argument, which would do what you want; however, the join operation doesn't. If you need to modify a dataframe, you'll have to return this new one instead and reassign it outside the function.
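A minimal sketch of both options, assuming the same data as in the question. The function names rename_in_place and with_c are hypothetical; rename is used here only as one example of a method that accepts inplace=True.

```python
import pandas as pd


def rename_in_place(df):
    # rename accepts inplace=True, so the caller's object itself changes.
    df.rename(columns={'A': 'a'}, inplace=True)


def with_c(df1, df2):
    # join has no inplace option, so hand the new frame back to the caller.
    return df1.join(df2['C'], how='inner')


df1 = pd.DataFrame({'A': list(range(0, 10, 2))})
df2 = pd.DataFrame({'C': list(range(1, 11, 2))})

rename_in_place(df1)    # change is visible outside the function
df1 = with_c(df1, df2)  # reassign the returned frame outside the function

print(list(df1.columns))  # ['a', 'C']
```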
Answered by Ami Tavory
Python doesn't have pass by value vs. pass by reference - there are just bindings from names to objects.
If you change your function to
def foo(df1, df2):
    res = df1.join(df2['C'], how='inner')
    res['B'] = 1
    return res
Then df1 and df2, in the function, are bound to the objects you sent. The result of the join, which is a new object in this case, is bound to the name res. You can manipulate it, and return it, without affecting any of the other objects or bindings.
In your calling code, you could just write
print(foo(df1, df2))
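Putting this answer together as a runnable sketch, assuming the DataFrames from the question; the name result is introduced here only for illustration. It also checks that the objects passed in are left untouched.

```python
import pandas as pd


def foo(df1, df2):
    # res is a new object; df1 and df2 stay bound to the caller's frames.
    res = df1.join(df2['C'], how='inner')
    res['B'] = 1
    return res


df1 = pd.DataFrame({'A': list(range(0, 10, 2))})
df2 = pd.DataFrame({'C': list(range(1, 11, 2))})

result = foo(df1, df2)

print(list(result.columns))  # ['A', 'C', 'B']
print(list(df1.columns))     # ['A']
```

Because the join happens before the column assignment, C appears ahead of B in the result, and neither column is added to the original df1.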