Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/39783570/

Date: 2020-09-14 02:06:44  Source: igfitidea

Passing pandas DataFrame by reference

Tags: python, pandas, dataframe, pass-by-reference, immutability

Asked by labrynth

My question is regarding immutability of pandas DataFrame when it is passed by reference. Consider the following code:

import sys

import pandas as pd

def foo(df1, df2):

    df1['B'] = 1
    df1 = df1.join(df2['C'], how='inner')

    return()

def main(argv = None):

    # Create DataFrames.
    df1 = pd.DataFrame(range(0,10,2), columns=['A'])
    df2 = pd.DataFrame(range(1,11,2), columns=['C'])

    foo(df1, df2)    # Pass df1 and df2 by reference.

    print(df1)

    return(0)

if __name__ == '__main__':
    status = main()
    sys.exit(status)

The output is

   A  B  
0  0  1
1  2  1
2  4  1
3  6  1
4  8  1

and not

   A  B  C
0  0  1  1
1  2  1  3
2  4  1  5
3  6  1  7
4  8  1  9

In fact, if foo is defined as

def foo(df1, df2):

    df1 = df1.join(df2['C'], how='inner')
    df1['B'] = 1

    return()

(i.e. with the join statement placed before the column assignment), then the output is simply

   A    
0  0 
1  2 
2  4 
3  6 
4  8

I'm intrigued as to why this is the case. Any insights would be appreciated.

Answered by Jezzamon

The issue is because of this line:

df1 = df1.join(df2['C'], how='inner')

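df1.join(df2['C'], how='inner') returns a new dataframe. After this line, df1 no longer refers to the dataframe that was passed in as the argument but to a new one, because the name has been reassigned to the join result. The original dataframe continues to exist, unchanged by the join. This also explains the second version of foo: there the B column is added after the reassignment, so it lands on the new dataframe and the caller's dataframe keeps only A. This isn't really a pandas issue; it is how assignment works in Python and in many other languages.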
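
As a rough illustration of the rebinding described above, the sketch below remembers the object that was passed in and compares it with whatever the name df1 refers to after the join. The caller_obj variable and the boolean return value are additions for this example and are not part of the original question.

```python
import pandas as pd

def foo(df1, df2):
    caller_obj = df1                        # the object the caller passed in
    df1['B'] = 1                            # mutates that same object
    df1 = df1.join(df2['C'], how='inner')   # rebinds the local name to a new DataFrame
    return df1 is caller_obj                # is the name still bound to the caller's object?

df1 = pd.DataFrame(range(0, 10, 2), columns=['A'])
df2 = pd.DataFrame(range(1, 11, 2), columns=['C'])

same_object = foo(df1, df2)
print(same_object)         # False
print(list(df1.columns))   # ['A', 'B']
```

The column assignment reaches the caller because it changes the shared object; the join does not, because it only changes what the local name points to.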

Some pandas functions have an inplace argument, which would do what you want; the join operation, however, does not. If you need the joined dataframe outside the function, you'll have to return the new dataframe and reassign it in the calling code.
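
A minimal sketch of the two cases, assuming a small made-up dataframe: DataFrame.drop accepts inplace=True and so changes the caller's object directly, while the join result has to be returned and reassigned. The helper names drop_b_inplace and add_c are hypothetical and chosen only for this illustration.

```python
import pandas as pd

def drop_b_inplace(df):
    # inplace=True mutates the object the caller passed in and returns None.
    df.drop(columns=['B'], inplace=True)

def add_c(df1, df2):
    # join has no inplace option, so hand the new DataFrame back to the caller.
    return df1.join(df2['C'], how='inner')

df1 = pd.DataFrame({'A': [0, 2, 4], 'B': [1, 1, 1]})
df2 = pd.DataFrame({'C': [1, 3, 5]})

drop_b_inplace(df1)
print(list(df1.columns))   # ['A']

df1 = add_c(df1, df2)      # reassign outside the function
print(list(df1.columns))   # ['A', 'C']
```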

Answered by Ami Tavory

Python doesn't have pass by value vs. pass by reference - there are just bindings from names to objects.
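
The same name-binding behaviour can be seen without pandas. In this small sketch, mutate and rebind are hypothetical helper names: the first changes the object that the caller's name is bound to, while the second only binds its local name to a new object.

```python
def mutate(items):
    items.append(4)        # changes the object both names refer to

def rebind(items):
    items = items + [5]    # builds a new list and binds only the local name to it

data = [1, 2, 3]

mutate(data)
print(data)   # [1, 2, 3, 4]

rebind(data)
print(data)   # [1, 2, 3, 4]
```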

If you change your function to

def foo(df1, df2):

    res = df1.join(df2['C'], how='inner')
    res['B'] = 1

    return res

then df1 and df2, inside the function, are bound to the objects you passed in. The result of the join, which is a new object in this case, is bound to the name res. You can manipulate it and return it without affecting any of the other objects or bindings.
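
To check that claim, the sketch below runs this version of foo on the dataframes from the question and inspects the columns afterwards. The name result is an addition for this example.

```python
import pandas as pd

def foo(df1, df2):
    res = df1.join(df2['C'], how='inner')   # new object, bound to res
    res['B'] = 1                            # modifies res only
    return res

df1 = pd.DataFrame(range(0, 10, 2), columns=['A'])
df2 = pd.DataFrame(range(1, 11, 2), columns=['C'])

result = foo(df1, df2)
print(list(result.columns))   # ['A', 'C', 'B']
print(list(df1.columns))      # ['A']
print(list(df2.columns))      # ['C']
```

Neither input gains a column, since every change is made to the object bound to res.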

In your calling code, you could just write

print(foo(df1, df2))