Passing a pandas DataFrame by reference

Question

Asked by labrynth

My question is regarding immutability of pandas DataFrame when it is passed by reference. Consider the following code:

import sys

import pandas as pd

def foo(df1, df2):

    df1['B'] = 1
    df1 = df1.join(df2['C'], how='inner')

def main(argv=None):

    # Create DataFrames.
    df1 = pd.DataFrame(list(range(0, 10, 2)), columns=['A'])
    df2 = pd.DataFrame(list(range(1, 11, 2)), columns=['C'])

    foo(df1, df2)    # Pass df1 and df2 by reference.

    print(df1)

    return 0

if __name__ == '__main__':
    status = main()
    sys.exit(status)

The output is

   A  B
0  0  1
1  2  1
2  4  1
3  6  1
4  8  1

and not

   A  B  C
0  0  1  1
1  2  1  3
2  4  1  5
3  6  1  7
4  8  1  9

In fact, if foo is defined as

def foo(df1, df2):

    df1 = df1.join(df2['C'], how='inner')
    df1['B'] = 1

(i.e. with the join statement placed before the other statement), then the output is simply

   A
0  0
1  2
2  4
3  6
4  8

I'm intrigued as to why this is the case. Any insights would be appreciated.

Answer 1

Answered by Jezzamon

The issue is because of this line:

df1 = df1.join(df2['C'], how='inner')

df1.join(df2['C'], how='inner') returns a new dataframe. After this line, df1 no longer refers to the same dataframe as the argument but to a new one, because the name has been reassigned to the new result. The original dataframe continues to exist, untouched by the join. (The earlier df1['B'] = 1, by contrast, modified that original object in place, which is why column B does show up in the caller.) This isn't really a pandas issue; it's just how Python, and most other languages, work.
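
A minimal sketch of the rebinding, assuming Python 3 and a recent pandas. foo_with_check is a hypothetical variant of the question's foo (not from the original post) that reports whether its local name still points at the caller's object after the join:

```python
import pandas as pd


def foo_with_check(df1, df2):
    # Hypothetical variant of the question's foo, for illustration only.
    caller_obj = df1                        # second name for the caller's DataFrame
    df1['B'] = 1                            # item assignment mutates that shared object
    df1 = df1.join(df2['C'], how='inner')   # join builds a new DataFrame; only the local name is rebound
    return df1 is caller_obj                # False: the local name now refers to the join result


df1 = pd.DataFrame(list(range(0, 10, 2)), columns=['A'])
df2 = pd.DataFrame(list(range(1, 11, 2)), columns=['C'])

same_object = foo_with_check(df1, df2)
print(same_object)         # False
print(list(df1.columns))   # ['A', 'B']: the caller sees B but not C
```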

Some pandas functions have an inplace argument, which would do what you want; however, join doesn't. If you need to modify a dataframe, you'll have to return the new one instead and reassign it outside the function.
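
One way to apply that advice, as a sketch assuming Python 3 and a recent pandas. DataFrame.rename(..., inplace=True) is used here as one real example of a method with an inplace option (it mutates the object and returns None); rename_in_place and add_b_and_join are hypothetical helper names. Because join has no such option, the second helper returns the new frame and the caller rebinds its own name:

```python
import pandas as pd


def rename_in_place(df):
    # Hypothetical helper: inplace=True mutates the object the caller also holds
    # and returns None.
    return df.rename(columns={'A': 'a'}, inplace=True)


def add_b_and_join(df1, df2):
    # Hypothetical helper: join() has no inplace option, so build the new frame
    # and hand it back to the caller.
    return df1.assign(B=1).join(df2['C'], how='inner')


df1 = pd.DataFrame(list(range(0, 10, 2)), columns=['A'])
df2 = pd.DataFrame(list(range(1, 11, 2)), columns=['C'])

ret = rename_in_place(df1)
print(ret, list(df1.columns))    # None ['a']

df1 = add_b_and_join(df1, df2)   # reassign outside the function
print(list(df1.columns))         # ['a', 'B', 'C']
```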

Answer 2

Answered by Ami Tavory

Python doesn't have pass by value vs. pass by reference; there are just bindings from names to objects.
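
The same distinction shows up with plain Python lists, with no pandas involved. A minimal sketch; rebind and mutate are hypothetical names chosen for illustration:

```python
def rebind(items):
    # Builds a new list and binds the local name to it; the caller is unaffected.
    items = items + [99]


def mutate(items):
    # Changes the list object that the caller's name is also bound to.
    items.append(99)


a = [1, 2]
rebind(a)
print(a)   # [1, 2]

mutate(a)
print(a)   # [1, 2, 99]
```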

If you change your function to

def foo(df1, df2):

    res = df1.join(df2['C'], how='inner')
    res['B'] = 1

    return res

Then df1 and df2 in the function are bound to the objects you sent. The result of the join, which in this case is a new object, is bound to the name res. You can manipulate it and return it without affecting any of the other objects or bindings.
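
A short check of that claim, assuming Python 3 and a recent pandas, using this answer's foo as written above:

```python
import pandas as pd


def foo(df1, df2):
    # This answer's version: the join result gets its own name, res.
    res = df1.join(df2['C'], how='inner')
    res['B'] = 1          # mutates only the new object bound to res
    return res


df1 = pd.DataFrame(list(range(0, 10, 2)), columns=['A'])
df2 = pd.DataFrame(list(range(1, 11, 2)), columns=['C'])

result = foo(df1, df2)
print(list(result.columns))   # ['A', 'C', 'B']
print(list(df1.columns))      # ['A']: the caller's objects are untouched
print(list(df2.columns))      # ['C']
```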

In your calling code, you could just write

print(foo(df1, df2))
