Python pandas DataFrame: is it passed by value or by reference?

Question

Asked by nos

If I pass a dataframe to a function and modify it inside the function, is it pass-by-value or pass-by-reference?

I ran the following code:

import pandas as pd

a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
def letgo(df):
    df = df.drop('b',axis=1)
letgo(a)

The value of a does not change after the function call. Does that mean it is pass-by-value?

I also tried the following:

import numpy as np

xx = np.array([[1,2], [3,4]])
def letgo2(x):
    x[1,1] = 100
def letgo3(x):
    x = np.array([[3,3],[3,3]])

It turns out letgo2() does change xx and letgo3() does not. Why is that?

Answer 1

Answered by Matthias Fripp

The short answer is, Python always does pass-by-value, but every Python variable is actually a pointer to some object, so sometimes it looks like pass-by-reference.

In Python every object is either mutable or immutable. For example, lists, dicts, modules and pandas DataFrames are mutable, while ints, strings and tuples are immutable. Mutable objects can be changed internally (e.g., by adding an element to a list), but immutable objects cannot.

As I said at the start, you can think of every Python variable as a pointer to an object. When you pass a variable to a function, the variable (pointer) within the function is always a copy of the variable (pointer) that was passed in. So if you assign something new to the internal variable, all you are doing is changing the local variable to point to a different object. This doesn't alter (mutate) the original object that the variable pointed to, nor does it make the external variable point to the new object. At this point, the external variable still points to the original object, but the internal variable points to a new object.
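
The pointer-copy behaviour described above can be checked with the `is` operator. This is a minimal sketch using a plain list; the names `rebind`, `items` and `original` are made up for illustration and do not come from the question.

```python
def rebind(items):
    # On entry, the local name refers to the very object the caller passed.
    same_on_entry = items is original
    # Assignment rebinds only the local name; the caller's object is untouched.
    items = [99, 99]
    same_after_rebind = items is original
    return same_on_entry, same_after_rebind

original = [1, 2]
result = rebind(original)
print(result)    # (True, False)
print(original)  # [1, 2]
```

The first flag shows that no copy of the list is made on the call; the second shows that assigning to the parameter only redirects the local name.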

If you want to alter the original object (only possible with mutable data types), you have to do something that alters the object without assigning a completely new value to the local variable. This is why letgo() and letgo3() leave the external item unaltered, but letgo2() alters it.

As @ursan pointed out, if letgo() used something like this instead, then it would alter (mutate) the original object that df points to, which would change the value seen via the global variable a:

def letgo(df):
    df.drop('b', axis=1, inplace=True)

a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
letgo(a)  # will alter a

In some cases, you can completely hollow out the original variable and refill it with new data, without actually doing a direct assignment. For example, this will alter the original object that v points to, which will change the data seen when you use v later:

def letgo3(x):
    x[:] = np.array([[3,3],[3,3]])

v = np.empty((2, 2))
letgo3(v)   # will alter v

Notice that I'm not assigning something directly to x; I'm assigning something to the entire internal range of x.

If you absolutely must create a completely new object and make it visible externally (which is sometimes the case with pandas), you have two options. The 'clean' option would be just to return the new object, e.g.,

def letgo(df):
    df = df.drop('b',axis=1)
    return df

a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
a = letgo(a)

Another option would be to reach outside your function and directly alter a global variable. This changes a to point to a new object, and any function that refers to a afterward will see that new object:

def letgo():
    global a
    a = a.drop('b',axis=1)

a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
letgo()   # will alter a!

Directly altering global variables is usually a bad idea, because anyone who reads your code will have a hard time figuring out how a got changed. (I generally use global variables for shared parameters used by many functions in a script, but I don't let the functions alter those global variables.)

Answer 2

Answered by Mike Graham

The question isn't PBV vs. PBR. These names only cause confusion in a language like Python; they were invented for languages that work like C or like Fortran (as the quintessential PBV and PBR languages). It is true, but not enlightening, that Python always passes by value. The question here is whether the value itself is mutated or whether you get a new value. Pandas usually errs on the side of the latter.
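
As a small illustration of pandas methods handing back a new value instead of mutating the existing one, the sketch below calls `drop` and `rename` with their default arguments. The variable names `dropped` and `renamed` are illustrative only.

```python
import pandas as pd

a = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

dropped = a.drop('b', axis=1)           # returns a new DataFrame
renamed = a.rename(columns={'a': 'x'})  # also returns a new DataFrame

print(dropped is a)      # False
print(list(a.columns))   # ['a', 'b']
```

In both calls the original DataFrame keeps its columns; the result is only visible through the name it is assigned to.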

http://nedbatchelder.com/text/names.html explains very well what Python's system of names is.

Answer 3

Answered by ursan

To add to @Mike Graham's answer, who pointed to a very good read:

In your case, what is important to remember is the difference between names and values. a, df, xx and x are all names, but they refer to the same or different values at different points of your examples:

In the first example, letgo rebinds df to another value, because df.drop returns a new DataFrame unless you set the argument inplace=True (see doc). That means that the name df (local to the letgo function), which was referring to the value of a, is now referring to a new value, here the df.drop return value. The value a is referring to still exists and hasn't changed.
In the second example, letgo2 mutates x without rebinding it, which is why xx is modified by letgo2. Unlike the previous example, here the local name x always refers to the value the name xx is referring to, and changes that value in place, which is why the value xx is referring to has changed.
In the third example, letgo3 rebinds x to a new np.array. That causes the name x, local to letgo3 and previously referring to the value of xx, to now refer to another value, the new np.array. The value xx is referring to hasn't changed.
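
The three cases can be verified with identity checks. The sketch below reuses the functions from the question, with one change that is an assumption made purely for inspection: each function returns its local name so the caller can compare it against the original object. The names `first`, `second` and `third` are illustrative.

```python
import numpy as np
import pandas as pd

def letgo(df):
    df = df.drop('b', axis=1)        # rebinds the local name df
    return df                        # returned only so we can inspect it

def letgo2(x):
    x[1, 1] = 100                    # mutates the array in place
    return x

def letgo3(x):
    x = np.array([[3, 3], [3, 3]])   # rebinds the local name x
    return x

a = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
xx = np.array([[1, 2], [3, 4]])

first = letgo(a) is a      # rebinding: a different object comes back
second = letgo2(xx) is xx  # mutation: the same object comes back, modified
third = letgo3(xx) is xx   # rebinding: a different object comes back

print(first, second, third)  # False True False
print(xx[1, 1])              # 100
```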

Answer 4

Answered by dstromberg

Python is neither pass by value nor pass by reference. It is pass by assignment.

Supporting reference, the Python FAQ: https://docs.python.org/3/faq/programming.html#how-do-i-write-a-function-with-output-parameters-call-by-reference

In other words:

If you pass an immutable value, changes to it do not change its value in the caller - because you are rebinding the name to a new object.
If you pass a mutable value, changes made in the called function, also change the value in the caller, so long as you do not rebind that name to a new object. If you reassign the variable, creating a new object, that change and subsequent changes to the name are not seen in the caller.
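
One way to see the difference between the two cases is augmented assignment, which rebinds the name for an immutable int but extends a list in place. This is a minimal sketch; `bump`, `extend`, `count` and `values` are hypothetical names chosen for the illustration.

```python
def bump(n):
    n += 1        # int is immutable: += builds a new int and rebinds n

def extend(items):
    items += [3]  # list is mutable: += extends the same list in place

count = 1
bump(count)

values = [1, 2]
extend(values)

print(count)   # 1
print(values)  # [1, 2, 3]
```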

So if you pass a list and change its 0th value, that change is seen in both the called function and the caller. But if you reassign the name to a new list, that change is not seen by the caller. If instead you take a slice of the whole list and replace that with a new list, the change is seen in both the called function and the caller.

For example:

def change_it(list_):
    # This change would be seen in the caller if we left it alone
    list_[0] = 28

    # This change is also seen in the caller, and replaces the above
    # change
    list_[:] = [1, 2]

    # This change is not seen in the caller.
    # If this were pass by reference, this change too would be seen in
    # caller.
    list_ = [3, 4]

thing = [10, 20]
change_it(thing)
# here, thing is [1, 2]

If you're a C fan, you can think of this as passing a pointer by value - not a pointer to a pointer to a value, just a pointer to a value.

HTH.

Answer 5

Answered by Israel Unterman

Here is the doc for drop:

Return new object with labels in requested axis removed.

So a new dataframe is created. The original has not changed.

But, as with all objects in Python, the data frame is passed to the function by reference.

Answer 6

Answered by zosan

You need to declare 'a' as global at the start of the function; otherwise it is a local variable, and assigning to it does not change the 'a' in the main code.
