Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/10844493/

Date: 2020-09-13 15:43:55  Source: igfitidea

DataFrame.apply in python pandas alters both original and duplicate DataFrames

Tags: python, pandas

Asked by MikeGruz

I'm having a bit of trouble altering a duplicated pandas DataFrame without having the edits apply to both the duplicate and the original DataFrame.

Here's an example. Say I create an arbitrary DataFrame from a list of dictionaries:

In [67]: d = [{'a':3, 'b':5}, {'a':1, 'b':1}]

In [68]: d = DataFrame(d)

In [69]: d
Out[69]: 
   a  b
0  3  5
1  1  1

Then I assign the 'd' dataframe to variable 'e' and apply some arbitrary math to column 'a' using apply:

In [70]: e = d

In [71]: e['a'] = e['a'].apply(lambda x: x + 1)

The problem arises in that the apply function apparently applies to both the duplicate DataFrame 'e' and original DataFrame 'd', which I cannot for the life of me figure out:

In [72]: e # duplicate DataFrame
Out[72]: 
   a  b
0  4  5
1  2  1

In [73]: d # original DataFrame, notice the alterations to frame 'e' were also applied
Out[73]:  
   a  b
0  4  5
1  2  1

I've searched both the pandas documentation and Google for a reason why this would be so, but to no avail. I can't understand what is going on here at all.

I've also tried the math using an element-wise operation (e.g., e['a'] = [i + 1 for i in e['a']]), but the problem persists. Is there a quirk in the pandas DataFrame type that I'm not aware of? I appreciate any insight someone might be able to offer.

Answered by BrenBarn

This is not a pandas-specific issue. In Python, assignment never copies anything:

>>> a = [1,2,3]
>>> b = a
>>> b[0] = 'WHOA!'
>>> a
['WHOA!', 2, 3]
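
One way to confirm that two names refer to the same object is the `is` operator. The following is a minimal sketch built on the same list example; the extra name `c` is introduced here purely for illustration and is not part of the original answer.

```python
a = [1, 2, 3]
b = a          # binds a second name to the same list; nothing is copied
c = list(a)    # builds a new list with the same elements (a shallow copy)

b[0] = 'WHOA!'

print(b is a)  # True: one object, two names
print(c is a)  # False: c is a separate list
print(a)       # ['WHOA!', 2, 3]
print(c)       # [1, 2, 3]
```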

If you want a new DataFrame, make a copy with e = d.copy().
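
Applied to the DataFrame from the question, this might look like the sketch below. It assumes pandas is imported as `pd` (the question used a bare `DataFrame` name instead) and relies on `DataFrame.copy()` making a deep copy by default.

```python
import pandas as pd

d = pd.DataFrame([{'a': 3, 'b': 5}, {'a': 1, 'b': 1}])
e = d.copy()                            # independent copy (deep by default)
e['a'] = e['a'].apply(lambda x: x + 1)  # modifies only the copy

print(e is d)           # False: two distinct DataFrame objects
print(d['a'].tolist())  # [3, 1]  (original left untouched)
print(e['a'].tolist())  # [4, 2]
```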

如果您想要一个新的 DataFrame,请使用e = d.copy().

Edit: I should clarify that assignment to a bare name never copies anything. Assignment to an item or attribute (e.g., a[1] = x or a.foo = bar) is converted into method calls under the hood and may do copying depending on what kind of object a is.
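
To illustrate the distinction, here is a small sketch with a hypothetical class named `Logged` (not part of pandas or the original answer). It shows that item assignment goes through `__setitem__`, where the class is free to copy or not, while bare-name assignment involves no method call at all.

```python
class Logged:
    """Hypothetical container that records item assignments."""

    def __init__(self):
        self.calls = []
        self.data = {}

    def __setitem__(self, key, value):
        # Invoked for `obj[key] = value`; the class decides what to store.
        self.calls.append(key)
        self.data[key] = list(value)  # this particular class stores a copy


src = [1, 2, 3]
obj = Logged()
obj['a'] = src   # equivalent to obj.__setitem__('a', src)
alias = obj      # bare-name assignment: no method call, no copy

print(obj.calls)             # ['a']
print(obj.data['a'] is src)  # False: __setitem__ chose to copy
print(alias is obj)          # True: same object under two names
```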