Original source: http://stackoverflow.com/questions/10844493/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
DataFrame.apply in python pandas alters both original and duplicate DataFrames
Asked by MikeGruz
I'm having a bit of trouble altering a duplicated pandas DataFrame without having the edits apply to both the duplicate and the original DataFrame.
Here's an example. Say I create an arbitrary DataFrame from a list of dictionaries:
In [66]: from pandas import DataFrame  # import the session below relies on
In [67]: d = [{'a':3, 'b':5}, {'a':1, 'b':1}]
In [68]: d = DataFrame(d)
In [69]: d
Out[69]:
   a  b
0  3  5
1  1  1
Then I assign the 'd' dataframe to variable 'e' and apply some arbitrary math to column 'a' using apply:
In [70]: e = d
In [71]: e['a'] = e['a'].apply(lambda x: x + 1)
The problem is that the result of apply apparently shows up in both the duplicate DataFrame 'e' and the original DataFrame 'd', which I cannot for the life of me figure out:
In [72]: e # duplicate DataFrame
Out[72]:
   a  b
0  4  5
1  2  1
In [73]: d # original DataFrame, notice the alterations to frame 'e' were also applied
Out[73]:
   a  b
0  4  5
1  2  1
I've searched both the pandas documentation and Google for a reason why this would be so, but to no avail. I can't understand what is going on here at all.
I've also tried the math operations using an element-wise operation (e.g., e['a'] = [i + 1 for i in e['a']]), but the problem persists. Is there a quirk in the pandas DataFrame type that I'm not aware of? I appreciate any insight someone might be able to offer.
Answered by BrenBarn
This is not a pandas-specific issue. In Python, assignment never copies anything:
>>> a = [1,2,3]
>>> b = a
>>> b[0] = 'WHOA!'
>>> a
['WHOA!', 2, 3]
If you want a new DataFrame, make a copy with e = d.copy().
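As a rough sketch of the difference, the snippet below rebuilds the DataFrame from the question and compares a plain assignment with d.copy(). The names alias and independent are made up for illustration and do not appear in the original question.

```python
import pandas as pd

# Same data as in the question
d = pd.DataFrame([{'a': 3, 'b': 5}, {'a': 1, 'b': 1}])

alias = d               # plain assignment: a second name for the same object
independent = d.copy()  # a separate DataFrame with its own data

# Modify only the copy
independent['a'] = independent['a'].apply(lambda x: x + 1)

print(alias is d)                 # True: both names refer to one DataFrame
print(independent is d)           # False: the copy is a different object
print(d['a'].tolist())            # [3, 1]: the original is unchanged
print(independent['a'].tolist())  # [4, 2]
```

Checking with `is` is a quick way to tell whether two names refer to the same object.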
Edit: I should clarify that assignment to a bare name never copies anything. Assignment to an item or attribute (e.g., a[1] = x or a.foo = bar) is converted into method calls under the hood and may do copying depending on what kind of object a is.
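To illustrate the point about method calls, here is a minimal sketch using a hypothetical Recorder class (not part of pandas or the original answer). It logs which special method each kind of assignment triggers, and its __setitem__ deliberately stores a copy to show that the object itself decides whether copying happens.

```python
class Recorder:
    """Hypothetical container that logs item and attribute assignments."""

    def __init__(self):
        # Bypass our own __setattr__ so setup is not logged
        object.__setattr__(self, 'log', [])
        object.__setattr__(self, 'items', {})

    def __setitem__(self, key, value):
        self.log.append(('__setitem__', key))
        self.items[key] = list(value)  # this class chooses to store a copy

    def __setattr__(self, name, value):
        self.log.append(('__setattr__', name))
        object.__setattr__(self, name, value)  # stores the same object, no copy


r = Recorder()
data = [1, 2, 3]

r['x'] = data  # item assignment: calls r.__setitem__('x', data)
r.foo = data   # attribute assignment: calls r.__setattr__('foo', data)
same = data    # bare-name assignment: no method call, no copy

data.append(4)

print(r.log)         # [('__setitem__', 'x'), ('__setattr__', 'foo')]
print(r.items['x'])  # [1, 2, 3]: copied on assignment, so the append is not seen
print(r.foo)         # [1, 2, 3, 4]: same list object as data
print(same is data)  # True
```

Whether an item or attribute assignment copies is therefore a property of the class being assigned into, which is why behaviour can differ between a list, a dict, and a DataFrame.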

