Original source: http://stackoverflow.com/questions/17591104/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
In pandas, can I deep copy a DataFrame, including its index and columns?
Asked by waitingkuo
First, I create a DataFrame:
In [61]: import pandas as pd
In [62]: df = pd.DataFrame([[1], [2], [3]])
Then I make a deep copy of it with copy:
In [63]: df2 = df.copy(deep=True)
Now the two DataFrame objects are different:
In [64]: id(df), id(df2)
Out[64]: (4385185040, 4385183312)
However, the index objects are still the same:
In [65]: id(df.index), id(df2.index)
Out[65]: (4385175264, 4385175264)
The same thing happens with columns. Is there an easy way to deep copy not only the values but also the index and columns?
Answered by Andy Hayden
I wonder whether this is a bug in pandas... It's interesting because Index/MultiIndex objects (the index and columns) are in some sense supposed to be immutable (however, I think these should be copies).
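To illustrate the "in some sense immutable" remark, here is a minimal sketch (not part of the original answer; the variable names are made up for illustration). It shows that the labels of an Index cannot be assigned in place, while the name attribute can, which is one way a shared Index could leak changes between two frames.

```python
import pandas as pd

idx = pd.Index([0, 1, 2], name="row")

# Assigning to an element of an Index is not supported and raises TypeError.
try:
    idx[0] = 10
    labels_mutated = True
except TypeError:
    labels_mutated = False

# rename() returns a new Index and leaves the original untouched.
renamed = idx.rename("other")

# The name attribute, however, can be set in place on the same object.
shared = idx            # second reference to the same Index object
shared.name = "changed"

print(labels_mutated, renamed.name, idx.name)
```

Because `shared` and `idx` refer to the same object, setting the name through one reference is visible through the other.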
For now, it's easy to create your own method, and add it to DataFrame:
In [11]: def very_deep_copy(self):
   ....:     return pd.DataFrame(self.values.copy(), self.index.copy(), self.columns.copy())
In [12]: pd.DataFrame.very_deep_copy = very_deep_copy
In [13]: df2 = df.very_deep_copy()
As you can see, this will create new objects (and preserve names):
In [14]: id(df.columns)
Out[14]: 4370636624
In [15]: id(df2.columns)
Out[15]: 4372118776
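One caveat with rebuilding from self.values: for a frame with mixed column types, values is a single object-dtype array, so the rebuilt frame may lose its per-column dtypes. A possible alternative, sketched below as a standalone function rather than a patched method, is to start from df.copy(deep=True) and then replace both axes with their own copies. The name copy_with_axes and the sample data are hypothetical, chosen only for this illustration.

```python
import pandas as pd


def copy_with_axes(df: pd.DataFrame) -> pd.DataFrame:
    """Deep copy the data, then give the copy its own index and columns.

    Hypothetical helper for illustration; not a pandas API.
    """
    out = df.copy(deep=True)                   # copies the data, keeps dtypes
    out.index = df.index.copy(deep=True)       # new Index object for the rows
    out.columns = df.columns.copy(deep=True)   # new Index object for the columns
    return out


# Mixed dtypes and a named index, to check that both survive the copy.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": ["x", "y", "z"]},
    index=pd.Index([10, 20, 30], name="key"),
)
df2 = copy_with_axes(df)

# Renaming the copy's index should not touch the original.
df2.index.name = "other"
print(df.index.name, df2.index.name)
```

Keeping this as a plain function avoids modifying pd.DataFrame itself, which can be easier to reason about in shared code.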
Answered by Sergey
The latest version of pandas no longer has this issue:
import pandas as pd
df = pd.DataFrame([[1], [2], [3]])
df2 = df.copy(deep=True)
id(df), id(df2)
Out[3]: (136575472, 127792400)
id(df.index), id(df2.index)
Out[4]: (145820144, 127657008)
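Since the behaviour depends on the installed version, a small check like the following can be run locally. This is only a sketch: copy_report is a hypothetical helper name, and it uses the `is` operator for the identity test instead of comparing id() values by eye.

```python
import pandas as pd


def copy_report(df: pd.DataFrame) -> dict:
    """Report what df.copy(deep=True) shares with the original (illustrative helper)."""
    df2 = df.copy(deep=True)
    original_value = df.iloc[0, 0]
    df2.iloc[0, 0] = original_value + 1   # modify the copy only
    return {
        "frame_is_new_object": df2 is not df,
        "index_is_new_object": df2.index is not df.index,
        "columns_is_new_object": df2.columns is not df.columns,
        "values_independent": bool(df.iloc[0, 0] == original_value),
    }


df = pd.DataFrame([[1], [2], [3]])
report = copy_report(df)
print(pd.__version__, report)
```

If index_is_new_object or columns_is_new_object comes back False on a given installation, the index or columns are still shared there and one of the workarounds above may be needed.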