Pandas dataframe pivot - Memory Error

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/39648991/



python pandas dataframe

Asked by Ulderique Demoitre

I have a dataframe df with the following structure:


        val          newidx    Code
Idx                             
0       1.0      1220121127    706
1       1.0      1220121030    706
2       1.0      1620120122    565

It has 1000000 rows. In total we have 600 unique Code values and 200000 unique newidx values.


If I perform the following operation


df.pivot_table(values='val', index='newidx', columns='Code', aggfunc='max')

I get a MemoryError, but this seems strange, as the size of the resulting dataframe should be manageable: 200000 x 600.

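For a rough sense of scale (assuming float64 values), the dense result alone is already close to a gigabyte, and the intermediate copies that pivot_table builds can multiply that several times over:

n_rows, n_cols = 200000, 600
bytes_needed = n_rows * n_cols * 8      # float64: 8 bytes per cell
print(bytes_needed / 1024**3)           # ~0.89 GiB for the values alone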

How much memory does such an operation require? Is there a way to fix this memory error?


Answered by Kartik

Try to see if this fits in your memory:


df.groupby(['newidx', 'Code'])['val'].max().unstack()

pivot_table is unfortunately very memory intensive, as it may make multiple copies of the data.

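As a quick sanity check, the two approaches produce the same result; a minimal sketch on synthetic data (column names as in the question):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    'val': rng.random(1000),
    'newidx': rng.integers(0, 50, 1000),
    'Code': rng.integers(0, 10, 1000),
})

pivoted = df.pivot_table(values='val', index='newidx', columns='Code', aggfunc='max')
grouped = df.groupby(['newidx', 'Code'])['val'].max().unstack()
assert pivoted.equals(grouped)  # same values, built with fewer intermediate copies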



If the groupby does not work, you will have to split your DataFrame into smaller pieces. Try not to assign multiple times. For example, if reading from csv:


df = pd.read_csv('file.csv').groupby(['newidx', 'Code'])['val'].max().unstack()

avoids multiple assignments.

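If even a single assignment is not enough, the splitting can be done with chunked reading: the max of per-chunk maxima equals the global max, so each chunk can be reduced immediately and only small partial results are kept in memory. A sketch, assuming the same file.csv:

import pandas as pd

partials = []
for chunk in pd.read_csv('file.csv', chunksize=100000):
    # Reduce each chunk to its per-group maxima before keeping it around.
    partials.append(chunk.groupby(['newidx', 'Code'])['val'].max())

# Combine the partial maxima and reshape as before.
result = pd.concat(partials).groupby(level=['newidx', 'Code']).max().unstack()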

Answered by mplf

I had a very similar problem recently when carrying out a merge between 4 dataframes.


What worked for me was disabling the index during the groupby, then merging.


If @Kartik's answer doesn't work, try this before chunking the DataFrame.


df.groupby(['newidx', 'Code'], as_index=False)['val'].max() \
  .pivot(index='newidx', columns='Code', values='val')
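With as_index=False the grouped intermediate keeps newidx and Code as ordinary columns instead of building a MultiIndex up front, and the final pivot only has to reshape the already-reduced frame.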