Pandas memory error
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/23205005/
Pandas memory error
Asked by user308827
I have a csv file with ~50,000 rows and 300 columns. Performing the following operation is causing a memory error in Pandas (python):
merged_df.stack(0).reset_index(1)
The data frame looks like:
GRID_WISE_MW1   Col0   Col1   Col2   ....   Col300
7228260         1444   1819   2042
7228261         1444   1819   2042
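For illustration, here is the same reshape applied to a tiny frame built from the two rows shown above (only three columns, so the output stays readable); this sketch is not part of the original question:

import pandas as pd

# tiny frame with the same index name and a few of the columns shown above
small = pd.DataFrame(
    [[1444, 1819, 2042], [1444, 1819, 2042]],
    index=pd.Index([7228260, 7228261], name='GRID_WISE_MW1'),
    columns=['Col0', 'Col1', 'Col2'],
)

# stack(0) moves the column labels into a second index level;
# reset_index(1) then turns that level back into an ordinary column
reshaped = small.stack(0).reset_index(1)
print(reshaped)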
I am using the latest pandas (0.13.1), and the error does not occur with dataframes that have fewer rows (~2,000).
thanks!
Answered by Jeff
On my 64-bit Linux machine (32 GB of RAM), this takes a little less than 2 GB of memory.
In [5]: def f():
   ...:     df = DataFrame(np.random.randn(50000,300))
   ...:     df.stack().reset_index(1)

In [6]: %memit f()
maximum of 1: 1791.054688 MB per loop
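For completeness, here is a self-contained sketch of the same measurement (the %memit magic comes from the memory_profiler package, which the answer above does not show installing):

# standalone version of the snippet above; assumes `pip install memory_profiler`
import numpy as np
from pandas import DataFrame

def f():
    # same shape as the question: 50,000 rows x 300 columns
    df = DataFrame(np.random.randn(50000, 300))
    df.stack().reset_index(1)

# in an IPython session:
#   %load_ext memory_profiler
#   %memit f()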
Since you didn't specify your platform: this won't work on 32-bit at all (you usually can't allocate a contiguous 2 GB block there), but it should work if you have a reasonable amount of memory / swap.
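If you are not sure which build you are running, a quick check (not from the original answer) is:

import struct
import sys

# prints 64 on a 64-bit Python build, 32 on a 32-bit build
print(struct.calcsize("P") * 8)

# True on 64-bit builds, False on 32-bit builds
print(sys.maxsize > 2**32)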
Answered by Simbarashe Timothy Motsi
As an alternative approach, you can use the "dask" library, e.g.:
# Dataframes implement the Pandas API
import dask.dataframe as dd
df = dd.read_csv('s3://.../2018-*-*.csv')
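A brief usage note (a sketch; the local file name data.csv below is hypothetical): dask reads the CSV lazily in partitions, so nothing is loaded up front, and results are only materialized when you call .compute() or an eager method such as head().

import dask.dataframe as dd

# builds a lazy, partitioned dataframe; the file is not read into memory yet
df = dd.read_csv('data.csv')

# head() computes eagerly on the first partition and returns a pandas DataFrame
print(df.head())

# other operations stay lazy until .compute() is called
# (assumes numeric columns, as in the question's data)
print(df.mean().compute())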

