Pandas memory error
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/23205005/
Pandas memory error
Asked by user308827
I have a csv file with ~50,000 rows and 300 columns. Performing the following operation is causing a memory error in Pandas (python):
merged_df.stack(0).reset_index(1)
The data frame looks like:
GRID_WISE_MW1   Col0   Col1   Col2   ....   Col300
7228260         1444   1819   2042
7228261         1444   1819   2042
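For illustration, here is the same reshape applied to a tiny frame built from the two rows shown above (only three columns, so the output stays readable); this sketch is not part of the original question:

import pandas as pd

# tiny frame with the same index name and a few of the columns shown above
small = pd.DataFrame(
    [[1444, 1819, 2042], [1444, 1819, 2042]],
    index=pd.Index([7228260, 7228261], name='GRID_WISE_MW1'),
    columns=['Col0', 'Col1', 'Col2'],
)

# stack(0) moves the column labels into a second index level;
# reset_index(1) then turns that level back into an ordinary column
reshaped = small.stack(0).reset_index(1)
print(reshaped)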
I am using the latest pandas (0.13.1), and the error does not occur with dataframes that have fewer rows (~2,000).
thanks!
Answered by Jeff
On my 64-bit Linux machine (32 GB of RAM), this takes a little less than 2 GB of memory.
In [5]: def f():
   ...:     df = DataFrame(np.random.randn(50000,300))
   ...:     df.stack().reset_index(1)

In [6]: %memit f()
maximum of 1: 1791.054688 MB per loop
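For completeness, here is a self-contained sketch of the same measurement (the %memit magic comes from the memory_profiler package, which the answer above does not show installing):

# standalone version of the snippet above; assumes `pip install memory_profiler`
import numpy as np
from pandas import DataFrame

def f():
    # same shape as the question: 50,000 rows x 300 columns
    df = DataFrame(np.random.randn(50000, 300))
    df.stack().reset_index(1)

# in an IPython session:
#   %load_ext memory_profiler
#   %memit f()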
Since you didn't specify your platform: this won't work on 32-bit at all (you usually can't allocate a contiguous 2 GB block there), but it should work if you have a reasonable amount of memory / swap.
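If you are not sure which build you are running, a quick check (not from the original answer) is:

import struct
import sys

# prints 64 on a 64-bit Python build, 32 on a 32-bit build
print(struct.calcsize("P") * 8)

# True on 64-bit builds, False on 32-bit builds
print(sys.maxsize > 2**32)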
Answered by Simbarashe Timothy Motsi
As an alternative approach, you can use the "dask" library, e.g.:
# Dataframes implement the Pandas API
import dask.dataframe as dd
df = dd.read_csv('s3://.../2018-*-*.csv')
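A brief usage note (a sketch; the local file name data.csv below is hypothetical): dask reads the CSV lazily in partitions, so nothing is loaded up front, and results are only materialized when you call .compute() or an eager method such as head().

import dask.dataframe as dd

# builds a lazy, partitioned dataframe; the file is not read into memory yet
df = dd.read_csv('data.csv')

# head() computes eagerly on the first partition and returns a pandas DataFrame
print(df.head())

# other operations stay lazy until .compute() is called
# (assumes numeric columns, as in the question's data)
print(df.mean().compute())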

