Reading large text files with Pandas

Question

Asked by marillion

I have been trying to read a few large text files (around 1.4GB - 2GB each) with Pandas, using the read_csv function, to no avail. Below are the versions I am using:

Python 2.7.6
Anaconda 1.9.2 (64-bit) (default, Nov 11 2013, 10:49:15) [MSC v.1500 64 bit (AMD64)]
IPython 1.1.0
Pandas 0.13.1

I tried the following:

import pandas as pd
df = pd.read_csv('data.txt')

and it crashed IPython with the message: Kernel died, restarting.

Then I tried using an iterator:

tp = pd.read_csv('data.txt', iterator=True, chunksize=1000)

Again, I got the Kernel died, restarting error.

Any ideas? Or any other way to read big text files?

Thank you!

Answer 1

Answered by DarkCygnus

A solution to a similar question was given some time after this question was posted. Basically, it suggests reading the file in chunks, as follows:

chunksize = 10 ** 6  # number of rows per chunk
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)

You should set the chunksize parameter according to your machine's capabilities (that is, make sure it can process a chunk of that size).
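
To make the idea concrete, here is a minimal sketch of what process(chunk) might look like when the goal is a running aggregate rather than a full in-memory DataFrame. It assumes Python 3 and a reasonably recent pandas; the function name summarize_in_chunks, the column name "value", and the tiny in-memory sample are all hypothetical stand-ins for your own file and columns.

```python
import io

import pandas as pd


def summarize_in_chunks(source, chunksize=1000):
    """Count rows and sum the (hypothetical) 'value' column chunk by chunk."""
    total_rows = 0
    value_sum = 0.0
    # With chunksize set, read_csv returns an iterator of DataFrames,
    # so only one chunk is held in memory at a time.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        total_rows += len(chunk)
        value_sum += chunk["value"].sum()
    return total_rows, value_sum


# Small in-memory stand-in for a large file on disk (assumed layout: id,value).
sample = io.StringIO(
    "id,value\n" + "\n".join("{},{}".format(i, i * 2) for i in range(10))
)
rows, total = summarize_in_chunks(sample, chunksize=3)
print(rows, total)
```

For a real file, you would pass its path instead of the StringIO object and pick a much larger chunksize. Only the running totals persist between iterations, so peak memory depends on the chunk size rather than on the size of the file.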
