How can I partially read a huge CSV file?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/29334463/
Asked by lserlohn
I have a very big CSV file, so I cannot read it all into memory. I only want to read and process a few lines of it. So I am seeking a function in Pandas that can handle this task, which basic Python handles well:
with open('abc.csv') as f:
line = f.readline()
# pass until it reaches a particular line number....
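The plain-Python idea above can be sketched with `itertools.islice`, which skips ahead to a line range lazily. This is a minimal, self-contained example; the data is made up and stands in for 'abc.csv':

```python
import csv
import io
from itertools import islice

# A small in-memory stand-in for 'abc.csv' (made-up data).
text = "\n".join(f"row{i},a,b" for i in range(5000))

# islice advances the reader lazily: rows 0-999 are skipped and
# iteration stops after row 1999, so nothing else is ever parsed.
with io.StringIO(text) as f:
    reader = csv.reader(f)
    rows = list(islice(reader, 1000, 2000))

print(len(rows))    # 1000
print(rows[0][0])   # row1000
```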
However, if I do this in pandas, I always read the first line:
datainput1 = pd.read_csv('matrix.txt', sep=',', header=None, nrows=1)
datainput2 = pd.read_csv('matrix.txt', sep=',', header=None, nrows=1)
I am looking for an easier way to handle this task in pandas, for example, reading rows 1000 to 2000. How can I do this quickly?
I want to use pandas because I want to read data into the dataframe.
Accepted answer by EdChum
Use chunksize:
for df in pd.read_csv('matrix.txt', sep=',', header=None, chunksize=1):
    # do something with each single-row DataFrame
To answer the second part of your question, do this:
df = pd.read_csv('matrix.txt', sep=',', header=None, skiprows=1000, chunksize=1000)
This will skip the first 1000 rows and then read only the next 1000 rows, giving you rows 1000-2000. It is unclear whether you need the end points included, but you can fiddle with the numbers to get what you want. (Note that with chunksize set, read_csv returns an iterator of DataFrames rather than a single DataFrame.)
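A runnable sketch of this recipe, using an in-memory CSV in place of 'matrix.txt' (the data is made up). Because chunksize makes read_csv return an iterator, the first chunk pulled from it is exactly rows 1000-1999:

```python
import io
import pandas as pd

# Stand-in for 'matrix.txt': 5000 rows whose first column is the row number.
csv_text = "\n".join(f"{i},{i * 2}" for i in range(5000))

# skiprows drops rows 0-999; with chunksize set, read_csv returns an
# iterator of DataFrames, so the first chunk covers rows 1000-1999.
reader = pd.read_csv(io.StringIO(csv_text), sep=',', header=None,
                     skiprows=1000, chunksize=1000)
chunk = next(reader)

print(chunk.shape)            # (1000, 2)
print(int(chunk.iloc[0, 0]))  # 1000
```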
Answered by petezurich
In addition to EdChum's answer, I find the nrows argument useful; it simply defines the number of rows you want to import. You thereby don't get an iterator, but can instead just import a part of the whole file, of size nrows. It works with skiprows too.
df = pd.read_csv('matrix.txt', sep=',', header=None, skiprows=1000, nrows=1000)
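A minimal sketch of the skiprows + nrows combination, again on made-up in-memory data standing in for 'matrix.txt'. Unlike the chunksize approach, this reads one slice directly into a DataFrame:

```python
import io
import pandas as pd

# Stand-in for 'matrix.txt' (first column numbers the rows).
csv_text = "\n".join(f"{i},{i % 7}" for i in range(5000))

# skiprows + nrows reads a single slice straight into a DataFrame:
# no iterator, just rows 1000-1999.
df = pd.read_csv(io.StringIO(csv_text), sep=',', header=None,
                 skiprows=1000, nrows=1000)

print(df.shape)             # (1000, 2)
print(int(df.iloc[-1, 0]))  # 1999
```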