pandas: Reading a part of a csv file
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/46355419/
Reading a part of a csv file
Asked by John Constantine
I have a really large csv file, about 10GB. Whenever I try to read it into an IPython notebook using
data = pd.read_csv("data.csv")
my laptop gets stuck. Is it possible to read just, say, 10,000 rows or 500 MB of the csv file?
Answered by miradulo
It is possible. You can create an iterator that yields chunks of your csv of a certain size at a time as DataFrames, by passing iterator=True with your desired chunksize to read_csv.
import pandas as pd

df_iter = pd.read_csv('data.csv', chunksize=10000, iterator=True)
for iter_num, chunk in enumerate(df_iter, 1):
    print(f'Processing iteration {iter_num}')
    # do things with chunk
Or more briefly:
for chunk in pd.read_csv('data.csv', chunksize=10000):
    # do things with chunk
    pass
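If the end goal is a single, smaller DataFrame, a common pattern is to reduce each chunk as it arrives and concatenate the survivors. A minimal sketch, assuming a hypothetical numeric column named 'value' to filter on:

import pandas as pd

# Keep only the filtered rows from each chunk, so the full 10GB file
# is never held in memory at once. 'value' and the threshold are
# placeholders for whatever reduction you actually need.
pieces = []
for chunk in pd.read_csv('data.csv', chunksize=10000):
    pieces.append(chunk[chunk['value'] > 0])
result = pd.concat(pieces, ignore_index=True)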
Alternatively, if there is just a specific part of the csv you want to read, you can use the skiprows and nrows options to start at a particular line and subsequently read n rows, as the naming suggests.
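For example, here is a sketch that skips the first million data rows and reads the next 10,000 (the row counts are arbitrary; header=0 keeps the column names, since a list-like skiprows is applied before the header row is located):

import pandas as pd

# Skip data rows 1..1,000,000 (row 0 is the header line) and read
# the following 10,000 rows.
df_part = pd.read_csv('data.csv', header=0,
                      skiprows=range(1, 1000001), nrows=10000)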
Answered by user3212593
Likely a memory issue. On read_csv you can set chunksize (where you specify the number of rows per chunk).
Alternatively, if you don't need all the columns, you can set usecols on read_csv to import only the columns you need.
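For instance, assuming the file has columns named 'timestamp' and 'price' (hypothetical names for illustration), loading just those two can cut memory use substantially on a wide csv:

import pandas as pd

# Parse only the named columns; all other columns are skipped at read time.
df = pd.read_csv('data.csv', usecols=['timestamp', 'price'])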