pandas: How to read data into a Python DataFrame in chunks?

Question

Asked by Geet

I want to read the file f into a DataFrame in chunks. Here is part of the code I used:

for i in range(0, maxline, chunksize):
    df = pandas.read_csv(f, sep=',', nrows=chunksize, skiprows=i)
    df.to_sql(member, engine, if_exists='append', index=False, index_label=None, chunksize=chunksize)

I get the error:

pandas.io.common.EmptyDataError: No columns to parse from file

The code works only when chunksize >= maxline (the total number of lines in file f). In my case, however, chunksize < maxline.

How can I fix this?
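A likely cause, assuming `f` is an already-open file object rather than a path: the first `read_csv` call reads from that handle and advances it, so later calls can find nothing left to parse. That fits the observation that only a single iteration (chunksize >= maxline) succeeds. Separately, `skiprows=i` also skips the header line, so every chunk after the first would take a data row as its column names. The sketch below keeps the question's nrows/skiprows idea but passes a file path and uses `skiprows=range(1, start + 1)` so the header is preserved. The helper name `read_in_manual_chunks` and the sample data are hypothetical.

```python
import os
import tempfile

import pandas as pd


def read_in_manual_chunks(path, chunksize):
    """Read a headed CSV in chunks using nrows/skiprows (hypothetical helper).

    `path` must be a file path, not an open file object, so that every
    read_csv call starts again from the top of the file.
    """
    chunks = []
    start = 0  # number of data rows already read
    while True:
        # range(1, start + 1) skips data lines 1..start but keeps line 0 (the header)
        chunk = pd.read_csv(path, skiprows=range(1, start + 1), nrows=chunksize)
        if chunk.empty:  # only the header was left, so all rows have been read
            break
        chunks.append(chunk)
        start += chunksize
    return chunks


# Usage with a temporary file holding hypothetical sample data
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "sample.csv")
    with open(csv_path, "w") as fh:
        fh.write("a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n")
    chunks = read_in_manual_chunks(csv_path, chunksize=2)
```

Each call still has to scan past all the skipped lines, so this gets slower as `start` grows; the `chunksize` parameter used in the answer avoids that rescanning.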

Answer 1

Answered by jezrael

I think it is better to use the chunksize parameter of read_csv. Then combine the chunks with concat and ignore_index=True, so that the resulting index contains no duplicate values:

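import pandas as pd

chunksize = 5
# With chunksize set, read_csv returns an iterator (TextFileReader) of DataFrames
reader = pd.read_csv(f, chunksize=chunksize)

# Stitch the chunks together; ignore_index=True renumbers the index 0..n-1
df = pd.concat(reader, ignore_index=True)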

See pandas docs.
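Concatenating the chunks rebuilds the whole file in memory. If the end goal is the question's `to_sql` call, one option is to write each chunk to the database as it is read. A minimal sketch, assuming `con` is a connection that `DataFrame.to_sql` accepts (an SQLAlchemy engine such as the question's `engine`, or a `sqlite3` connection as used here for illustration); the helper name `csv_to_sql_in_chunks`, the table name `member`, and the sample data are hypothetical stand-ins.

```python
import io
import sqlite3

import pandas as pd


def csv_to_sql_in_chunks(source, table, con, chunksize):
    """Append a CSV to a SQL table one chunk at a time (hypothetical helper).

    Only one chunk of at most `chunksize` rows is held in memory at once.
    Returns the total number of rows written.
    """
    total = 0
    # With chunksize set, read_csv returns an iterator of DataFrames
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # The first chunk creates the table; later chunks are appended to it
        chunk.to_sql(table, con, if_exists="append", index=False)
        total += len(chunk)
    return total


# Usage: an in-memory SQLite database and hypothetical CSV text
con = sqlite3.connect(":memory:")
rows = csv_to_sql_in_chunks(io.StringIO("a,b\n1,2\n3,4\n5,6\n"), "member", con, chunksize=2)
```

Because the file is read sequentially by a single reader, there is no need to track `maxline` or compute `skiprows` by hand.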