Reading part of a large xlsx file with pandas in Python

Question

Asked by Adel

I have a large .xlsx file with 1 million rows. I don't want to open the whole file in one go. I was wondering if I can read a chunk of the file, process it and then read the next chunk? (I prefer to use pandas for it.)

Answer 1

Answered by bpachev

Yes. Pandas supports chunked reading. You would read an Excel file like so:

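import pandas as pd

# chunksize was later deprecated for Excel reads (see Answer 2),
# so this may not work on recent pandas versions.
xl = pd.ExcelFile("myfile.xlsx")
for sheet_name in xl.sheet_names:
    reader = xl.parse(sheet_name, chunksize=1000)
    for chunk in reader:
        # process chunk here
        pass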

Answer 2

Answered by MaxU

UPDATE: 2019-09-05

The chunksize parameter has been deprecated because pd.read_excel() never actually used it: owing to the nature of the XLSX file format, the whole file is read into memory during parsing.

There are more details about that in this great SO answer...
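
Since chunksize is no longer an option, one possible workaround is to stream rows yourself and build a DataFrame per batch. The following is only a minimal sketch under stated assumptions, not something taken from the answers here: it assumes the openpyxl package is installed, that the first row of the sheet holds the column names, and the helper names iter_chunks and read_xlsx_in_chunks are hypothetical. Column types may also come out differently from what pd.read_excel() would infer.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(rows: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `chunk_size` items from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def read_xlsx_in_chunks(path, chunk_size=100_000, sheet_name=None):
    """Yield DataFrames of at most `chunk_size` rows from one worksheet.

    Hypothetical helper: assumes row 1 of the sheet contains the header.
    """
    # Imported lazily so iter_chunks stays usable without these packages.
    import pandas as pd
    from openpyxl import load_workbook

    # read_only=True asks openpyxl to load rows lazily rather than
    # materialising the whole worksheet up front.
    wb = load_workbook(path, read_only=True)
    try:
        ws = wb[sheet_name] if sheet_name is not None else wb.active
        rows = ws.iter_rows(values_only=True)  # each row is a tuple of cell values
        header = next(rows, None)
        if header is None:
            return  # empty sheet, nothing to yield
        for chunk in iter_chunks(rows, chunk_size):
            yield pd.DataFrame(chunk, columns=list(header))
    finally:
        wb.close()  # read-only workbooks keep the file open until closed
```

With this in place, a loop such as "for chunk in read_xlsx_in_chunks(filename): ..." would process one DataFrame at a time. Keeping the batching logic in a separate iter_chunks function means it can be checked without any Excel file at hand.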

OLD answer:

You can use the read_excel() method:

import pandas as pd
filename = "myfile.xlsx"  # example path, as in Answer 1
chunksize = 10**5
for chunk in pd.read_excel(filename, chunksize=chunksize):
    # process the `chunk` DataFrame here
    pass

If your Excel file has multiple sheets, take a look at bpachev's solution.
