Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/38623368/

Reading a portion of a large xlsx file with python

Tags: python, pandas

Asked by Adel

I have a large .xlsx file with 1 million rows. I don't want to open the whole file in one go. I was wondering if I can read a chunk of the file, process it and then read the next chunk? (I prefer to use pandas for it.)


Answered by bpachev

Yes. Pandas supports chunked reading through the chunksize option. You would read an Excel file like this:


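import pandas as pd

xl = pd.ExcelFile("myfile.xlsx")
for sheet_name in xl.sheet_names:
    # Note: MaxU's answer reports that chunksize is deprecated for Excel input,
    # so this call may fail on recent pandas versions.
    reader = xl.parse(sheet_name, chunksize=1000)
    for chunk in reader:
        ...  # process each chunk (a DataFrame) here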

Answered by MaxU

UPDATE: 2019-09-05


The chunksize parameter has been deprecated because pd.read_excel() never actually used it: due to the nature of the XLSX file format, the whole file is read into memory during parsing.


There are more details about that in this great SO answer...
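
Since chunksize is not available for Excel input, one possible workaround is to stream rows with openpyxl in read-only mode and build DataFrames in batches. This is only a minimal sketch, not part of the original answer: the helper name iter_excel_chunks and its parameters are hypothetical, and it assumes the first row of the sheet holds the column names.

```python
from itertools import islice

import pandas as pd
from openpyxl import load_workbook


def iter_excel_chunks(path, chunksize=1000, sheet_name=None):
    """Yield DataFrames of up to `chunksize` rows from one worksheet.

    Hypothetical helper: assumes the first row holds the column names.
    """
    # read_only=True asks openpyxl to stream rows lazily instead of
    # building the full workbook object model in memory.
    wb = load_workbook(path, read_only=True)
    try:
        ws = wb[sheet_name] if sheet_name else wb.active
        rows = ws.iter_rows(values_only=True)
        header = next(rows, None)
        if header is None:
            return  # empty sheet: nothing to yield
        while True:
            # Pull the next batch of row tuples from the row iterator.
            block = list(islice(rows, chunksize))
            if not block:
                break
            yield pd.DataFrame(block, columns=list(header))
    finally:
        wb.close()
```

Usage would look like `for chunk in iter_excel_chunks("myfile.xlsx", chunksize=10**5): ...`, processing each DataFrame before the next batch is read. Column dtypes are inferred per chunk, so they may differ between chunks.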


OLD answer:


You can use the read_excel() method:


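import pandas as pd
filename = "myfile.xlsx"  # placeholder path
chunksize = 10**5
# Old approach: chunksize is deprecated for read_excel() (see the update above)
for chunk in pd.read_excel(filename, chunksize=chunksize):
    ...  # process `chunk` DF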

If your Excel file has multiple sheets, take a look at bpachev's solution.
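
For reading just one portion of a sheet, as the question asks, the skiprows and nrows arguments of read_excel() may be enough. The sketch below is an assumption-laden illustration rather than the answer's own method: the helper name read_excel_window is hypothetical, and it assumes the header sits in the first row. Each call reopens the file, so looping over many windows this way is likely to be slow.

```python
import pandas as pd


def read_excel_window(path, start, nrows, sheet_name=0):
    """Read `nrows` data rows beginning at 0-based data row `start`.

    Hypothetical helper: assumes row 0 of the sheet is the header.
    """
    # Skip the data rows before the window while keeping row 0 (the header).
    rows_to_skip = list(range(1, start + 1)) or None
    return pd.read_excel(
        path,
        sheet_name=sheet_name,
        header=0,
        skiprows=rows_to_skip,
        nrows=nrows,
    )
```

For example, `read_excel_window("myfile.xlsx", start=100000, nrows=1000)` would return the 1000 data rows that follow the first 100000.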
