Reading a portion of a large xlsx file with Python (pandas)
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/38623368/
Asked by Adel
I have a large .xlsx file with 1 million rows. I don't want to open the whole file in one go. I was wondering if I can read a chunk of the file, process it and then read the next chunk? (I prefer to use pandas for it.)
Answered by bpachev
Yes. Pandas supports chunked reading. You would read an Excel file like so:
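import pandas as pd

xl = pd.ExcelFile("myfile.xlsx")
for sheet_name in xl.sheet_names:
    # Request each sheet in chunks of 1000 rows.
    # Caveat: per MaxU's update below, chunksize is not honored for Excel files.
    reader = xl.parse(sheet_name, chunksize=1000)
    for chunk in reader:
        pass  # parse chunk here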
Answered by MaxU
UPDATE:2019-09-05
The chunksize parameter has been deprecated because pd.read_excel() never actually used it: owing to the nature of the XLSX file format, the file is read into memory as a whole during parsing.
There are more details about that in this great SO answer...
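Since chunksize is not available for Excel files, one possible workaround is to stream rows with openpyxl in read-only mode and build a DataFrame per batch. The following is only a minimal sketch under a few assumptions: openpyxl is installed, the first row of the sheet holds the column names, and the helper names iter_chunks and iter_excel_chunks are hypothetical (they are not part of pandas or openpyxl). Memory use should stay much closer to one chunk than to the whole sheet, although openpyxl still loads some workbook-level data up front.

```python
from itertools import islice


def iter_chunks(rows, chunksize):
    """Yield lists of at most `chunksize` items from any iterable of rows."""
    rows = iter(rows)
    while True:
        batch = list(islice(rows, chunksize))
        if not batch:
            return
        yield batch


def iter_excel_chunks(path, chunksize=1000, sheet_name=None):
    """Yield DataFrames of at most `chunksize` rows from one worksheet.

    Hypothetical helper, not a pandas API. Assumes the first row is a header.
    """
    # Third-party imports live here so iter_chunks stays usable without them.
    import pandas as pd
    from openpyxl import load_workbook

    # read_only=True asks openpyxl to load rows lazily instead of
    # building the whole worksheet in memory.
    wb = load_workbook(path, read_only=True)
    try:
        ws = wb[sheet_name] if sheet_name is not None else wb.active
        rows = ws.iter_rows(values_only=True)  # plain tuples of cell values
        header = next(rows, None)
        if header is None:  # empty sheet: nothing to yield
            return
        for batch in iter_chunks(rows, chunksize):
            yield pd.DataFrame(batch, columns=list(header))
    finally:
        wb.close()  # read-only workbooks keep the file open until closed


# Usage (file name taken from the answer above):
# for chunk in iter_excel_chunks("myfile.xlsx", chunksize=10**5):
#     ...  # process `chunk` DF
```

For a workbook with several sheets, the same helper could be called once per sheet name by passing sheet_name explicitly.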
OLD answer:
You can use the read_excel() method:
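chunksize = 10**5
# Old approach, kept for reference; see the update above about chunksize.
# `filename` is the path to the .xlsx file.
for chunk in pd.read_excel(filename, chunksize=chunksize):
    pass  # process `chunk` DF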
If your Excel file has multiple sheets, take a look at bpachev's solution.