Reading multiple files contained in a zip file with pandas

Source: Stack Overflow, http://stackoverflow.com/questions/44575251/
License: CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors.
Page timestamp: 2020-09-14 03:47:52

Tags: python, python-3.x, pandas, zip

Asked by johnnyb

I have multiple zip files, each containing different types of txt files, like below:

zip1 
  - file1.txt
  - file2.txt
  - file3.txt

How can I use pandas to read in each of those files without extracting them?

I know that if there were one file per zip, I could use the compression argument of read_csv, like below:

df = pd.read_csv('textfile.zip', compression='zip')

Any help on how to do this would be great.
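
For reference, the limitation described in the question can be reproduced with a small sketch. It assumes a reasonably recent pandas version; the archive and member names are made up for illustration. With more than one member in the archive, read_csv with compression='zip' is expected to raise a ValueError rather than pick a file.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Build a throwaway archive with two members (names are hypothetical).
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "multi.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("file1.txt", "a,b\n1,2\n")
    zf.writestr("file2.txt", "a,b\n3,4\n")

try:
    pd.read_csv(zip_path, compression="zip")
    multi_file_error = None
except ValueError as exc:
    # pandas does not guess which member of the archive to read.
    multi_file_error = exc

print(type(multi_file_error).__name__, multi_file_error)
```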

Answered by Stephen Rauch

You can pass the result of ZipFile.open() to pandas.read_csv() to construct a pandas.DataFrame from a csv file packed into a multi-file zip.

Code:

pd.read_csv(zip_file.open('file3.txt'))

Example to read all .csv files into a dict:

import pandas as pd
from zipfile import ZipFile

zip_file = ZipFile('textfile.zip')
# Map each .csv member name to the DataFrame read from it
dfs = {text_file.filename: pd.read_csv(zip_file.open(text_file.filename))
       for text_file in zip_file.infolist()
       if text_file.filename.endswith('.csv')}
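
A self-contained variant of the same idea may be easier to try out. This is only a sketch: it builds a small archive in memory (the member names and contents are invented for illustration) and uses context managers so the archive and each member are closed after reading.

```python
import io
import zipfile

import pandas as pd

# Build a small in-memory archive so the sketch runs without any files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("file1.csv", "a,b\n1,2\n")
    zf.writestr("file2.csv", "a,b\n3,4\n5,6\n")
    zf.writestr("notes.txt", "not a table\n")

buffer.seek(0)
dfs = {}
with zipfile.ZipFile(buffer) as zip_file:
    for name in zip_file.namelist():
        # Only members ending in .csv are parsed; everything else is skipped.
        if name.endswith(".csv"):
            with zip_file.open(name) as member:
                dfs[name] = pd.read_csv(member)

print(sorted(dfs))
print(dfs["file2.csv"].shape)
```

For the .txt files in the question, the filter would be changed to '.txt', and a sep argument would likely be needed if the files are not comma-delimited.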

Answered by Iain Dwyer

I had a similar problem with XML files a while ago. The zipfile module can get you there.

from zipfile import ZipFile

z = ZipFile(yourfile)  # yourfile: path to the zip archive

text_files = z.infolist()

for text_file in text_files:
    data = z.read(text_file.filename)  # raw bytes of this member

If you want to concatenate them into a single pandas object, it gets a bit more involved, but that should get you started. Note that the read method returns bytes, so you may have to handle that as well.
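
One possible way to handle both points is sketched below. It assumes every member is UTF-8 encoded, comma-delimited text with the same columns; the archive contents and the source_file column name are made up for illustration.

```python
import io
import zipfile

import pandas as pd

# Hypothetical archive built in memory for the sake of the example.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("file1.txt", "a,b\n1,2\n")
    zf.writestr("file2.txt", "a,b\n3,4\n")

frames = []
with zipfile.ZipFile(buffer) as z:
    for info in z.infolist():
        raw = z.read(info.filename)       # bytes, as noted above
        text = raw.decode("utf-8")        # assumption: members are UTF-8 text
        frame = pd.read_csv(io.StringIO(text))
        frame["source_file"] = info.filename  # remember where each row came from
        frames.append(frame)

# Stack the per-file frames into one DataFrame with a fresh index.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Decoding is optional here, since read_csv also accepts a binary file-like object such as io.BytesIO(raw); the explicit decode just makes the encoding assumption visible.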