Reading multiple files contained in a zip file with pandas
Original URL: http://stackoverflow.com/questions/44575251/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by johnnyb
I have multiple zip files containing different types of txt files, like this:
zip1
- file1.txt
- file2.txt
- file3.txt
How can I use pandas to read in each of those files without extracting them?
I know that if there were one file per zip, I could use the compression argument of read_csv, like this:
import pandas as pd
df = pd.read_csv('textfile.zip', compression='zip')
Any help on how to do this would be great.
Answer by Stephen Rauch
You can pass ZipFile.open() to pandas.read_csv() to construct a pandas.DataFrame from a csv file packed into a multi-file zip.
Code:
pd.read_csv(zip_file.open('file3.txt'))
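A self-contained sketch of the one-liner above, assuming a small archive built in memory with io.BytesIO so it runs without a file on disk. The member names and CSV contents are made up for illustration:

```python
import io
import zipfile

import pandas as pd

# Build a small multi-file archive in memory (hypothetical sample data)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("file1.txt", "a,b\n1,2\n")
    zf.writestr("file3.txt", "x,y\n3,4\n5,6\n")

# Reopen the archive for reading and parse one member without extracting it
buffer.seek(0)
with zipfile.ZipFile(buffer) as zip_file:
    with zip_file.open("file3.txt") as member:
        df = pd.read_csv(member)

print(df)
```

Using ZipFile and the opened member as context managers closes both when done; with a real archive you would pass its path instead of the buffer.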
Example to read all .csv files into a dict:
import pandas as pd
from zipfile import ZipFile

# Read every .csv member of the archive into a dict keyed by member name
with ZipFile('textfile.zip') as zip_file:
    dfs = {text_file.filename: pd.read_csv(zip_file.open(text_file.filename))
           for text_file in zip_file.infolist()
           if text_file.filename.endswith('.csv')}
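The question's members are .txt files rather than .csv, and often they should end up in a single DataFrame. A minimal sketch, assuming every member is comma-separated with the same columns (the in-memory sample archive is hypothetical), that filters on .txt, skips directory entries, and stacks the results with pd.concat:

```python
import io
import zipfile

import pandas as pd

# Hypothetical sample archive mirroring the question's layout
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("file1.txt", "a,b\n1,2\n")
    zf.writestr("file2.txt", "a,b\n3,4\n")
    zf.writestr("file3.txt", "a,b\n5,6\n")
buffer.seek(0)

frames = {}
with zipfile.ZipFile(buffer) as zip_file:
    for info in zip_file.infolist():
        # Skip directory entries and anything that is not a .txt member
        if info.is_dir() or not info.filename.endswith(".txt"):
            continue
        with zip_file.open(info) as member:
            frames[info.filename] = pd.read_csv(member)

# Dict keys become the outer index level, recording each row's source file
combined = pd.concat(frames, names=["source", "row"])
print(combined)
```

Passing a dict to pd.concat keeps track of which file each row came from. For tab-delimited or otherwise-delimited files, pass sep= to read_csv.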
Answer by Iain Dwyer
I had a similar problem with XML files a while ago. The zipfile module can get you there.
from zipfile import ZipFile

z = ZipFile(yourfile)  # yourfile: path to your zip archive
text_files = z.infolist()
for text_file in text_files:
    data = z.read(text_file.filename)  # raw bytes of this member
If you want to concatenate them into a pandas object, it gets a bit more complex, but this should get you started. Note that the read method returns bytes, so you may need to decode them or wrap them in io.BytesIO before parsing.
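A sketch of the bytes handling and concatenation mentioned above, assuming comma-separated members with identical columns. The in-memory archive stands in for yourfile and its contents are hypothetical. io.BytesIO turns the bytes returned by read into a file-like object that read_csv accepts:

```python
import io
import zipfile

import pandas as pd

# Hypothetical sample archive standing in for `yourfile`
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("file1.txt", "a,b\n1,2\n")
    zf.writestr("file2.txt", "a,b\n3,4\n")
buffer.seek(0)

frames = []
with zipfile.ZipFile(buffer) as z:
    for text_file in z.infolist():
        raw = z.read(text_file.filename)  # bytes, not str
        # Wrap the bytes in a file-like object so read_csv can parse them
        frames.append(pd.read_csv(io.BytesIO(raw)))

# Concatenate into one DataFrame with a fresh 0..n-1 index
df = pd.concat(frames, ignore_index=True)
print(df)
```

Alternatively, decode the bytes yourself with raw.decode('utf-8') and wrap the resulting string in io.StringIO.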