Load a gz file directly into a pandas DataFrame

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/35101093/

Date: 2020-09-14 00:35:43  Source: igfitidea


Tags: python, pandas, gzip

Asked by Marco Scarselli

I have this gz file from dati.istat.it: inside it is a csv file (with a different name) that I want to load directly into a pandas DataFrame.

If I unzip it with 7zip, I can easily load it with this code: pd.read_csv("DCCV_OCCUPATIT_Data+FootnotesLegend_175b2401-3654-4673-9e60-b300989088bb.csv", sep="|", engine = "python")

How can I do this without unzipping it with 7zip first?

Thanks a lot!

Accepted answer by jezrael

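You can use the zipfile library. This works when the file is really a ZIP archive despite its .gz extension; zipfile cannot open a genuine gzip file: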

import pandas as pd
import zipfile

# Open the archive and read the CSV member directly, without extracting it to disk
z = zipfile.ZipFile('test/file.gz')
df = pd.read_csv(z.open("DCCV_OCCUPATIT_Data+FootnotesLegend_175b2401-3654-4673-9e60-b300989088bb.csv"),
                 sep="|",
                 engine="python")
print(df)
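
The snippet above hard-codes the long member name. As a minimal sketch, assuming the archive holds exactly one CSV, the name can instead be looked up with namelist(). The helper read_single_csv_from_zip and the in-memory demo archive with its data.csv member are hypothetical, introduced only for illustration:

```python
import io
import zipfile

import pandas as pd


def read_single_csv_from_zip(path_or_buffer, **read_csv_kwargs):
    """Read the only member of a ZIP archive into a DataFrame (hypothetical helper)."""
    with zipfile.ZipFile(path_or_buffer) as archive:
        names = archive.namelist()
        if len(names) != 1:
            raise ValueError(f"expected exactly one member, found {len(names)}")
        # Open the member as a binary file-like object and let pandas parse it
        with archive.open(names[0]) as member:
            return pd.read_csv(member, **read_csv_kwargs)


# Demo: build a small ZIP archive in memory so the example is self-contained
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", "a|b\n1|2\n3|4\n")
buffer.seek(0)

df = read_single_csv_from_zip(buffer, sep="|")
print(df)
```

For the file in the question, the call would presumably be read_single_csv_from_zip('test/file.gz', sep="|"), provided the archive really contains a single member.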

At the time of this answer, pandas supported only gzip and bz2 in read_csv:

compression: {'gzip', 'bz2', 'infer', None}, default 'infer'

For on-the-fly decompression of on-disk data. If 'infer', then use gzip or bz2 if filepath_or_buffer is a string ending in '.gz' or '.bz2', respectively, and no decompression otherwise. Set to None for no decompression.
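
For a file that is genuinely gzip-compressed, the compression parameter quoted above is enough on its own. The following is a small sketch rather than the answer's own code; the file name sample.csv.gz and its contents are made up for the demonstration:

```python
import gzip
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a small pipe-separated CSV as a real gzip file (hypothetical sample data)
    gz_path = os.path.join(tmp_dir, "sample.csv.gz")
    with gzip.open(gz_path, "wt", encoding="utf-8") as handle:
        handle.write("a|b\n1|2\n3|4\n")

    # The default compression='infer' picks gzip from the '.gz' suffix
    inferred = pd.read_csv(gz_path, sep="|")

    # The same read with the compression stated explicitly
    explicit = pd.read_csv(gz_path, sep="|", compression="gzip")

print(inferred)
```

Stating compression="gzip" explicitly is mainly useful when the file name does not end in '.gz', since inference relies on the suffix.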