Original question: http://stackoverflow.com/questions/35101093/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Load a gz file directly into a pandas DataFrame
Asked by Marco Scarselli
I have this gz file from dati.istat.it: inside it is a CSV file (with a different name) that I want to load directly into a pandas DataFrame.
If I unzip it with 7-Zip, I can easily load it with this code:
pd.read_csv("DCCV_OCCUPATIT_Data+FootnotesLegend_175b2401-3654-4673-9e60-b300989088bb.csv", sep="|", engine = "python")
How can I do this without unzipping with 7-Zip first?
Thanks a lot!
Accepted answer by jezrael
You can use the zipfile library, which works if the file is really a ZIP archive despite its .gz extension:
import zipfile

import pandas as pd

# Open the archive and read the CSV member straight from it, without extracting to disk
z = zipfile.ZipFile('test/file.gz')
df = pd.read_csv(z.open("DCCV_OCCUPATIT_Data+FootnotesLegend_175b2401-3654-4673-9e60-b300989088bb.csv"),
                 sep="|",
                 engine="python")
print(df)
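If the name of the CSV inside the archive is not known in advance, it can be looked up rather than hard-coded. The following is a minimal sketch and not part of the original answer: it builds a small throwaway archive so that it runs on its own, and the file names `sample.gz` and `inner_data.csv` as well as the helper `read_first_csv` are hypothetical.

```python
import os
import tempfile
import zipfile

import pandas as pd


def read_first_csv(archive_path, **read_csv_kwargs):
    """Read the first .csv member of a ZIP archive into a DataFrame.

    Hypothetical helper for illustration; extra keyword arguments are
    passed through to pd.read_csv.
    """
    with zipfile.ZipFile(archive_path) as archive:
        # namelist() returns the member names stored in the archive
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV member found in " + archive_path)
        with archive.open(csv_names[0]) as member:
            return pd.read_csv(member, **read_csv_kwargs)


# Build a throwaway ZIP archive with a misleading .gz extension (assumed
# to mimic the situation in the question) so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
archive_path = os.path.join(tmp_dir, "sample.gz")
with zipfile.ZipFile(archive_path, "w") as archive:
    archive.writestr("inner_data.csv", "a|b\n1|2\n3|4\n")

df = read_first_csv(archive_path, sep="|")
print(df)
```

zipfile recognizes the archive by its content rather than by its file extension, which is why the `.gz` suffix does not get in the way here.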
At the time of this answer, pandas supported only gzip and bz2 in read_csv (later pandas versions also accept zip, among others):
compression: {'gzip', 'bz2', 'infer', None}, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer', then use gzip or bz2 if filepath_or_buffer is a string ending in '.gz' or '.bz2', respectively, and no decompression otherwise. Set to None for no decompression.
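For a file that really is gzip-compressed, the compression parameter described above lets read_csv read it directly. A minimal sketch, assuming a hypothetical file `data.csv.gz` that is created on the fly for the demonstration:

```python
import gzip
import os
import tempfile

import pandas as pd

# Write a small gzip-compressed, pipe-separated CSV so the sketch is runnable.
# The file name data.csv.gz is hypothetical.
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "data.csv.gz")
with gzip.open(gz_path, "wt", encoding="utf-8") as handle:
    handle.write("a|b\n1|2\n3|4\n")

# With the default compression='infer', the '.gz' suffix selects gzip.
df_inferred = pd.read_csv(gz_path, sep="|")

# The compression can also be named explicitly, which helps when the
# file name does not end in '.gz'.
df_explicit = pd.read_csv(gz_path, sep="|", compression="gzip")

print(df_inferred.equals(df_explicit))
```

Naming the compression explicitly only helps when the content is genuinely gzip data; a ZIP archive that merely carries a `.gz` extension will not decompress this way.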