Load a gz file directly into a Pandas DataFrame

Question

Asked by Marco Scarselli

I have this gz file from dati.istat.it. Inside it is a CSV file (with a different name) that I want to load directly into a pandas DataFrame.

If I unzip it with 7zip, I can easily load it with this code: pd.read_csv("DCCV_OCCUPATIT_Data+FootnotesLegend_175b2401-3654-4673-9e60-b300989088bb.csv", sep="|", engine="python")

How can I do it without unzipping with 7zip first?

Thanks a lot!

Answer 1

Accepted answer by jezrael

You can use the zipfile library:

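import pandas as pd
import zipfile

# zipfile only reads ZIP archives, so this works when the .gz file is really a ZIP.
z = zipfile.ZipFile('test/file.gz')
# Open the CSV member by name and pass the file object to read_csv.
print(pd.read_csv(z.open("DCCV_OCCUPATIT_Data+FootnotesLegend_175b2401-3654-4673-9e60-b300989088bb.csv"),
                  sep="|",
                  engine="python"))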
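When the name of the CSV inside the archive is not known in advance, it can be looked up rather than hard-coded. The following is a minimal sketch of that idea, not the answerer's code: it builds a tiny ZIP archive in memory so it runs on its own, and the member name "data.csv" and its contents are made up for illustration.

```python
import io
import zipfile

import pandas as pd

# Build a small ZIP archive in memory so the example is self-contained.
# The member name "data.csv" and its contents are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", "a|b\n1|2\n3|4\n")
buffer.seek(0)

# is_zipfile inspects the file's contents, not its extension.
looks_like_zip = zipfile.is_zipfile(buffer)
buffer.seek(0)

with zipfile.ZipFile(buffer) as archive:
    # Take the first member instead of hard-coding its name.
    member_name = archive.namelist()[0]
    with archive.open(member_name) as member:
        df = pd.read_csv(member, sep="|")

print(looks_like_zip)
print(df)
```

For a file on disk, a path such as 'test/file.gz' can be passed to zipfile.is_zipfile and zipfile.ZipFile in place of the in-memory buffer.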

At the time of this answer, read_csv in pandas supported only gzip and bz2 compression, as its documentation stated:

compression: {'gzip', 'bz2', 'infer', None}, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer', then use gzip or bz2 if filepath_or_buffer is a string ending in '.gz' or '.bz2', respectively, and no decompression otherwise. Set to None for no decompression.
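For a file that really is gzip-compressed, the compression parameter described above can be exercised as in the sketch below. The file name "sample.csv.gz" and its contents are made up for illustration, and the file is written to a temporary directory so the example runs on its own.

```python
import gzip
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a small pipe-separated CSV as a genuine gzip file (hypothetical data).
    path = os.path.join(tmp_dir, "sample.csv.gz")
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        handle.write("a|b\n1|2\n3|4\n")

    # Default compression='infer': gzip is chosen because the path ends in '.gz'.
    df_inferred = pd.read_csv(path, sep="|")
    # The same read with the compression stated explicitly.
    df_explicit = pd.read_csv(path, sep="|", compression="gzip")

print(df_inferred.equals(df_explicit))
```

Stating compression="gzip" explicitly is mainly useful when the file name does not end in '.gz', since inference relies on the extension.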
