How to unzip a gz file using Python

Disclaimer: this page is based on a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/31028815/

Date: 2020-08-19 09:20:58  Source: igfitidea

python, python-2.7, gzip

Asked by Darkdeamon

I need to extract a gz file that I have downloaded from an FTP site to a local Windows file server. I have variables set for the local path of the file, and I know the path can be used by the gzip module.

How can I do this? The file inside the GZ file is an XML file.

Answered by heinst

From the documentation:

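import gzip

# Open the compressed file in binary mode and read the decompressed bytes.
# The with statement closes the file even if read() raises an error.
with gzip.open('file.txt.gz', 'rb') as f:
    file_content = f.read()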
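
Since the question says the archive contains an XML file, the decompressed stream can also be handed straight to an XML parser instead of being read into a byte string first. The following is only a minimal sketch under stated assumptions: the file name sample.xml.gz and its contents are made up so that the example runs on its own, and xml.etree.ElementTree is just one possible parser choice.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET

# Build a small sample archive so the sketch is runnable on its own.
# The file name and the XML content are hypothetical.
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, 'sample.xml.gz')
with gzip.open(gz_path, 'wb') as f:
    f.write(b'<items><item id="1">first</item><item id="2">second</item></items>')

# gzip.open returns a file-like object, so ElementTree can parse it directly
# without writing the decompressed XML to disk first.
with gzip.open(gz_path, 'rb') as f:
    root = ET.parse(f).getroot()

# Collect the text of every <item> child of the root element.
item_texts = [item.text for item in root.findall('item')]
print(root.tag, item_texts)
```

This avoids a temporary uncompressed copy, which may matter if the XML file is large.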

Answered by Matt

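import gzip
import shutil

# Stream the decompressed data into a new file in chunks,
# so the whole content never has to fit in memory.
with gzip.open('file.txt.gz', 'rb') as f_in:
    with open('file.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)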
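
Because the question mentions that the local path is already held in a variable, the same approach can be wrapped in a small function. This is a sketch, not part of the original answer: gunzip_file is a hypothetical helper name, and deriving the output name by stripping a trailing '.gz' is an assumption about how the files are named.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_file(src_path, dst_path=None):
    """Decompress src_path to dst_path and return the output path.

    gunzip_file is a hypothetical helper. If dst_path is omitted, the
    output name is src_path with its trailing '.gz' removed.
    """
    if dst_path is None:
        if not src_path.endswith('.gz'):
            raise ValueError('cannot derive an output name from %r' % src_path)
        dst_path = src_path[:-3]
    # copyfileobj streams in chunks, so large files are not loaded into memory.
    with gzip.open(src_path, 'rb') as f_in, open(dst_path, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst_path


# Demo on a throwaway file (hypothetical name and content).
demo_dir = tempfile.mkdtemp()
demo_gz = os.path.join(demo_dir, 'report.xml.gz')
with gzip.open(demo_gz, 'wb') as f:
    f.write(b'<report/>')

out_path = gunzip_file(demo_gz)
with open(out_path, 'rb') as f:
    restored = f.read()
print(out_path, restored)
```

Building paths with os.path.join keeps the sketch portable, including on a Windows file server such as the one described in the question.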

Answered by perfecto25

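# Requires the third-party sh package and a gunzip executable on the PATH.
# sh supports Unix-like systems only, so this does not apply to Windows.
from sh import gunzip

gunzip('/tmp/file1.gz')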

Answered by Feiyang.Chen

You may also want to pass it to pandas.

import gzip
import pandas as pd

# Hand the decompressed file object straight to pandas.
with gzip.open('features_train.csv.gz') as f:
    features_train = pd.read_csv(f)

features_train.head()

Answered by whs2k

This is not an exact answer, because you are working with XML data and there is currently no pd.read_xml() function (as of v0.23.4), but pandas (starting with v0.21.0) can decompress the file for you! Thanks, Wes!

import pandas as pd
import os
fn = '../data/file_to_load.json.gz'
print(os.path.isfile(fn))
df = pd.read_json(fn, lines=True, compression='gzip')
df.tail()

Answered by Pedro J. Sola

If you parse the file after unzipping it, don't forget to call the decode() method on each line; this is necessary when you open the file in binary mode.

import gzip

# In binary mode each line is a bytes object, so decode it before use.
with gzip.open('file.gz', 'rb') as f:
    for line in f:
        print(line.decode().strip())
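
As an alternative on Python 3 (the question is tagged python-2.7, so this may not apply to the asker), gzip.open also accepts a text mode that decodes for you. The sketch below assumes UTF-8 content; the file name and sample lines are made up so that the example is self-contained.

```python
import gzip
import os
import tempfile

# Sample file; the name, the content and the UTF-8 encoding are assumptions.
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, 'file.gz')
with gzip.open(gz_path, 'wb') as f:
    f.write('first line\nsecond line\n'.encode('utf-8'))

# Mode 'rt' (Python 3) wraps the stream in a text decoder, so each line
# is already a str and no explicit decode() call is needed.
with gzip.open(gz_path, 'rt', encoding='utf-8') as f:
    lines = [line.strip() for line in f]

print(lines)
```

Passing the encoding explicitly avoids relying on the platform default, which can differ between Windows and other systems.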