Python: download and decompress a gzip file in memory?

Question

Asked by OregonTrail

I would like to download a file using urllib and decompress the file in memory before saving.

This is what I have right now:

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())
decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
outfile = open(outFilePath, 'w')
outfile.write(decompressedFile.read())

This ends up writing empty files. How can I achieve what I'm after?

Updated Answer:

#! /usr/bin/env python2
import urllib2
import StringIO
import gzip

baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
# Check the filename: it may change over time as new releases are published.
filename = "man-pages-5.00.tar.gz"
outFilePath = filename[:-3]

response = urllib2.urlopen(baseURL + filename)
# Passing the data to the constructor leaves the buffer positioned at 0,
# so no seek() is needed before handing it to GzipFile.
compressedFile = StringIO.StringIO(response.read())
decompressedFile = gzip.GzipFile(fileobj=compressedFile)

# Binary mode: the decompressed .tar file is binary data.
with open(outFilePath, 'wb') as outfile:
    outfile.write(decompressedFile.read())

Answer 1

Accepted answer by crayzeewulf

You need to seek to the beginning of compressedFile after writing to it but before passing it to gzip.GzipFile(). Otherwise the gzip module starts reading from the buffer's current position, which is the end of the data, so the file appears empty to it. See below:

#! /usr/bin/env python2
import urllib2
import StringIO
import gzip

baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
filename = "man-pages-3.34.tar.gz"
outFilePath = "man-pages-3.34.tar"

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())
#
# Set the file's current position to the beginning
# of the file so that gzip.GzipFile can read
# its contents from the top.
#
compressedFile.seek(0)

decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')

# Binary mode: the decompressed .tar file is binary data.
with open(outFilePath, 'wb') as outfile:
    outfile.write(decompressedFile.read())
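The file-position problem described above can be reproduced locally without a network connection. This is a minimal Python 3 sketch, assuming `io.BytesIO` as the stand-in for Python 2's `StringIO.StringIO` and locally compressed sample bytes (`payload`, made up for illustration) as the stand-in for the downloaded file.

```python
import gzip
import io

# Hypothetical sample data standing in for the downloaded .gz file.
payload = b"hello, gzip\n" * 3
compressed = gzip.compress(payload)

# Fill an empty buffer with write(), as the question's code does.
buf = io.BytesIO()
buf.write(compressed)
at_end = buf.tell() == len(compressed)  # position is now at the end

# GzipFile reads from the current position, finds no data, returns b"".
with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    before_seek = gz.read()

# Rewind to the start; now the whole stream is decompressed.
buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    after_seek = gz.read()

print(at_end, before_seek, after_seek == payload)  # prints: True b'' True
```

Closing the GzipFile does not close a `fileobj` that was passed in, which is why `buf` can still be rewound for the second attempt.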

Answer 2

Answer by lyschoening

For those using Python 3, the equivalent answer is:

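import urllib.request
import io
import gzip

# Placeholders (example values from the other answers): set these to
# the URL of the .gz file and the path to write the result to.
FILE_URL = "https://www.kernel.org/pub/linux/docs/man-pages/man-pages-4.03.tar.gz"
OUTFILE_PATH = "man-pages-4.03.tar"

response = urllib.request.urlopen(FILE_URL)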
compressed_file = io.BytesIO(response.read())
decompressed_file = gzip.GzipFile(fileobj=compressed_file)

with open(OUTFILE_PATH, 'wb') as outfile:
    outfile.write(decompressed_file.read())
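This version needs no `seek(0)` because an `io.BytesIO` created from initial bytes starts at position 0; only a buffer filled with `write()` ends up positioned at the end. A minimal sketch wrapping the same idea in a reusable helper; `gunzip_bytes` is a hypothetical name, and the data passed to it would be `response.read()` in the code above.

```python
import gzip
import io


def gunzip_bytes(data: bytes) -> bytes:
    """Decompress a complete gzip payload already held in memory (hypothetical helper)."""
    # BytesIO(data) starts at position 0, so GzipFile reads from the top
    # without an explicit seek(0).
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as gz:
        return gz.read()
```

For example, `gunzip_bytes(gzip.compress(b"abc"))` returns `b"abc"`.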

Answer 3

Answer by Chih-Hsuan Yen

If you have Python 3.2 or later, it is simpler to use gzip.decompress(), which decompresses a bytes object directly without any file-like wrapper:

#!/usr/bin/env python3
import gzip
import urllib.request

baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
filename = "man-pages-4.03.tar.gz"
outFilePath = filename[:-3]

response = urllib.request.urlopen(baseURL + filename)
with open(outFilePath, 'wb') as outfile:
    outfile.write(gzip.decompress(response.read()))
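`gzip.decompress()` holds both the compressed and the decompressed data in memory at once. If that is a concern for a large archive, one alternative (a swapped-in technique, not part of the answer above) is to wrap a readable binary stream in `gzip.GzipFile` and copy it out in chunks with `shutil.copyfileobj`. The sketch uses `io.BytesIO` objects so it runs offline; passing the `urlopen()` response as `src` and a file opened with `'wb'` as `dst` should work the same way, but treat that as an assumption to verify. `gunzip_stream` and its `chunk_size` parameter are hypothetical names.

```python
import gzip
import io
import shutil


def gunzip_stream(src, dst, chunk_size: int = 64 * 1024) -> None:
    """Decompress gzip data from binary file-like src into dst in chunks (hypothetical helper)."""
    # GzipFile pulls from src lazily and copyfileobj moves chunk_size bytes
    # at a time, so the full decompressed output is never held in memory.
    with gzip.GzipFile(fileobj=src, mode="rb") as gz:
        shutil.copyfileobj(gz, dst, chunk_size)


# Offline demonstration: BytesIO objects stand in for the response and output file.
original = bytes(range(256)) * 1000
src = io.BytesIO(gzip.compress(original))
dst = io.BytesIO()
gunzip_stream(src, dst)
print(dst.getvalue() == original)  # prints: True
```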

For those who are interested in the history, see https://bugs.python.org/issue3488 and https://hg.python.org/cpython/rev/3fa0a9553402.

Answer 4

Answer by BaiJiFeiLong

A one-liner (Python 2) that prints the decompressed file content:

print gzip.GzipFile(fileobj=StringIO.StringIO(urllib2.urlopen(DOWNLOAD_LINK).read()), mode='rb').read()
