Using the GZIP module in Python

Question

Asked by user3111358

I'm trying to use the Python GZIP module to simply uncompress several .gz files in a directory. Note that I do not want to read the files, only uncompress them. After searching this site for a while, I have this code segment, but it does not work:

import gzip
import glob
import os
for file in glob.glob(PATH_TO_FILE + "/*.gz"):
    #print file
    if os.path.isdir(file) == False:
        shutil.copy(file, FILE_DIR)
        # uncompress the file
        inF = gzip.open(file, 'rb')
        s = inF.read()
        inF.close()

The .gz files are in the correct location, and I can print the full path + filename with the print command, but the GZIP module isn't getting executed properly. What am I missing?

Answer 1

Answered by goncalopp

If you get no error, the gzip module probably is being executed properly, and the file is already getting decompressed.

The precise definition of "decompressed" depends on context:

I do not want to read the files, only uncompress them

The gzip module doesn't work like a desktop archiving program such as 7-zip: you can't "uncompress" a file without "reading" it. Note that "reading" (in programming) usually just means "storing (temporarily) in the computer's RAM", not "opening the file in the GUI".
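To make the "reading just means holding it in RAM" point concrete, here is a minimal sketch that round-trips some made-up sample bytes entirely in memory. The data and variable names are assumptions for illustration, not part of the original question.

```python
import gzip

# Hypothetical sample data, compressed purely in memory for illustration.
original = b"hello gzip\n" * 3
compressed = gzip.compress(original)

# "Reading" the compressed data only puts the decompressed bytes in a variable;
# no file is created or opened in any viewer.
restored = gzip.decompress(compressed)
print(restored == original)
```

Nothing is written to disk here, which is why the code in the question appears to "do nothing".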

What you probably mean by "uncompress" (as in a desktop archiving program) is more precisely described (in programming) as "read an in-memory stream/buffer from a compressed file, and write it to a new file (and possibly delete the compressed file afterwards)".

inF = gzip.open(file, 'rb')
s = inF.read()
inF.close()

With these lines, you're just reading the stream. If you expect a new "uncompressed" file to be created, you just need to write the buffer to a new file:

with open(out_filename, 'wb') as out_file:
    out_file.write(s)

If you're dealing with very large files (larger than your available RAM), you'll need to adopt a different approach. But that is a topic for another question.
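For what it's worth, one common way to handle the large-file case is to stream the data in chunks instead of calling read() on the whole file. The following is only a sketch under that assumption; the function name gunzip_to and the chunk_size default are made up for illustration.

```python
import gzip
import shutil


def gunzip_to(src_path, dst_path, chunk_size=1024 * 1024):
    """Decompress the gzip file at src_path into dst_path, one chunk at a time."""
    # gzip.open returns a file-like object, so copyfileobj can pull
    # decompressed data through it without loading everything into RAM.
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

Called as gunzip_to("data.txt.gz", "data.txt"), memory use should stay around the chunk size regardless of how large the file is.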

Answer 2

Answered by Jan Spurny

You're decompressing the file into the s variable and doing nothing with it. You should stop searching Stack Overflow and at least read the Python tutorial. Seriously.

Anyway, there are several things wrong with your code:

You need to STORE the unzipped data from s in some file.
There's no need to copy the actual *.gz files, because your code unpacks the original gzip file and not the copy.
You're using file as a variable name. It is not a reserved word, but it shadows the Python 2 built-in of the same name. This is not an error, just bad practice.

This should probably do what you wanted:

import gzip
import glob
import os

for gzip_path in glob.glob(PATH_TO_FILE + "/*.gz"):
    if not os.path.isdir(gzip_path):
        # uncompress the gzip file INTO THE 's' variable
        with gzip.open(gzip_path, 'rb') as inF:
            s = inF.read()

        # get gzip filename (without directories)
        gzip_fname = os.path.basename(gzip_path)
        # get original filename (remove 3 characters from the end: ".gz")
        fname = gzip_fname[:-3]
        uncompressed_path = os.path.join(FILE_DIR, fname)

        # store uncompressed file data from 's' variable (binary mode, since 's' holds bytes)
        with open(uncompressed_path, 'wb') as out_file:
            out_file.write(s)

Answer 3

Answered by user3111358

I was able to resolve this issue by using the subprocess module:

import glob
import os
import shutil
import subprocess

for gz_path in glob.glob(PATH_TO_FILE + "/*.gz"):
    if not os.path.isdir(gz_path):
        shutil.copy(gz_path, FILE_DIR)
        # uncompress the copied file in the working area
        subprocess.call(["gunzip", os.path.join(FILE_DIR, os.path.basename(gz_path))])

Since my goal was to simply uncompress the archive, the above code accomplishes this. The archived files are located in a central location, and are copied to a working area, uncompressed, and used in a test case. The GZIP module was too complicated for what I was trying to accomplish.
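One caveat with shelling out is that subprocess.call only returns the exit status, so a failed gunzip can go unnoticed. A small sketch of a stricter variant follows; it assumes a gunzip executable is available on the PATH and that the file name ends in ".gz", and the helper name gunzip_in_place is made up for illustration.

```python
import subprocess


def gunzip_in_place(gz_path):
    """Run the external gunzip tool on gz_path and return the uncompressed path."""
    # check=True raises subprocess.CalledProcessError on a non-zero exit status.
    subprocess.run(["gunzip", gz_path], check=True)
    # Assumption: gz_path ends in ".gz", so gunzip drops those three characters.
    return gz_path[:-3]
```

Like the command-line tool, this replaces the .gz file with the uncompressed one, which is why the files are copied to a working area first.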

Thanks for everyone's help. It is much appreciated!

Answer 4

Answered by Martin Thoma

You should use with to open files and, of course, store the result of reading the compressed file. See the gzip documentation:

import gzip
import glob
import os
import os.path

for gzip_path in glob.glob("%s/*.gz" % PATH_TO_FILE):
    if not os.path.isdir(gzip_path):
        with gzip.open(gzip_path, 'rb') as in_file:
            s = in_file.read()

        # Now store the uncompressed data
        path_to_store = gzip_path[:-3]  # remove the '.gz' from the path

        # store uncompressed file data from 's' variable ('wb' because 's' holds bytes)
        with open(path_to_store, 'wb') as f:
            f.write(s)

Depending on what exactly you want to do, you might want to have a look at tarfile and its 'r:gz' option for opening files.
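If the files turn out to be gzip-compressed tar archives (.tar.gz) rather than single gzipped files, the tarfile route could look roughly like this. It is only a sketch: the function name extract_tar_gz and its arguments are assumptions for illustration, and extractall should only be used on archives you trust.

```python
import tarfile


def extract_tar_gz(archive_path, dest_dir):
    """Extract every member of a gzip-compressed tar archive into dest_dir."""
    # 'r:gz' opens the archive for reading with gzip decompression.
    with tarfile.open(archive_path, "r:gz") as archive:
        # Only do this for trusted archives: members can carry unexpected paths.
        archive.extractall(dest_dir)
```

This does not apply to a plain .gz file that wraps a single file, which tarfile cannot open.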

Answer 5

Answered by Dalupus

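I think there is a much simpler solution than the others presented, given that the OP only wanted to extract all the files in a directory. Note that, as far as I know, setuptools' unpack_archive only recognizes zip and tar archives (including .tar.gz), so this may not work for plain single-file .gz files: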

import glob
from setuptools import archive_util

for fn in glob.glob('*.gz'):
  archive_util.unpack_archive(fn, '.')
