Using pandas read_csv with zip compression

Question

Asked by itzy

I'm trying to use read_csv in pandas to read a zipped file from an FTP server. The zip file contains just one file, as is required.

Here's my code:

pd.read_csv('ftp://ftp.fec.gov/FEC/2016/cn16.zip', compression='zip')

I get this error:

AttributeError: addinfourl instance has no attribute 'seek'

I get this error in both pandas 0.18.1 and 0.19.0. Am I missing something, or could this be a bug?

Answer 1

Accepted answer by PyNoob

Although I'm not completely sure why you get the error, you can work around it by opening the URL with urllib2 and reading the data into an in-memory binary stream, as shown below. In addition, we have to specify the correct separator, or else we would get another error.

import io
import urllib2 as urllib
import pandas as pd

r = urllib.urlopen('ftp://ftp.fec.gov/FEC/2016/cn16.zip')
df = pd.read_csv(io.BytesIO(r.read()), compression='zip', sep='|', header=None)
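The snippet above targets Python 2, where urllib2 exists. A rough Python 3 equivalent, assuming urllib.request as the replacement, might look like the sketch below. The helper names download_bytes and read_zipped_csv are hypothetical and not part of the original answer, and the small archive built at the end is made-up sample data so the sketch can be tried without network access.

```python
import io
import zipfile
from urllib.request import urlopen

import pandas as pd


def download_bytes(url):
    """Fetch the whole response body into memory (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read()


def read_zipped_csv(raw_bytes, **read_csv_kwargs):
    """Wrap raw zip bytes in a seekable BytesIO and hand them to pandas."""
    return pd.read_csv(io.BytesIO(raw_bytes), compression="zip", **read_csv_kwargs)


# Offline stand-in for the downloaded archive: one pipe-separated file, no header row.
# The rows are invented sample data, not real FEC records.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr("cn16.txt", "C001|DOE, JANE|DEM\nC002|ROE, JOHN|REP\n")

df = read_zipped_csv(sample.getvalue(), sep="|", header=None)
print(df.shape)
```

With network access, the equivalent of the original call would be read_zipped_csv(download_bytes(url), sep='|', header=None), where url is the FTP address from the question.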

As for the error itself, I think pandas tries to call seek on the "zip file" before the URL contents have been downloaded (so at that point it is a network stream rather than a real zip file), which would produce that error.
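To illustrate why a missing seek matters, here is a small sketch using only the standard library's zipfile module. It does not reproduce pandas internals; it only shows that opening a zip archive from a file object needs seek, which an in-memory io.BytesIO provides. The class ForwardOnlyStream and the function try_open are hypothetical names made up for this example.

```python
import io
import zipfile

# Build a tiny zip archive in memory; the contents are made-up sample data.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("data.txt", "a|b|c\n")
zip_bytes = archive.getvalue()


class ForwardOnlyStream:
    """Hypothetical stand-in for a network response: it has read() but no seek()."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def read(self, size=-1):
        return self._buffer.read(size)


def try_open(fileobj):
    """Return the archive's member names, or an error message if opening fails."""
    try:
        with zipfile.ZipFile(fileobj) as zf:
            return zf.namelist()
    except AttributeError as exc:
        return "failed: {}".format(exc)


stream_result = try_open(ForwardOnlyStream(zip_bytes))  # no seek(): opening fails
buffer_result = try_open(io.BytesIO(zip_bytes))         # seekable: opening succeeds
print(stream_result)
print(buffer_result)
```

This is why reading the full response into io.BytesIO first makes the workaround succeed: the zip reader needs to jump to the end of the data to find the archive's directory.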

Answer 2

Answer by Vlad Bezden

pandas now supports loading data straight from zip or other compressed files into a DataFrame. From the read_csv documentation:

compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer' and filepath_or_buffer is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', or '.xz' (otherwise no decompression). If using 'zip', the ZIP file must contain only one data file to be read in. Set to None for no decompression.
New in version 0.18.1: support for 'zip' and 'xz' compression.

import pandas as pd

df = pd.read_csv("path_to_file.zip")
# or
df = pd.read_csv("path_to_file.zip", compression="zip")
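As a quick check of the two calls above, the following sketch writes a one-file zip archive to a temporary directory and reads it both ways. The file names and the sample rows are invented for the example; it assumes a pandas version with zip support (0.18.1 or later, per the documentation quoted above).

```python
import os
import tempfile
import zipfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmpdir:
    # Create a zip archive containing exactly one CSV file (made-up data).
    zip_path = os.path.join(tmpdir, "example.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("example.csv", "name,score\nalice,1\nbob,2\n")

    # compression defaults to 'infer', so the '.zip' extension is enough.
    inferred = pd.read_csv(zip_path)
    # Passing compression='zip' explicitly should give the same result.
    explicit = pd.read_csv(zip_path, compression="zip")

print(inferred.equals(explicit))
```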

Answer 3

Answer by Vinod

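import io
import zipfile
import pandas as pd
import requests

header = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/54.0.1'}
remotezip = requests.get(url, headers=header)  # url: address of the remote zip file
root = zipfile.ZipFile(io.BytesIO(remotezip.content))  # wrap the bytes in a seekable buffer
for name in root.namelist():
    df = pd.read_csv(root.open(name))  # each member overwrites df; the last one remains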

Taken from my own blog post: Read zipped csv files in python pandas without downloading zipfile
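If the archive holds more than one CSV file, the loop above keeps only the last DataFrame. A possible variation, sketched here under the assumption that every member of the archive is a CSV file, collects them into a dictionary keyed by member name. The function read_all_members is a hypothetical helper, and the two-file archive is made-up data standing in for remotezip.content.

```python
import io
import zipfile

import pandas as pd


def read_all_members(zip_bytes, **read_csv_kwargs):
    """Return {member name: DataFrame} for every file in the archive."""
    frames = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            with archive.open(name) as member:
                frames[name] = pd.read_csv(member, **read_csv_kwargs)
    return frames


# Made-up two-file archive standing in for the downloaded bytes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.csv", "x,y\n1,2\n")
    zf.writestr("b.csv", "x,y\n3,4\n")

frames = read_all_members(buf.getvalue())
print(sorted(frames))
```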
