Python and urllib

Question

Asked by djq

I'm trying to download a zip file ("tl_2008_01001_edges.zip") from a census FTP site using urllib. What form is the zip file in when I get it, and how do I save it?

I'm fairly new to Python and don't understand how urllib works.

This is my attempt:

import urllib, sys

zip_file = urllib.urlretrieve("ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/Autauga_County/", "tl_2008_01001_edges.zip")

If I know the list of FTP folders (counties, in this case), can I run through the FTP site listing using the glob function?

Thanks.

Answer 1

Answered by gimel

Use urllib2.urlopen() for both the zip file data and the directory listing.

To process zip files with the zipfile module, you can write them to a disk file, which is then passed to the zipfile.ZipFile constructor. Retrieving the data is straightforward: call read() on the file-like object returned by urllib2.urlopen().
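A minimal sketch of that write-then-open step, written for Python 3 (where urllib2.urlopen() lives at urllib.request.urlopen()). The helper name save_and_open_zip is made up for illustration, and the in-memory archive only stands in for the bytes that read() would return, so the example runs without touching the network.

```python
import io
import os
import tempfile
import zipfile


def save_and_open_zip(data, path):
    """Write raw zip bytes to `path`, then return the archive's member names."""
    # Binary mode matters: a zip file is not text.
    with open(path, "wb") as fh:
        fh.write(data)
    # ZipFile accepts the path of the file that was just written.
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


# Stand-in for urlopen(url).read(): build a tiny archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.txt", "hello")

demo_path = os.path.join(tempfile.mkdtemp(), "demo.zip")
names = save_and_open_zip(buffer.getvalue(), demo_path)
print(names)
```

With a real download, data would be the result of read() on the response object instead of buffer.getvalue().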

Fetching directories:

>>> files = urllib2.urlopen('ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/').read().splitlines()
>>> for l in files[:4]: print l
... 
drwxrwsr-x    2 0        4009         4096 Nov 26  2008 01001_Autauga_County
drwxrwsr-x    2 0        4009         4096 Nov 26  2008 01003_Baldwin_County
drwxrwsr-x    2 0        4009         4096 Nov 26  2008 01005_Barbour_County
drwxrwsr-x    2 0        4009         4096 Nov 26  2008 01007_Bibb_County
>>>

Or, splitting for directory names:

>>> for l in files[:4]: print l.split()[-1]
... 
01001_Autauga_County
01003_Baldwin_County
01005_Barbour_County
01007_Bibb_County
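The same splitting can be wrapped in a small function, sketched here for Python 3, where read() returns bytes that need decoding first. The names directory_names and edges_url are hypothetical. The URL pattern is inferred from the single county URL that appears in this thread, so treat it as an assumption that other counties follow it.

```python
BASE = "ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/"


def directory_names(listing):
    """Return the last field of each directory line in a Unix-style listing."""
    names = []
    for line in listing.decode("ascii", errors="replace").splitlines():
        fields = line.split()
        # In this listing format, directory entries start with "d".
        if fields and fields[0].startswith("d"):
            names.append(fields[-1])
    return names


def edges_url(dirname, base=BASE):
    """Build the assumed edges zip URL for one county directory name."""
    code = dirname.split("_", 1)[0]  # "01001_Autauga_County" -> "01001"
    return "{0}{1}/tl_2008_{2}_edges.zip".format(base, dirname, code)


# Two lines copied from the listing above stand in for the FTP response.
sample = (
    b"drwxrwsr-x    2 0        4009         4096 Nov 26  2008 01001_Autauga_County\r\n"
    b"drwxrwsr-x    2 0        4009         4096 Nov 26  2008 01003_Baldwin_County\r\n"
)
for name in directory_names(sample):
    print(edges_url(name))
```

Taking the last whitespace-separated field only works while entry names contain no spaces, which holds for the names shown here.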

Answer 2

Answered by ghostdog74

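import os
import urllib2  # Python 2 module

out = os.path.join("/tmp", "test.zip")
url = "ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/01001_Autauga_County/tl_2008_01001_edges.zip"

page = urllib2.urlopen(url)
# Write in binary mode; the with block closes the file when done.
with open(out, "wb") as f:
    f.write(page.read())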
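For larger files it may be preferable not to hold the whole download in memory. A possible Python 3 variant, assuming urllib.request in place of urllib2, copies the response to disk in chunks. The function name download and the chunk size are illustrative choices, and the demo uses a local file:// URL as a stand-in for the FTP URL so that it runs offline.

```python
import pathlib
import shutil
import tempfile
import urllib.request
from contextlib import closing


def download(url, out_path, chunk_size=64 * 1024):
    """Copy the resource at `url` to `out_path` in fixed-size chunks."""
    with closing(urllib.request.urlopen(url)) as response:
        with open(out_path, "wb") as out:
            # copyfileobj reads and writes chunk_size bytes at a time.
            shutil.copyfileobj(response, out, chunk_size)
    return out_path


# Offline demo: a local file stands in for the remote zip.
workdir = pathlib.Path(tempfile.mkdtemp())
source = workdir / "source.bin"
source.write_bytes(b"abc" * 1000)

target = download(source.as_uri(), str(workdir / "copy.bin"))
copied = pathlib.Path(target).read_bytes()
print(len(copied))
```

For the census file, the first argument would be the ftp:// URL from the snippet above.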

Answer 3

Answered by Alex Martelli

Per the docs, urlretrieve saves the file to disk and returns a tuple (filename, headers). So the file is already saved by the time urlretrieve returns.

You can open and read the ZIP file you've retrieved with the zipfile module of the standard library. glob does not work inside zip files, only on normal filesystem directories.
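A rough Python 3 sketch of both points, assuming urllib.request.urlretrieve (which the Python 3 docs list under a legacy interface). Since glob cannot look inside an archive, the example applies fnmatch.filter to the names from ZipFile.namelist() instead. The function name retrieve_and_list and the member names are invented for the demo, and a local file:// URL replaces the FTP URL so it runs offline.

```python
import fnmatch
import pathlib
import tempfile
import urllib.request
import zipfile


def retrieve_and_list(url, filename, pattern="*"):
    """Save `url` to `filename`, then list archive members matching `pattern`."""
    # urlretrieve returns (filename, headers); the file is on disk by then.
    saved_path, headers = urllib.request.urlretrieve(url, filename)
    with zipfile.ZipFile(saved_path) as archive:
        # Shell-style matching on member names, since glob only sees real directories.
        return saved_path, fnmatch.filter(archive.namelist(), pattern)


# Offline demo: build a small archive and fetch it through a file:// URL.
workdir = pathlib.Path(tempfile.mkdtemp())
source = workdir / "source.zip"
with zipfile.ZipFile(str(source), "w") as archive:
    for member in ("edges.shp", "edges.dbf", "readme.txt"):
        archive.writestr(member, b"")

saved, matches = retrieve_and_list(source.as_uri(), str(workdir / "copy.zip"), "*.shp")
print(saved, matches)
```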
