Python and urllib

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me), citing the original address: StackOverflow, http://stackoverflow.com/questions/2289768/

Tags: python, urllib2, urllib

Asked by djq

I'm trying to download a zip file ("tl_2008_01001_edges.zip") from an FTP census site using urllib. What form is the zip file in when I get it, and how do I save it?

I'm fairly new to Python and don't understand how urllib works.

This is my attempt:

import urllib, sys

zip_file = urllib.urlretrieve("ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/Autauga_County/", "tl_2008_01001_edges.zip")

If I know the list of FTP folders (or counties in this case), can I run through the FTP site list using the glob function?

Thanks.

Answered by gimel

Use urllib2.urlopen() for both the zip file data and the directory listing.

To process zip files with the zipfile module, you can write them to a disk file, which is then passed to the zipfile.ZipFile constructor. Retrieving the data is straightforward: call read() on the file-like object returned by urllib2.urlopen().
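
The steps described above can be sketched as follows. This is only a minimal sketch, assuming Python 3, where urllib2.urlopen() is available as urllib.request.urlopen(); the helper name download_zip and its parameters are hypothetical and do not come from the answer.

```python
import shutil
import urllib.request
import zipfile


def download_zip(url, dest_path):
    """Save the data at `url` to `dest_path` and return the archive's member names.

    Hypothetical helper for illustration; assumes Python 3.
    """
    # urlopen() returns a file-like object; copy its bytes to a disk file.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    # The saved file can now be passed to the ZipFile constructor.
    with zipfile.ZipFile(dest_path) as archive:
        return archive.namelist()
```

Calling download_zip(url, "/tmp/test.zip") with the full URL of a zip file should leave the archive on disk and return the names of the files inside it. Copying in chunks with shutil.copyfileobj is a design choice here; a single read() followed by write() would also work for small files.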

Fetching directories:

>>> files = urllib2.urlopen('ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/').read().splitlines()
>>> for l in files[:4]: print l
... 
drwxrwsr-x    2 0        4009         4096 Nov 26  2008 01001_Autauga_County
drwxrwsr-x    2 0        4009         4096 Nov 26  2008 01003_Baldwin_County
drwxrwsr-x    2 0        4009         4096 Nov 26  2008 01005_Barbour_County
drwxrwsr-x    2 0        4009         4096 Nov 26  2008 01007_Bibb_County
>>> 

Or, splitting for directory names:

>>> for l in files[:4]: print l.split()[-1]
... 
01001_Autauga_County
01003_Baldwin_County
01005_Barbour_County
01007_Bibb_County
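
The same splitting can be wrapped in a small function. This is a sketch under stated assumptions: it targets Python 3, where read() returns bytes; the function name directory_names is hypothetical; and it assumes directory names contain no spaces, as in the listing shown above.

```python
def directory_names(listing):
    """Return the last field of every directory line in an FTP-style listing.

    Hypothetical helper; assumes names contain no whitespace.
    """
    # In Python 3, read() returns bytes, so decode before splitting.
    if isinstance(listing, bytes):
        listing = listing.decode("utf-8", errors="replace")
    names = []
    for line in listing.splitlines():
        fields = line.split()
        # Keep only directory entries (permission string starts with "d").
        if fields and fields[0].startswith("d"):
            names.append(fields[-1])
    return names


# Two lines copied from the listing above, used as sample input.
sample = (
    "drwxrwsr-x    2 0        4009         4096 Nov 26  2008 01001_Autauga_County\n"
    "drwxrwsr-x    2 0        4009         4096 Nov 26  2008 01003_Baldwin_County\n"
)
print(directory_names(sample))
# prints ['01001_Autauga_County', '01003_Baldwin_County']
```

The returned names could then be appended to the base URL to visit each county folder in turn.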

Answered by ghostdog74

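import os, urllib2

# Local path where the downloaded archive will be written
out = os.path.join("/tmp", "test.zip")
url = "ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/01001_Autauga_County/tl_2008_01001_edges.zip"
page = urllib2.urlopen(url)
# Write in binary mode; the with block closes the file when done
with open(out, "wb") as f:
    f.write(page.read())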

Answered by Alex Martelli

Per the docs, urlretrieve writes the file to disk and returns a tuple (filename, headers). So the file is already saved when urlretrieve returns.
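
To illustrate the return value, here is a minimal sketch, assuming Python 3, where the function lives at urllib.request.urlretrieve (it is documented there as a legacy interface). The wrapper name fetch_to_disk is hypothetical.

```python
import urllib.request


def fetch_to_disk(url, dest_path):
    """Download `url` to `dest_path` and return the path that was written.

    Hypothetical wrapper for illustration; assumes Python 3.
    """
    # urlretrieve() writes the file itself and returns (filename, headers).
    filename, headers = urllib.request.urlretrieve(url, dest_path)
    return filename
```

Note that the URL should point at the zip file itself, as in the full URL used in ghostdog74's answer, rather than at the containing directory.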

You can open and read the ZIP file you've retrieved with the zipfile module of the standard library. glob does not work inside zip files, only on normal filesystem directories.
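
If glob-style matching on the contents of an archive is what you are after, one possible workaround is to filter the member names yourself. The sketch below uses fnmatch from the standard library in place of glob; this is a swapped-in technique not mentioned in the answer, and the function name matching_members is hypothetical.

```python
import fnmatch
import zipfile


def matching_members(zip_path, pattern):
    """Return names of archive members that match a shell-style pattern.

    Hypothetical helper; uses fnmatch as a stand-in for glob.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # namelist() returns every member name; fnmatchcase() filters them
        # with case-sensitive shell-style wildcards such as "*.shp".
        return [
            name
            for name in archive.namelist()
            if fnmatch.fnmatchcase(name, pattern)
        ]
```

For example, matching_members("/tmp/test.zip", "*.shp") would return only the members whose names end in .shp, if the archive contains any.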
