Python and urllib
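Disclaimer: This page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute the content to its original authors (not me): Stack Overflow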
Original URL: http://stackoverflow.com/questions/2289768/
Question by djq
I'm trying to download a zip file ("tl_2008_01001_edges.zip") from a census FTP site using urllib. What form is the zip file in when I get it, and how do I save it?
I'm fairly new to Python and don't understand how urllib works.
This is my attempt:
import urllib, sys
zip_file = urllib.urlretrieve("ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/Autauga_County/", "tl_2008_01001_edges.zip")
If I know the list of FTP folders (or counties, in this case), can I iterate over the FTP site listing using the glob function?
Thanks.
Answer by gimel
Use urllib2.urlopen() for both the zip file data and the directory listing.
To process zip files with the zipfile module, you can write them to a disk file, which is then passed to the zipfile.ZipFile constructor. Retrieving the data is straightforward using read() on the file-like object returned by urllib2.urlopen().
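A minimal sketch of that approach, assuming Python 3, where urllib2.urlopen() became urllib.request.urlopen(). So that it runs offline, an io.BytesIO object holding a tiny in-memory zip stands in for the FTP response. The helper name save_and_open_zip and the member name are hypothetical.

```python
import io
import os
import tempfile
import zipfile


def save_and_open_zip(response, path):
    """Write the bytes from a file-like response to disk, then open them as a zip.

    `response` can be anything with a .read() method, such as the object
    returned by urllib.request.urlopen() (urllib2.urlopen() in Python 2).
    """
    data = response.read()          # the whole download as a bytes object
    with open(path, "wb") as f:     # binary mode: zip data is raw bytes
        f.write(data)
    return zipfile.ZipFile(path)    # the caller is responsible for closing it


# Build a tiny zip in memory to stand in for the FTP response (hypothetical data).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("tl_2008_01001_edges.shp", b"dummy shapefile bytes")
buf.seek(0)  # rewind so .read() returns the whole archive

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "tl_2008_01001_edges.zip")
    with save_and_open_zip(buf, target) as archive:
        names = archive.namelist()
    print(names)
```

zipfile.ZipFile also accepts a seekable file-like object. If you do not need a copy on disk, zipfile.ZipFile(io.BytesIO(data)) works as well.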
Fetching directories:
>>> files = urllib2.urlopen('ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/').read().splitlines()
>>> for l in files[:4]: print l
...
drwxrwsr-x 2 0 4009 4096 Nov 26 2008 01001_Autauga_County
drwxrwsr-x 2 0 4009 4096 Nov 26 2008 01003_Baldwin_County
drwxrwsr-x 2 0 4009 4096 Nov 26 2008 01005_Barbour_County
drwxrwsr-x 2 0 4009 4096 Nov 26 2008 01007_Bibb_County
>>>
Or, splitting for directory names:
>>> for l in files[:4]: print l.split()[-1]
...
01001_Autauga_County
01003_Baldwin_County
01005_Barbour_County
01007_Bibb_County
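The same name extraction can be sketched in Python 3. This assumes the server returns Unix-style "ls -l" lines like the ones above, which is not guaranteed because FTP LIST output is not standardized. The sketch keeps only entries whose permission string starts with "d" and takes the last whitespace-separated field as the name, so names that contain spaces would be truncated. The helper name directory_names is hypothetical. The first two sample lines are copied from the listing above, and the third is a hypothetical file entry added to show the filter.

```python
def directory_names(listing_lines):
    """Return the names of directories in Unix-style `ls -l` listing lines."""
    names = []
    for line in listing_lines:
        fields = line.split()
        # Skip blank lines and anything that is not a directory entry.
        if not fields or not fields[0].startswith("d"):
            continue
        names.append(fields[-1])  # last field is the entry name
    return names


# With a live connection the lines would come from
# urllib.request.urlopen(url).read().decode().splitlines().
sample = [
    "drwxrwsr-x 2 0 4009 4096 Nov 26 2008 01001_Autauga_County",
    "drwxrwsr-x 2 0 4009 4096 Nov 26 2008 01003_Baldwin_County",
    "-rw-r--r-- 1 0 4009 1234 Nov 26 2008 README.txt",  # hypothetical file entry
]
print(directory_names(sample))
```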
Answer by ghostdog74
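import os
import urllib2

out = os.path.join("/tmp", "test.zip")
url = "ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/01001_Autauga_County/tl_2008_01001_edges.zip"
page = urllib2.urlopen(url)  # file-like object over the FTP download
with open(out, "wb") as f:   # binary mode: zip data is raw bytes
    f.write(page.read())     # the with block closes the file when done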
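Here is a Python 3 sketch of the same idea. urllib2.urlopen() became urllib.request.urlopen(), and shutil.copyfileobj() copies the response to disk in chunks instead of holding the whole zip in memory. The helper name download and the placeholder payload are hypothetical. So that it runs without network access, the usage below fetches a local file:// URL, which urlopen() also handles, in place of the census FTP URL.

```python
import pathlib
import shutil
import tempfile
import urllib.request


def download(url, out_path):
    """Stream the resource at `url` into `out_path` and return the path."""
    with urllib.request.urlopen(url) as response, open(out_path, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all at once
    return out_path


payload = b"placeholder bytes standing in for the zip"  # hypothetical content

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "source.zip")
    src.write_bytes(payload)
    # A file:// URL for the local stand-in file; an ftp:// URL works the same way.
    saved = download(src.as_uri(), pathlib.Path(tmp, "test.zip"))
    copied = saved.read_bytes()
    print(len(copied))
```

With network access, you would pass the full .zip URL from the answer above as the first argument and a local path such as "/tmp/test.zip" as the second.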
Answer by Alex Martelli
Per the docs, urlretrieve saves the file to disk and returns a tuple (filename, headers). So the file is already saved when urlretrieve returns.
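Here is a short Python 3 sketch of that behaviour. In Python 3, urllib.urlretrieve() is available as urllib.request.urlretrieve(), which the docs list under the legacy interface. So that it runs offline, the sketch retrieves a local file:// URL standing in for the FTP URL. The second argument is the local path to save to, and the first element of the returned tuple is that same path. The file contents are hypothetical.

```python
import pathlib
import tempfile
import urllib.request

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "remote.zip")
    src.write_bytes(b"stand-in zip bytes")  # hypothetical content
    dest = str(pathlib.Path(tmp, "tl_2008_01001_edges.zip"))

    # urlretrieve saves the download to `dest` and returns (filename, headers).
    filename, headers = urllib.request.urlretrieve(src.as_uri(), dest)

    # By the time urlretrieve returns, the file is already on disk.
    saved_bytes = pathlib.Path(filename).read_bytes()
    print(filename == dest, saved_bytes)
```

For the census data, the first argument would be the full URL of the .zip file, as in ghostdog74's answer, rather than the county directory used in the question.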
You can open and read the ZIP file you've retrieved with the zipfile module of the standard library. glob does not work inside zip files, only on normal filesystem directories.
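Because glob only walks the filesystem, one way to get glob-style matching inside an archive is to apply fnmatch.filter() to the names from zipfile.ZipFile.namelist(). Here is a minimal sketch that builds a small archive on the fly. The member names are hypothetical stand-ins for the files in a TIGER download.

```python
import fnmatch
import io
import zipfile

# Build a small archive in memory; the member names are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name in ("tl_2008_01001_edges.shp", "tl_2008_01001_edges.dbf",
                 "tl_2008_01001_edges.shx", "readme.txt"):
        zf.writestr(name, b"placeholder")

buf.seek(0)
with zipfile.ZipFile(buf) as archive:
    # namelist() returns member names; fnmatch applies a glob-style pattern to them.
    shapefiles = fnmatch.filter(archive.namelist(), "*.sh?")
    first = archive.read(shapefiles[0])  # bytes of the first matching member
    print(shapefiles, first)
```

For a file already saved by urlretrieve, you would open it by path instead, for example zipfile.ZipFile(filename).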