Using wget via Python
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license, note the original address, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/2467609/
Asked by CoreIs
How would I download files (video) with Python using wget and save them locally? There will be a bunch of files, so how do I know that one file is downloaded so as to automatically start downloading another one?
Thanks.
Answered by Ignacio Vazquez-Abrams
Don't do this. Use either urllib2 or urlgrabber instead.
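For readers on Python 3, the urllib2 API now lives in urllib.request. A minimal sketch of downloading a file without shelling out to wget might look like this (the data: URL and the output filename are stand-ins chosen so the example needs no network access):

```python
import urllib.request  # Python 3 home of the old urllib2 API

def save_url(url, filename):
    """Stream a URL to a local file in chunks, without shelling out to wget."""
    with urllib.request.urlopen(url) as resp, open(filename, "wb") as out:
        while True:
            chunk = resp.read(64 * 1024)  # 64 KiB at a time, so large files fit in memory
            if not chunk:
                break
            out.write(chunk)

# Demo with a data: URL so the sketch runs offline; a real call would use http(s).
save_url("data:,hello", "downloaded.txt")
print(open("downloaded.txt").read())
```

Because the call returns only when the body is fully written, looping over a list of URLs naturally downloads them one after another, which answers the "when do I start the next one" part of the question.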
Answered by Adam Rosenfield
If you use os.system() to spawn a process for wget, it will block until wget finishes the download (or quits with an error). So, just call os.system('wget blah') in a loop until you've downloaded all of your files.
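A minimal sketch of that loop (the URLs are placeholders, and the actual os.system call is left commented out so nothing is fetched here); quoting the URL with shlex.quote keeps shell metacharacters in it from being misinterpreted:

```python
import shlex

# Hypothetical list of video URLs to fetch one after another.
urls = [
    "http://example.com/video1.mp4",
    "http://example.com/video2.mp4",
]

def wget_command(url):
    """Build the shell command string; shlex.quote guards odd characters."""
    return "wget -q %s" % shlex.quote(url)

for url in urls:
    cmd = wget_command(url)
    print(cmd)
    # status = os.system(cmd)  # blocks until this wget exits; then the loop continues
```

Since os.system does not return until wget exits, each iteration only starts after the previous download has finished or failed.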
Alternatively, you can use urllib2 or httplib. You'll have to write a non-trivial amount of code, but you'll get better performance, since you can reuse a single HTTP connection to download many files, as opposed to opening a new connection for each file.
Answered by McJeff
No reason to use os.system. Avoid writing a shell script in Python and go with something like urllib.urlretrieve or an equivalent.
Edit... to answer the second part of your question, you can set up a thread pool using the standard library Queue class. Since you're doing a lot of downloading, the GIL shouldn't be a problem. Generate a list of the URLs you wish to download and feed them to your work queue. It will handle pushing requests to worker threads.
I'm waiting for a database update to complete, so I put this together real quick.
#!/usr/bin/python
# Python 2: on Python 3 these live in queue.Queue and urllib.request.urlretrieve
import sys
import threading
import urllib
from Queue import Queue
import logging

class Downloader(threading.Thread):
    def __init__(self, queue):
        super(Downloader, self).__init__()
        self.queue = queue

    def run(self):
        while True:
            download_url, save_as = self.queue.get()
            # sentinel: a (None, None) pair tells this worker to exit
            if not download_url:
                return
            try:
                urllib.urlretrieve(download_url, filename=save_as)
            except Exception, e:
                logging.warn("error downloading %s: %s" % (download_url, e))

if __name__ == '__main__':
    queue = Queue()
    threads = []
    for i in xrange(5):
        threads.append(Downloader(queue))
        threads[-1].start()
    for line in sys.stdin:
        url = line.strip()
        filename = url.split('/')[-1]
        print "Download %s as %s" % (url, filename)
        queue.put((url, filename))
    # if we get here, stdin has gotten the ^D
    print "Finishing current downloads"
    for i in xrange(5):
        queue.put((None, None))
    for t in threads:
        t.join()
Answered by davr
No reason to use python. Avoid writing a shell script in Python and go with something like bash or an equivalent.
Answered by BozoJoe
Install wget via pypi http://pypi.python.org/pypi/wget/0.3
pip install wget
then run, just as documented:
python -m wget <url>
Answered by Mark Lakata
Short answer (simplified). To get one file:
import urllib
urllib.urlretrieve("http://google.com/index.html", filename="local/index.html")
You can figure out how to loop that if necessary.
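One way to loop it, sketched for Python 3 (where urlretrieve moved to urllib.request); the URL list and the "local" target directory are assumptions, and the actual fetch is left commented out so the sketch runs offline:

```python
import os

def local_name(url):
    """Derive a local filename from the URL's last path component."""
    tail = url.split("/")[-1]
    return tail or "index.html"  # fall back when the URL ends in a slash

# Hypothetical list of files to fetch.
urls = [
    "http://google.com/index.html",
    "http://example.com/videos/clip.mp4",
]

for url in urls:
    filename = os.path.join("local", local_name(url))
    print("%s -> %s" % (url, filename))
    # urllib.request.urlretrieve(url, filename)  # uncomment to actually fetch
```

Because urlretrieve blocks until the file is saved, each iteration starts only after the previous download completes, which is exactly the sequencing the question asked for.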