Using wget via Python on Linux

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/2467609/

Date: 2020-08-03 19:52:02  Source: igfitidea

Using wget via Python

python, linux

Asked by CoreIs

How would I download files (video) with Python using wget and save them locally? There will be a bunch of files, so how do I know that one file is downloaded so as to automatically start downloading another one?


Thanks.


Answered by Ignacio Vazquez-Abrams

Don't do this. Use either urllib2 or urlgrabber instead.


Answered by Adam Rosenfield

If you use os.system() to spawn a process for wget, it will block until wget finishes the download (or quits with an error). So, just call os.system('wget blah') in a loop until you've downloaded all of your files.

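That loop can be sketched with the subprocess module (the command is parameterized here purely so the sketch can be exercised without a network; wget itself is the normal choice):

```python
import subprocess

def fetch_sequentially(urls, command=("wget",)):
    """Download URLs one at a time; each run() call blocks until the tool exits."""
    for url in urls:
        # check=True turns a non-zero exit status into a CalledProcessError,
        # so a failed download stops the loop instead of passing silently
        subprocess.run([*command, url], check=True)
```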

Alternatively, you can use urllib2 or httplib. You'll have to write a non-trivial amount of code, but you'll get better performance, since you can reuse a single HTTP connection to download many files, as opposed to opening a new connection for each file.


Answered by McJeff

No reason to use os.system. Avoid writing a shell script in Python and go with something like urllib.urlretrieve or an equivalent.


Edit... to answer the second part of your question, you can set up a thread pool using the standard library Queue class. Since you're doing a lot of downloading, the GIL shouldn't be a problem. Generate a list of the URLs you wish to download and feed them to your work queue. It will handle pushing requests to worker threads.


I'm waiting for a database update to complete, so I put this together real quick.



#!/usr/bin/python

import sys
import threading
import urllib
from Queue import Queue
import logging

class Downloader(threading.Thread):
    def __init__(self, queue):
        super(Downloader, self).__init__()
        self.queue = queue

    def run(self):
        while True:
            download_url, save_as = self.queue.get()
            # sentinel: (None, None) tells this worker to exit
            if not download_url:
                return
            try:
                urllib.urlretrieve(download_url, filename=save_as)
            except Exception, e:
                logging.warn("error downloading %s: %s" % (download_url, e))

if __name__ == '__main__':
    queue = Queue()
    threads = []
    for i in xrange(5):
        threads.append(Downloader(queue))
        threads[-1].start()

    for line in sys.stdin:
        url = line.strip()
        filename = url.split('/')[-1]
        print "Download %s as %s" % (url, filename)
        queue.put((url, filename))

    # if we get here, stdin has gotten the ^D
    print "Finishing current downloads"
    for i in xrange(5):
        queue.put((None, None))

Answered by davr

No reason to use python. Avoid writing a shell script in Python and go with something like bash or an equivalent.


Answered by BozoJoe

Install wget via pypi http://pypi.python.org/pypi/wget/0.3


pip install wget

then run it, just as documented:


python -m wget <url>

Answered by Mark Lakata

Short answer (simplified). To get one file:


 import urllib
 urllib.urlretrieve("http://google.com/index.html", filename="local/index.html")

You can figure out how to loop that if necessary.
