Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/24346872/

Python equivalent of a given wget command

Tags: python, wget

Asked by Soviero

I'm trying to create a Python function that does the same thing as this wget command:

wget -c --read-timeout=5 --tries=0 "$URL"

-c: Continue from where you left off if the download is interrupted.

--read-timeout=5: If no new data comes in for more than 5 seconds, give up and try again. Combined with -c, this means it will try again from where it left off.

--tries=0: Retry forever.

Used in tandem, those three arguments result in a download that cannot fail.

I want to duplicate those features in my Python script, but I don't know where to begin...

Accepted answer by Eugene K

urllib.request should work. Just set it up in a while (not done) loop, check whether a local file already exists, and if it does, send a GET with a Range header specifying how far into the download you got. Be sure to use read() to append to the local file until an error occurs.
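
A rough sketch of that loop is below. The URL and file name are placeholders, and real code would also want to handle a server that ignores the Range header, or a 416 response when the file is already complete:

import os
import urllib.request
import urllib.error

url = "http://example.com/bigfile.bin"   # placeholder URL
local = "bigfile.bin"

done = False
while not done:
    # resume from however many bytes are already on disk
    start = os.path.getsize(local) if os.path.exists(local) else 0
    req = urllib.request.Request(url, headers={"Range": "bytes=%d-" % start})
    try:
        with urllib.request.urlopen(req, timeout=5) as resp, open(local, "ab") as f:
            while True:
                chunk = resp.read(8192)
                if not chunk:
                    done = True  # server finished sending
                    break
                f.write(chunk)
    except (urllib.error.URLError, OSError):
        pass  # timed out or connection dropped; loop around and resume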

This is also potentially a duplicate of "Python urllib2 resume download doesn't work when network reconnects".

Answered by Pujan Srivastava

# Python 2 code (urllib2); see the Python 3 translation below
import urllib2
import time

max_attempts = 80
attempts = 0
sleeptime = 10  # in seconds; no reason to continuously retry if the network is down

#while True:  # possibly dangerous: retries forever
while attempts < max_attempts:
    time.sleep(sleeptime)
    try:
        response = urllib2.urlopen("http://example.com", timeout=5)
        content = response.read()
        f = open("local/index.html", 'w')
        f.write(content)
        f.close()
        break
    except urllib2.URLError as e:
        attempts += 1
        print type(e)
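
For reference, a minimal Python 3 translation of the same retry loop (urllib2 was split into urllib.request and urllib.error):

import time
import urllib.request
import urllib.error

max_attempts = 80
attempts = 0
sleeptime = 10  # seconds between retries

while attempts < max_attempts:
    try:
        response = urllib.request.urlopen("http://example.com", timeout=5)
        content = response.read()
        with open("local/index.html", "wb") as f:
            f.write(content)
        break
    except urllib.error.URLError as e:
        attempts += 1
        print(type(e))
        time.sleep(sleeptime)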

Answered by Blairg23

There is also a nice Python module named wget that is pretty easy to use. Found here.

This demonstrates the simplicity of the design:

>>> import wget
>>> url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
>>> filename = wget.download(url)
100% [................................................] 3841532 / 3841532
>>> filename
'razorback.mp3'

Enjoy.

However, if wget doesn't work (I've had trouble with certain PDF files), try this solution instead.

Edit: You can also use the out parameter to write to a custom output directory instead of the current working directory.

>>> output_directory = <directory_name>
>>> filename = wget.download(url, out=output_directory)
>>> filename
'razorback.mp3'

Answered by Will Charlton

I had to do something like this on a version of Linux that didn't have the right options compiled into wget. This example is for downloading the memory analysis tool 'guppy'. I'm not sure if it's important or not, but I kept the target file's name the same as the URL target name...

Here's what I came up with:

python -c "import requests; r = requests.get('https://pypi.python.org/packages/source/g/guppy/guppy-0.1.10.tar.gz') ; open('guppy-0.1.10.tar.gz' , 'wb').write(r.content)"

That's the one-liner; here it is in a slightly more readable form:

import requests
fname = 'guppy-0.1.10.tar.gz'
url = 'https://pypi.python.org/packages/source/g/guppy/' + fname
r = requests.get(url)
open(fname, 'wb').write(r.content)
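
Note that requests does not raise on HTTP errors by default; adding r.raise_for_status() after the get() call avoids silently writing an error page to disk.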

This worked for downloading a tarball. I was able to extract and use the package after downloading it.

EDIT:

To address a question, here is an implementation with a progress bar printed to STDOUT. There is probably a more portable way to do this without the clint package, but this was tested on my machine and works fine:

#!/usr/bin/env python

from clint.textui import progress
import requests

fname = 'guppy-0.1.10.tar.gz'
url = 'https://pypi.python.org/packages/source/g/guppy/' + fname

# stream=True avoids loading the whole file into memory at once
r = requests.get(url, stream=True)
with open(fname, 'wb') as f:
    total_length = int(r.headers.get('content-length'))
    for chunk in progress.bar(r.iter_content(chunk_size=1024), expected_size=(total_length / 1024) + 1):
        if chunk:  # filter out keep-alive chunks
            f.write(chunk)
            f.flush()
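
If you'd rather avoid the clint dependency, a stdlib-only sketch of the same streaming loop might look like this (the percentage is only printed when the server supplies a Content-Length header):

import sys
import requests

fname = 'guppy-0.1.10.tar.gz'
url = 'https://pypi.python.org/packages/source/g/guppy/' + fname

r = requests.get(url, stream=True)
total = int(r.headers.get('content-length', 0))
downloaded = 0
with open(fname, 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)
            downloaded += len(chunk)
            if total:
                # overwrite the same line with the current percentage
                sys.stdout.write('\r%3d%%' % (100 * downloaded // total))
                sys.stdout.flush()
print()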

Answered by Te ENe Te

Let me improve the example with threads, in case you want to download many files.

import math
import random
import threading

import requests
from clint.textui import progress

# You must define a proxy list;
# I suggest https://free-proxy-list.net/
proxies = {
    0: {'http': 'http://34.208.47.183:80'},
    1: {'http': 'http://40.69.191.149:3128'},
    2: {'http': 'http://104.154.205.214:1080'},
    3: {'http': 'http://52.11.190.64:3128'}
}


# You must also define the list of files you want to download
videos = [
    "https://i.stack.imgur.com/g2BHi.jpg",
    "https://i.stack.imgur.com/NURaP.jpg"
]

download_threads = list()


def download_file(video, selected_proxy):
    print("Downloading file named {} by proxy {}...".format(video, selected_proxy))
    r = requests.get(video, stream=True, proxies=selected_proxy)
    # use the last path segment of the URL as the local file name
    video_name = video.split("/")[-1]
    with open(video_name, 'wb') as f:
        total_length = int(r.headers.get('content-length'))
        for chunk in progress.bar(r.iter_content(chunk_size=1024), expected_size=(total_length / 1024) + 1):
            if chunk:
                f.write(chunk)
                f.flush()


# pick a random proxy for each file, then start all downloads in parallel
for video in videos:
    selected_proxy = proxies[math.floor(random.random() * len(proxies))]
    t = threading.Thread(target=download_file, args=(video, selected_proxy))
    download_threads.append(t)

for t in download_threads:
    t.start()
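
Note that the threads are never joined; because they are non-daemon threads, the interpreter still waits for them to finish before exiting. If you need to run code after all downloads complete, add a second loop calling t.join() on each thread.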

Answered by pd shah

easy as py:

class Downloder():
    def download_manager(self, url, destination='Files/DownloderApp/', try_number="10", time_out="60"):
        #threading.Thread(target=self._wget_dl, args=(url, destination, try_number, time_out)).start()
        if self._wget_dl(url, destination, try_number, time_out) == 0:
            return True
        else:
            return False

    def _wget_dl(self, url, destination, try_number, time_out):
        import subprocess
        command = ["wget", "-c", "-P", destination, "-t", try_number, "-T", time_out, url]
        download_state = 1  # non-zero means failure
        try:
            download_state = subprocess.call(command)
        except Exception as e:
            print(e)
        # a download_state of 0 means a successful download
        return download_state
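
A hypothetical usage of this class, assuming the wget binary is on your PATH (the example URL is a placeholder):

downloader = Downloder()
if downloader.download_manager("http://example.com/file.zip", destination="downloads/"):
    print("Download succeeded")
else:
    print("Download failed")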

Answered by Yohan Obadia

A solution that I often find simpler and more robust is to simply execute a terminal command within Python. In your case:

import os
url = 'https://www.someurl.com'
os.system(f'wget -c --read-timeout=5 --tries=0 "{url}"')
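
Shelling out with os.system is vulnerable to quoting problems if the URL contains shell metacharacters. A slightly safer sketch of the same idea passes the arguments as a list via subprocess, so nothing goes through the shell:

import subprocess

url = 'https://www.someurl.com'
subprocess.run(["wget", "-c", "--read-timeout=5", "--tries=0", url], check=True)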

Answered by Rajan saha Raju

TensorFlow makes life easier. The returned file path gives us the location of the downloaded file.

import tensorflow as tf
file_path = tf.keras.utils.get_file(origin='https://storage.googleapis.com/tf-datasets/titanic/train.csv',
                                    fname='train.csv',
                                    untar=False, extract=False)
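
By default, get_file caches downloads under ~/.keras/datasets, so file_path will typically point there rather than to the current working directory.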

Answered by Paul Denoyes

For Windows and Python 3.x, my two cents' contribution about renaming the file on download:

  1. Install the wget module: pip install wget
  2. Use wget:

import wget
wget.download('Url', r'C:\PathToMyDownloadFolder\NewFileName.extension')

A truly working command-line example:

python -c "import wget; wget.download(""https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.17.2.tar.xz"", ""C:\Users\TestName.TestExtension"")"

Note: 'C:\\PathToMyDownloadFolder\\NewFileName.extension' is not mandatory. By default, the file is not renamed, and the download folder is your local path.

Answered by Shital Shah

Here's the code, adapted from the torchvision library:

import os
import urllib.request
import urllib.error

def download_url(url, root, filename=None):
    """Download a file from a url and place it in root.
    Args:
        url (str): URL to download file from
        root (str): Directory to place downloaded file in
        filename (str, optional): Name to save the file under. If None, use the basename of the URL
    """

    root = os.path.expanduser(root)
    if not filename:
        filename = os.path.basename(url)
    fpath = os.path.join(root, filename)

    os.makedirs(root, exist_ok=True)

    try:
        print('Downloading ' + url + ' to ' + fpath)
        urllib.request.urlretrieve(url, fpath)
    except (urllib.error.URLError, IOError) as e:
        if url[:5] == 'https':
            url = url.replace('https:', 'http:')
            print('Failed download. Trying https -> http instead.'
                    ' Downloading ' + url + ' to ' + fpath)
            urllib.request.urlretrieve(url, fpath)

If you are OK with taking a dependency on the torchvision library, then you can also simply do:

from torchvision.datasets.utils import download_url
download_url('http://something.com/file.zip', '~/my_folder')