Basic http file downloading and saving to disk in python?

Note: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original source: http://stackoverflow.com/questions/19602931/

Tags: python, file, download, save

Asked by arvindch

I'm new to Python and I've been going through the Q&A on this site looking for an answer to my question. However, I'm a beginner and I find it difficult to understand some of the solutions. I need a very basic solution.

Could someone please explain a simple solution to 'Downloading a file through http' and 'Saving it to disk, in Windows', to me?

I'm not sure how to use shutil and os modules, either.

The file I want to download is under 500 MB and is a .gz archive file. If someone can explain how to extract the archive and utilise the files in it as well, that would be great!

Here's a partial solution that I put together from various answers:

import requests
import os
import shutil

global dump

def download_file():
    global dump
    url = "http://randomsite.com/file.gz"
    file = requests.get(url, stream=True)
    dump = file.raw

def save_file():
    global dump
    location = os.path.abspath("D:\folder\file.gz")
    with open("file.gz", 'wb') as location:
        shutil.copyfileobj(dump, location)
    del dump

Could someone point out errors (beginner level) and explain any easier methods to do this?

Thanks!

Accepted answer by Blue Ice

A clean way to download a file is:

import urllib

testfile = urllib.URLopener()  # Python 2 API; it no longer exists in Python 3 (see the last answer)
testfile.retrieve("http://randomsite.com/file.gz", "file.gz")

This downloads a file from a website and names it file.gz. This is one of my favorite solutions, from "Downloading a picture via urllib and python".

This example uses the urllib library, and it will directly retrieve the file from a source.
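
The question also asks how to extract the .gz archive afterwards; a minimal sketch using the standard library's gzip and shutil modules would look like the following (the filenames simply carry over the question's placeholders; if the download is really a .tar.gz, the tarfile module can unpack it instead):

import gzip
import shutil

# Decompress file.gz into a plain file named "file" in the current directory.
with gzip.open("file.gz", "rb") as f_in:
    with open("file", "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)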

Answered by dparpyani

As mentioned here:

import urllib
urllib.urlretrieve("http://randomsite.com/file.gz", "file.gz")  # Python 2; use urllib.request.urlretrieve in Python 3

EDIT: If you still want to use requests, take a look at this question or this one.
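
For reference, a minimal streaming download with requests along those lines might look like this (the URL is the question's placeholder, and the 8192-byte chunk size is an arbitrary choice):

import requests

url = "http://randomsite.com/file.gz"
response = requests.get(url, stream=True)
response.raise_for_status()  # fail early on HTTP errors
with open("file.gz", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)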

Answered by Ali

I use wget.

It's a simple and good library. If you want an example:

import wget

file_url = 'http://johndoe.com/download.zip'
file_name = wget.download(file_url)

The wget module supports both Python 2 and Python 3.

Answered by Ala

Another clean way to save the file is this:

import urllib

# Note: the Python 2 function is urllib.urlretrieve (there is no urllib.retrieve).
urllib.urlretrieve("your url goes here", "output.csv")

Answered by Saurabh yadav

Four methods, using wget, urllib, and requests:

#!/usr/bin/python
# Python 2 script: the StringIO module and urllib.URLopener do not exist in Python 3.
import requests
from StringIO import StringIO
from PIL import Image
import profile as profile
import urllib
import wget


url = 'https://tinypng.com/images/social/website.jpg'

def testRequest():
    image_name = 'test1.jpg'
    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        for chunk in r.iter_content():
            f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    r = requests.get(url)
    i = Image.open(StringIO(r.content))
    i.save(image_name)

def testUrllib():
    image_name = 'test3.jpg'
    testfile = urllib.URLopener()
    testfile.retrieve(url, image_name)

def testwget():
    image_name = 'test4.jpg'
    wget.download(url, image_name)

if __name__ == '__main__':
    profile.run('testRequest()')
    profile.run('testRequest2()')
    profile.run('testUrllib()')
    profile.run('testwget()')

testRequest - 4469882 function calls (4469842 primitive calls) in 20.236 seconds

testRequest2 - 8580 function calls (8574 primitive calls) in 0.072 seconds

testUrllib - 3810 function calls (3775 primitive calls) in 0.036 seconds

testwget - 3489 function calls in 0.020 seconds
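
Note that testRequest's poor showing is largely an artifact of iter_content() defaulting to a chunk_size of 1, so it writes one byte at a time. A variant with a larger chunk size (an assumption, not part of the original benchmark) should close most of the gap:

def testRequestChunked():
    image_name = 'test5.jpg'
    # Same as testRequest, but read in 8 KB chunks instead of single bytes.
    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)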

Answered by Max

Exotic Windows Solution

import subprocess

# your_url and filename must already be defined, e.g.:
#   your_url, filename = "http://randomsite.com/file.gz", "file.gz"
subprocess.run("powershell Invoke-WebRequest {} -OutFile {}".format(your_url, filename), shell=True)

Answered by Jayme Snyder

I started down this path because ESXi's wget is not compiled with SSL and I wanted to download an OVA from a vendor's website directly onto the ESXi host which is on the other side of the world.

I had to either disable the firewall (lazy) or enable outbound HTTPS by editing the rules (the proper way), and then I created this Python script:

import ssl
import shutil
import urllib.request

# Skip certificate verification (note: this disables SSL security checks).
context = ssl._create_unverified_context()

dlurl = 'https://somesite/path/whatever'
with urllib.request.urlopen(dlurl, context=context) as response:
    with open("file.ova", 'wb') as tmp_file:
        shutil.copyfileobj(response, tmp_file)

ESXi's libraries are kind of pared down, but the open source weasel installer seemed to use urllib for https... so it inspired me to go down this path.

Answered by Om Sao

For Python 3+, URLopener is deprecated, and urllib.URLopener no longer exists. Using it will give an error like the one below:

url_opener = urllib.URLopener()
AttributeError: module 'urllib' has no attribute 'URLopener'

So, try:

import urllib.request 
urllib.request.urlretrieve(url, filename)