在python中下载基本的http文件并保存到磁盘?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/19602931/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverFlow
Basic http file downloading and saving to disk in python?
提问by arvindch
I'm new to Python and I've been going through the Q&A on this site, for an answer to my question. However, I'm a beginner and I find it difficult to understand some of the solutions. I need a very basic solution.
我是 Python 新手,我一直在浏览本网站上的问答,以回答我的问题。但是,我是初学者,我发现很难理解某些解决方案。我需要一个非常基本的解决方案。
Could someone please explain a simple solution to 'Downloading a file through http' and 'Saving it to disk, in Windows', to me?
有人可以向我解释“通过 http 下载文件”和“在 Windows 中将其保存到磁盘”的简单解决方案吗?
I'm not sure how to use shutil and os modules, either.
我也不知道如何使用 shutil 和 os 模块。
The file I want to download is under 500 MB and is a .gz archive file. If someone can explain how to extract the archive and utilise the files in it also, that would be great!
我要下载的文件小于 500 MB,是一个 .gz 存档文件。如果有人能解释如何提取存档并利用其中的文件,那就太好了!
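For the extraction part (which none of the answers below cover), a minimal sketch using only the standard-library gzip and shutil modules; the file names are illustrative:

```python
import gzip
import shutil

def decompress_gz(src, dest):
    # gzip.open yields the decompressed byte stream; copyfileobj
    # copies it to disk in chunks, so even a ~500 MB archive is
    # never held in memory all at once.
    with gzip.open(src, "rb") as f_in:
        with open(dest, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)

# e.g. decompress_gz("file.gz", "file")
```

If the archive is actually a .tar.gz, the standard tarfile module (`tarfile.open(src, "r:gz")`) can list and extract its members instead.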
Here's a partial solution, that I wrote from various answers combined:
这是一个部分解决方案,我从各种答案中组合而成:
import requests
import os
import shutil

global dump

def download_file():
    global dump
    url = "http://randomsite.com/file.gz"
    file = requests.get(url, stream=True)
    dump = file.raw

def save_file():
    global dump
    location = os.path.abspath("D:\folder\file.gz")
    with open("file.gz", 'wb') as location:
        shutil.copyfileobj(dump, location)
    del dump
Could someone point out errors (beginner level) and explain any easier methods to do this?
有人可以指出错误(初学者级别)并解释任何更简单的方法来做到这一点?
Thanks!
谢谢!
采纳答案by Blue Ice
A clean way to download a file is:
一种干净的下载文件的方法是:
import urllib
testfile = urllib.URLopener()
testfile.retrieve("http://randomsite.com/file.gz", "file.gz")
This downloads a file from a website and names it file.gz. This is one of my favorite solutions, from Downloading a picture via urllib and python.
这将从网站下载文件并将其命名为 file.gz。这是我最喜欢的解决方案之一,来自通过 urllib 和 python 下载图片。
This example uses the urllib library, and it will directly retrieve the file from a source.
本示例使用 urllib 库,它将直接从源中检索文件。
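Note that urllib.URLopener exists only in Python 2. A sketch of a Python 3 equivalent using only the standard library (URL and file name taken from the example above):

```python
import shutil
import urllib.request

def download(url, dest):
    # urlopen returns a file-like HTTP response; copyfileobj streams
    # it to disk in fixed-size chunks, so memory use stays constant
    # even for a large file.
    with urllib.request.urlopen(url) as response:
        with open(dest, "wb") as f:
            shutil.copyfileobj(response, f)

# e.g. download("http://randomsite.com/file.gz", "file.gz")
```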
回答by Ala
Another clean way to save the file is this:
另一种保存文件的干净方法是:
import urllib
urllib.urlretrieve("your url goes here", "output.csv")
回答by Saurabh yadav
Four methods using wget, urllib and requests.
四种方法,使用 wget、urllib 和 requests。
#!/usr/bin/python
import requests
from StringIO import StringIO
from PIL import Image
import profile as profile
import urllib
import wget

url = 'https://tinypng.com/images/social/website.jpg'

def testRequest():
    image_name = 'test1.jpg'
    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        for chunk in r.iter_content():
            f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    r = requests.get(url)
    i = Image.open(StringIO(r.content))
    i.save(image_name)

def testUrllib():
    image_name = 'test3.jpg'
    testfile = urllib.URLopener()
    testfile.retrieve(url, image_name)

def testwget():
    image_name = 'test4.jpg'
    wget.download(url, image_name)

if __name__ == '__main__':
    profile.run('testRequest()')
    profile.run('testRequest2()')
    profile.run('testUrllib()')
    profile.run('testwget()')
testRequest - 4469882 function calls (4469842 primitive calls) in 20.236 seconds
testRequest - 20.236 秒内 4469882 次函数调用(4469842 次原始调用)
testRequest2 - 8580 function calls (8574 primitive calls) in 0.072 seconds
testRequest2 - 0.072 秒内 8580 次函数调用(8574 次原始调用)
testUrllib - 3810 function calls (3775 primitive calls) in 0.036 seconds
testUrllib - 0.036 秒内 3810 次函数调用(3775 次原始调用)
testwget - 3489 function calls in 0.020 seconds
testwget - 0.020 秒内 3489 次函数调用
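The slow testRequest figure is largely an artifact of calling r.iter_content() with its default chunk_size=1, which costs one Python loop iteration and one f.write() per byte. Passing an explicit chunk size closes most of the gap; a sketch (the 64 KiB value is an arbitrary but reasonable choice):

```python
import requests

def download_chunked(url, dest, chunk_size=64 * 1024):
    # Larger chunks mean thousands of times fewer Python-level
    # iterations and write() calls than the per-byte default.
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
```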
回答by Max
Exotic Windows Solution
异国情调的 Windows 解决方案
import subprocess
subprocess.run("powershell Invoke-WebRequest {} -OutFile {}".format(your_url, filename), shell=True)
回答by Jayme Snyder
I started down this path because ESXi's wget is not compiled with SSL and I wanted to download an OVA from a vendor's website directly onto the ESXi host which is on the other side of the world.
我开始走这条路是因为 ESXi 的 wget 不是用 SSL 编译的,我想从供应商的网站直接将 OVA 下载到世界另一端的 ESXi 主机上。
I had to disable the firewall (lazy) / enable https out by editing the rules (proper)
我不得不禁用防火墙(偷懒的做法)/通过编辑规则来放行 https(正确的做法)
created the python script:
创建了python脚本:
import ssl
import shutil
import tempfile
import urllib.request

context = ssl._create_unverified_context()
dlurl = 'https://somesite/path/whatever'
with urllib.request.urlopen(dlurl, context=context) as response:
    with open("file.ova", 'wb') as tmp_file:
        shutil.copyfileobj(response, tmp_file)
ESXi libraries are kind of pared down, but the open source weasel installer seemed to use urllib for https... so it inspired me to go down this path.
ESXi 的库有些精简,但开源的 weasel 安装程序似乎对 https 使用了 urllib……所以这启发了我走这条路。
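As an aside, ssl._create_unverified_context is a private helper, and it disables certificate verification entirely. When the server's certificate chain does validate, the public API is the safer choice; a sketch:

```python
import ssl

# Public API: verifies the server certificate against the system's
# CA store (unlike the private _create_unverified_context, which
# skips verification, so reserve that for last-resort cases).
context = ssl.create_default_context()
```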
回答by Om Sao
For Python 3+, URLopener is deprecated, and when used you will get an error as below:
对于 Python 3+,URLopener 已弃用,使用时你会得到如下错误:
url_opener = urllib.URLopener() AttributeError: module 'urllib' has no attribute 'URLopener'
url_opener = urllib.URLopener() AttributeError: 模块 'urllib' 没有属性 'URLopener'
So, try:
所以,试试:
import urllib.request
urllib.request.urlretrieve(url, filename)
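urlretrieve also accepts a reporthook callback, which is handy for showing progress on a large download; a minimal sketch (the hook name is illustrative):

```python
import urllib.request

def progress(block_num, block_size, total_size):
    # urlretrieve calls this after each block; total_size is -1 when
    # the server sends no Content-Length header.
    if total_size > 0:
        done = min(block_num * block_size, total_size)
        print("\r%5.1f%%" % (100.0 * done / total_size), end="")

# e.g. urllib.request.urlretrieve(url, filename, reporthook=progress)
```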