No schema supplied and other errors with using requests.get()

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me) on Stack Overflow.

Original question: http://stackoverflow.com/questions/30770213/
Asked by Loi Huynh
I'm learning Python by following Automate the Boring Stuff. This program is supposed to go to http://xkcd.com/ and download all the images for offline viewing.
I'm on Python 2.7 on a Mac.
For some reason, I'm getting errors like "No schema supplied", and errors with using requests.get() itself.
Here is my code:
# Saves the XKCD comic page for offline reading
import requests, os, bs4, shutil

url = 'http://xkcd.com/'

if os.path.isdir('xkcd') == True:   # If the xkcd folder already exists
    shutil.rmtree('xkcd')           # delete it
else:                               # otherwise
    os.makedirs('xkcd')             # create the xkcd folder

while not url.endswith('#'):  # When there are no more posts, the url will end with '#'; exit the while loop
    # Download the page
    print 'Downloading %s page...' % url
    res = requests.get(url)             # Get the page
    res.raise_for_status()              # Check for errors
    soup = bs4.BeautifulSoup(res.text)  # Parse the page

    # Find the URL of the comic image
    comicElem = soup.select('#comic img')  # Any '#comic img' it finds will be saved as a list in comicElem
    if comicElem == []:                    # if the list is empty
        print 'Couldn\'t find the image!'
    else:
        comicUrl = comicElem[0].get('src')  # Get the first element in comicElem (the image) and save it to comicUrl

        # Download the image
        print 'Downloading the %s image...' % (comicUrl)
        res = requests.get(comicUrl)  # Get the image. Getting something will always use requests.get()
        res.raise_for_status()        # Check for errors

        # Save image to ./xkcd
        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(10000):
            imageFile.write(chunk)
        imageFile.close()

    # Get the Prev button's URL
    prevLink = soup.select('a[rel="prev"]')[0]
    # The Previous button is <a rel="prev" href="/1535/" accesskey="p">< Prev</a>
    url = 'http://xkcd.com/' + prevLink.get('href')  # adds /1535/ to http://xkcd.com/

print 'Done!'
Here are the errors:
Traceback (most recent call last):
  File "/Users/XKCD.py", line 30, in <module>
    res = requests.get(comicUrl) # Get the image. Getting something will always use requests.get()
  File "/Library/Python/2.7/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 451, in request
    prep = self.prepare_request(req)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 382, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/Library/Python/2.7/site-packages/requests/models.py", line 304, in prepare
    self.prepare_url(url, params)
  File "/Library/Python/2.7/site-packages/requests/models.py", line 362, in prepare_url
    to_native_string(url, 'utf8')))
requests.exceptions.MissingSchema: Invalid URL '//imgs.xkcd.com/comics/the_martian.png': No schema supplied. Perhaps you meant http:////imgs.xkcd.com/comics/the_martian.png?
The thing is, I've read the section in the book about this program multiple times, read the Requests docs, and looked at other questions on here. My syntax looks right.
Thanks for your help!
Edit:
This didn't work:
comicUrl = ("http:"+comicElem[0].get('src'))
I thought adding the http: in front would get rid of the "no schema supplied" error.
Accepted answer by Ajay
Change your comicUrl to this:
comicUrl = comicElem[0].get('src').strip("http://")
comicUrl = "http://" + comicUrl
if 'xkcd' not in comicUrl:
    comicUrl = comicUrl[:7] + 'xkcd.com/' + comicUrl[7:]
print "comic url", comicUrl
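For reference, a quick illustration of what this transformation produces for the image from the traceback above (my own addition, not part of the original answer). Note that str.strip removes any of the listed characters from both ends rather than the literal prefix, which happens to work out here:

# Illustration only: Ajay's snippet applied to the src from the traceback.
src = '//imgs.xkcd.com/comics/the_martian.png'
comicUrl = src.strip("http://")    # strips the characters h, t, p, :, / from both ends -> 'imgs.xkcd.com/comics/the_martian.png'
comicUrl = "http://" + comicUrl
if 'xkcd' not in comicUrl:
    comicUrl = comicUrl[:7] + 'xkcd.com/' + comicUrl[7:]
print "comic url", comicUrl        # comic url http://imgs.xkcd.com/comics/the_martian.png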
Answered by John
No schema means you haven't supplied the http:// or https://. Supply one of these and it will do the trick.
Edit: Look at this URL string:
URL '//imgs.xkcd.com/comics/the_martian.png':
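As a side note (my addition, not from the original answers): the src value here is a protocol-relative URL, so another way to get a usable URL is to resolve it against the page it came from. A minimal, self-contained Python 2 sketch using the URL from the traceback:

from urlparse import urljoin    # Python 3: from urllib.parse import urljoin

url = 'http://xkcd.com/'        # the page the comic came from
src = '//imgs.xkcd.com/comics/the_martian.png'
comicUrl = urljoin(url, src)    # urljoin fills in the missing scheme from the base URL
print comicUrl                  # http://imgs.xkcd.com/comics/the_martian.png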
Answered by easy_c0mpany80
I'd just like to chime in here that I had this exact same error and used @Ajay's recommended answer above, but even after adding that I was still getting problems; right after the program downloaded the first image it would stop and return this error:
ValueError: Unsupported or invalid CSS selector: "a[rel"
This was referring to one of the last lines in the program, where it uses the 'Prev' button to go to the next image to download.
Anyway, after going through the bs4 docs I made a slight change as follows, and it seems to work just fine now:
prevLink = soup.select('a[rel^="prev"]')[0]
Someone else might run into the same problem, so I thought I'd add this comment.
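In case it helps, here is a small standalone check (my own sketch, not part of the original answer) that shows which of the two selector forms the installed bs4 version accepts:

import bs4

# A tiny document shaped like xkcd's Prev link
html = '<a rel="prev" href="/1535/" accesskey="p">&lt; Prev</a>'
soup = bs4.BeautifulSoup(html)

for selector in ['a[rel="prev"]', 'a[rel^="prev"]']:
    try:
        print selector, '->', soup.select(selector)
    except Exception as e:    # older bs4 releases raise ValueError for selectors they don't support
        print selector, '->', e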
Answered by Admiral Gaust
Explanation:
A few XKCD pages have special content that isn't a simple image file. That's fine; you can just skip those. If your selector doesn't find any elements, then soup.select('#comic img') will return a blank list.
Working Code:
import requests, os, bs4, shutil

url = 'http://xkcd.com'

# making a new folder
if os.path.isdir('xkcd') == True:
    shutil.rmtree('xkcd')
else:
    os.makedirs('xkcd')

# scraping information
while not url.endswith('#'):
    print('Downloading Page %s.....' % (url))
    res = requests.get(url)  # getting page
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text)

    comicElem = soup.select('#comic img')  # getting img tag under the comic division
    if comicElem == []:                    # if not found, print error
        print('could not find comic image')
    else:
        try:
            comicUrl = 'http:' + comicElem[0].get('src')  # getting comic url and then downloading its image
            print('Downloading image %s.....' % (comicUrl))
            res = requests.get(comicUrl)
            res.raise_for_status()
        except requests.exceptions.MissingSchema:
            # skip if not a normal image file
            prev = soup.select('a[rel="prev"]')[0]
            url = 'http://xkcd.com' + prev.get('href')
            continue

        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')  # write downloaded image to hard disk
        for chunk in res.iter_content(10000):
            imageFile.write(chunk)
        imageFile.close()

    # get previous link and update url
    prev = soup.select('a[rel="prev"]')[0]
    url = "http://xkcd.com" + prev.get('href')

print('Done...')
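One small usage note, my addition rather than part of the answer above: newer Beautiful Soup releases print a warning when bs4.BeautifulSoup(res.text) is called without naming a parser. Naming one explicitly silences the warning and should behave the same here:

soup = bs4.BeautifulSoup(res.text, 'html.parser')  # same call, with Python's built-in parser named explicitly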