No schema supplied and other errors with using requests.get()

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me) on Stack Overflow.

Original question: http://stackoverflow.com/questions/30770213/
Asked by Loi Huynh
I'm learning Python by following Automate the Boring Stuff. This program is supposed to go to http://xkcd.com/ and download all the images for offline viewing.
I'm on Python 2.7 on a Mac.
For some reason, I'm getting errors like "No schema supplied", and errors with using requests.get() itself.
Here is my code:
# Saves the XKCD comic page for offline reading
import requests, os, bs4, shutil

url = 'http://xkcd.com/'

if os.path.isdir('xkcd') == True:   # If the xkcd folder already exists
    shutil.rmtree('xkcd')           # delete it
else:                               # otherwise
    os.makedirs('xkcd')             # create the xkcd folder

while not url.endswith('#'):  # When there are no more posts, the url will end with '#'; exit the while loop
    # Download the page
    print 'Downloading %s page...' % url
    res = requests.get(url)             # Get the page
    res.raise_for_status()              # Check for errors
    soup = bs4.BeautifulSoup(res.text)  # Parse the page

    # Find the URL of the comic image
    comicElem = soup.select('#comic img')  # Any '#comic img' it finds will be saved as a list in comicElem
    if comicElem == []:                    # if the list is empty
        print 'Couldn\'t find the image!'
    else:
        comicUrl = comicElem[0].get('src')  # Get the first element in comicElem (the image) and save it to comicUrl

        # Download the image
        print 'Downloading the %s image...' % (comicUrl)
        res = requests.get(comicUrl)  # Get the image. Getting something will always use requests.get()
        res.raise_for_status()        # Check for errors

        # Save image to ./xkcd
        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(10000):
            imageFile.write(chunk)
        imageFile.close()

    # Get the Prev button's URL
    prevLink = soup.select('a[rel="prev"]')[0]
    # The Previous button is <a rel="prev" href="/1535/" accesskey="p">< Prev</a>
    url = 'http://xkcd.com/' + prevLink.get('href')  # adds /1535/ to http://xkcd.com/

print 'Done!'
Here are the errors:
Traceback (most recent call last):
  File "/Users/XKCD.py", line 30, in <module>
    res = requests.get(comicUrl) # Get the image. Getting something will always use requests.get()
  File "/Library/Python/2.7/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 451, in request
    prep = self.prepare_request(req)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 382, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/Library/Python/2.7/site-packages/requests/models.py", line 304, in prepare
    self.prepare_url(url, params)
  File "/Library/Python/2.7/site-packages/requests/models.py", line 362, in prepare_url
    to_native_string(url, 'utf8')))
requests.exceptions.MissingSchema: Invalid URL '//imgs.xkcd.com/comics/the_martian.png': No schema supplied. Perhaps you meant http:////imgs.xkcd.com/comics/the_martian.png?
The thing is, I've read the section in the book about this program multiple times, read the Requests docs, and looked at other questions on here. My syntax looks right.
Thanks for your help!
Edit:
This didn't work:
comicUrl = ("http:"+comicElem[0].get('src'))
I thought adding the http: in front would get rid of the "no schema supplied" error.
Accepted answer by Ajay
Change your comicUrl to this:
comicUrl = comicElem[0].get('src').strip("http://")
comicUrl = "http://" + comicUrl
if 'xkcd' not in comicUrl:
    comicUrl = comicUrl[:7] + 'xkcd.com/' + comicUrl[7:]
print "comic url", comicUrl
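For reference, a quick illustration of what this transformation produces for the image from the traceback above (my own addition, not part of the original answer). Note that str.strip removes any of the listed characters from both ends rather than the literal prefix, which happens to work out here:

# Illustration only: Ajay's snippet applied to the src from the traceback.
src = '//imgs.xkcd.com/comics/the_martian.png'
comicUrl = src.strip("http://")    # strips the characters h, t, p, :, / from both ends -> 'imgs.xkcd.com/comics/the_martian.png'
comicUrl = "http://" + comicUrl
if 'xkcd' not in comicUrl:
    comicUrl = comicUrl[:7] + 'xkcd.com/' + comicUrl[7:]
print "comic url", comicUrl        # comic url http://imgs.xkcd.com/comics/the_martian.png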
Answered by John
No schema means you haven't supplied the http:// or https://. Supply one of these and it will do the trick.
Edit: Look at this URL string:
URL '//imgs.xkcd.com/comics/the_martian.png':
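As a side note (my addition, not from the original answers): the src value here is a protocol-relative URL, so another way to get a usable URL is to resolve it against the page it came from. A minimal, self-contained Python 2 sketch using the URL from the traceback:

from urlparse import urljoin    # Python 3: from urllib.parse import urljoin

url = 'http://xkcd.com/'        # the page the comic came from
src = '//imgs.xkcd.com/comics/the_martian.png'
comicUrl = urljoin(url, src)    # urljoin fills in the missing scheme from the base URL
print comicUrl                  # http://imgs.xkcd.com/comics/the_martian.png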
Answered by easy_c0mpany80
I'd just like to chime in here that I had this exact same error and used @Ajay's recommended answer above, but even after adding that I was still getting problems; right after the program downloaded the first image it would stop and return this error:
ValueError: Unsupported or invalid CSS selector: "a[rel"
This was referring to one of the last lines in the program, where it uses the 'Prev' button to go to the next image to download.
Anyway, after going through the bs4 docs I made a slight change as follows, and it seems to work just fine now:
prevLink = soup.select('a[rel^="prev"]')[0]
Someone else might run into the same problem, so I thought I'd add this comment.
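In case it helps, here is a small standalone check (my own sketch, not part of the original answer) that shows which of the two selector forms the installed bs4 version accepts:

import bs4

# A tiny document shaped like xkcd's Prev link
html = '<a rel="prev" href="/1535/" accesskey="p">&lt; Prev</a>'
soup = bs4.BeautifulSoup(html)

for selector in ['a[rel="prev"]', 'a[rel^="prev"]']:
    try:
        print selector, '->', soup.select(selector)
    except Exception as e:    # older bs4 releases raise ValueError for selectors they don't support
        print selector, '->', e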
Answered by Admiral Gaust
Explanation:
A few XKCD pages have special content that isn't a simple image file. That's fine; you can just skip those. If your selector doesn't find any elements, then soup.select('#comic img') will return a blank list.
Working Code:
import requests, os, bs4, shutil

url = 'http://xkcd.com'

# making a new folder
if os.path.isdir('xkcd') == True:
    shutil.rmtree('xkcd')
else:
    os.makedirs('xkcd')

# scraping information
while not url.endswith('#'):
    print('Downloading Page %s.....' % (url))
    res = requests.get(url)  # getting page
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text)

    comicElem = soup.select('#comic img')  # getting img tag under the comic division
    if comicElem == []:                    # if not found, print error
        print('could not find comic image')
    else:
        try:
            comicUrl = 'http:' + comicElem[0].get('src')  # getting comic url and then downloading its image
            print('Downloading image %s.....' % (comicUrl))
            res = requests.get(comicUrl)
            res.raise_for_status()
        except requests.exceptions.MissingSchema:
            # skip if not a normal image file
            prev = soup.select('a[rel="prev"]')[0]
            url = 'http://xkcd.com' + prev.get('href')
            continue

        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')  # write downloaded image to hard disk
        for chunk in res.iter_content(10000):
            imageFile.write(chunk)
        imageFile.close()

    # get previous link and update url
    prev = soup.select('a[rel="prev"]')[0]
    url = "http://xkcd.com" + prev.get('href')

print('Done...')
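One small usage note, my addition rather than part of the answer above: newer Beautiful Soup releases print a warning when bs4.BeautifulSoup(res.text) is called without naming a parser. Naming one explicitly silences the warning and should behave the same here:

soup = bs4.BeautifulSoup(res.text, 'html.parser')  # same call, with Python's built-in parser named explicitly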