Python: Send a POST Request in Scrapy

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/30342243/

Date: 2020-08-19 08:17:54  Source: igfitidea

Send Post Request in Scrapy

Tags: python, python-2.7, scrapy, web-crawler

Asked by Amit Tripathi

I am trying to crawl the latest reviews from the Google Play Store, and to get them I need to make a POST request.

With Postman, it works and I get the desired response.

But a POST request from the terminal gives me a server error.

For example, for this page: https://play.google.com/store/apps/details?id=com.supercell.boombeach

curl -H "Content-Type: application/json" -X POST -d '{"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum":'0'}' https://play.google.com/store/getreviews

This gives a server error, and Scrapy just ignores this line:

frmdata = {"id": "com.supercell.boombeach", "reviewType": 0, "reviewSortOrder": 0, "pageNum": 0}
url = "https://play.google.com/store/getreviews"
yield Request(url, callback=self.parse, method="POST", body=urllib.urlencode(frmdata))

Accepted answer by Jithin

Make sure that each element in your formdata is of type string/unicode:

frmdata = {"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum":'0'}
url = "https://play.google.com/store/getreviews"
yield FormRequest(url, callback=self.parse, formdata=frmdata)

I think this will do:

In [1]: from scrapy.http import FormRequest

In [2]: frmdata = {"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum":'0'}

In [3]: url = "https://play.google.com/store/getreviews"

In [4]: r = FormRequest(url, formdata=frmdata)

In [5]: fetch(r)
2015-05-20 14:40:09+0530 [default] DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: None)
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f3ea4258890>
[s]   item       {}
[s]   r          <POST https://play.google.com/store/getreviews>
[s]   request    <POST https://play.google.com/store/getreviews>
[s]   response   <200 https://play.google.com/store/getreviews>
[s]   settings   <scrapy.settings.Settings object at 0x7f3eaa205450>
[s]   spider     <Spider 'default' at 0x7f3ea3449cd0>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser
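
The advice in this answer (every formdata value should be a string) can be applied mechanically before building the request. The following is only a minimal sketch using plain Python; the helper name stringify_formdata is hypothetical and is not part of Scrapy.

```python
def stringify_formdata(data):
    """Return a copy of `data` with every value converted to str.

    Hypothetical helper for illustration; not a Scrapy API.
    """
    return {key: str(value) for key, value in data.items()}


# The dictionary from the question, which mixes strings and integers.
raw = {"id": "com.supercell.boombeach", "reviewType": 0,
       "reviewSortOrder": 0, "pageNum": 0}

# Every value is now a string, as the answer recommends.
frmdata = stringify_formdata(raw)
print(frmdata)
```

The resulting frmdata could then be passed as formdata=frmdata to FormRequest, as in the snippet from this answer.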

Answer by Manoj Sahu

Sample page traversal using POST in Scrapy:

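from scrapy.http import Request, FormRequest
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

def directory_page(self, response):
    # Collect the profile links on the current page.
    profiles = response.xpath("//div[@class='heading-h']/h3/a/@href").extract()
    if not profiles:
        # Assumption: an empty page means the listing is exhausted.
        print("No more pages available")
        return
    for profile in profiles:
        yield Request(urljoin(response.url, profile), callback=self.profile_collector)

    # Request the next page via POST; formdata values must be strings.
    page = response.meta['page'] + 1
    yield FormRequest('https://rotmanconnect.com/AlumniDirectory/getmorerecentjoineduser',
                      formdata={'isSortByName': 'false', 'pageNumber': str(page)},
                      callback=self.directory_page,
                      meta={'page': page})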

Answer by aitorhh

The answers above do not really solve the problem. They send the data as form parameters instead of as JSON in the body of the request.

From http://bajiecc.cc/questions/1135255/scrapy-formrequest-sending-json:

import json
import scrapy

my_data = {'field1': 'value1', 'field2': 'value2'}
# Serialize the payload to JSON and send it as the raw request body.
request = scrapy.Request(url, method='POST',
                         body=json.dumps(my_data),
                         headers={'Content-Type': 'application/json'})
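
To make the distinction in this answer concrete, here is a small sketch using only the Python 3 standard library. It encodes the payload from the question once as form parameters and once as JSON. The variable names are illustrative, and which of the two formats the Google Play endpoint actually expects is not confirmed here.

```python
import json
from urllib.parse import urlencode

# Payload from the question, with values kept as strings.
payload = {
    "id": "com.supercell.boombeach",
    "reviewType": "0",
    "reviewSortOrder": "0",
    "pageNum": "0",
}

# Form encoding: key=value pairs joined by "&".
form_body = urlencode(payload)
form_headers = {"Content-Type": "application/x-www-form-urlencoded"}

# JSON encoding: the whole dictionary serialized as one JSON document.
json_body = json.dumps(payload)
json_headers = {"Content-Type": "application/json"}

print(form_body)
print(json_body)
```

The two bodies differ on the wire, so a server that parses only one of the formats will reject or ignore the other.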