Python: Sending a POST request in Scrapy

Question

Asked by Amit Tripathi

I am trying to crawl the latest reviews from the Google Play Store, and to get them I need to make a POST request.

With Postman, it works and I get the desired response.

However, a POST request from the terminal gives me a server error.

For example, for this page: https://play.google.com/store/apps/details?id=com.supercell.boombeach

curl -H "Content-Type: application/json" -X POST -d '{"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum":'0'}' https://play.google.com/store/getreviews

This gives a server error, and Scrapy just ignores this line:

frmdata = {"id": "com.supercell.boombeach", "reviewType": 0, "reviewSortOrder": 0, "pageNum": 0}
url = "https://play.google.com/store/getreviews"
yield Request(url, callback=self.parse, method="POST", body=urllib.urlencode(frmdata))

Answer 1

Accepted answer by Jithin

Make sure that each element in your formdata is of type string/unicode:

frmdata = {"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum":'0'}
url = "https://play.google.com/store/getreviews"
yield FormRequest(url, callback=self.parse, formdata=frmdata)

I think this will do:

In [1]: from scrapy.http import FormRequest

In [2]: frmdata = {"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum":'0'}

In [3]: url = "https://play.google.com/store/getreviews"

In [4]: r = FormRequest(url, formdata=frmdata)

In [5]: fetch(r)
2015-05-20 14:40:09+0530 [default] DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: None)
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f3ea4258890>
[s]   item       {}
[s]   r          <POST https://play.google.com/store/getreviews>
[s]   request    <POST https://play.google.com/store/getreviews>
[s]   response   <200 https://play.google.com/store/getreviews>
[s]   settings   <scrapy.settings.Settings object at 0x7f3eaa205450>
[s]   spider     <Spider 'default' at 0x7f3ea3449cd0>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser
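The string-only rule above can be applied mechanically. The following is a minimal sketch, assuming the form data starts out as the integer-valued dict from the question; `stringify_formdata` is a hypothetical helper name, not part of Scrapy.

```python
def stringify_formdata(data):
    """Return a copy of `data` with every key and value converted to str."""
    return {str(key): str(value) for key, value in data.items()}


# The integer-valued dict from the question.
frmdata = {"id": "com.supercell.boombeach", "reviewType": 0,
           "reviewSortOrder": 0, "pageNum": 0}

# Every value is now a string, matching the form used in the answer.
clean = stringify_formdata(frmdata)
print(clean)
```

The resulting dict can then be passed along as `FormRequest(url, formdata=clean)`, as in the answer's own example.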

Answer 2

Answer by Manoj Sahu

Sample page traversal using POST in Scrapy:

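from urllib.parse import urljoin

from scrapy import FormRequest, Request

# Method of a Spider subclass; the request that first reaches it must set meta={'page': ...}.
def directory_page(self, response):
    if response:
        # Follow every profile link found on the current page.
        profiles = response.xpath("//div[@class='heading-h']/h3/a/@href").extract()
        for profile in profiles:
            yield Request(urljoin(response.url, profile), callback=self.profile_collector)

        # Request the next page via POST; form values must be strings.
        page = response.meta['page'] + 1
        if page:
            yield FormRequest('https://rotmanconnect.com/AlumniDirectory/getmorerecentjoineduser',
                              formdata={'isSortByName': 'false', 'pageNumber': str(page)},
                              callback=self.directory_page,
                              meta={'page': page})
    else:
        print("No more pages available")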

Answer 3

Answer by aitorhh

The answers above do not really solve the problem. They send the data as form parameters instead of sending JSON data as the body of the request.

From http://bajiecc.cc/questions/1135255/scrapy-formrequest-sending-json:

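import json
import scrapy

my_data = {'field1': 'value1', 'field2': 'value2'}
# `url` is the endpoint to POST to; the dict is serialized into a JSON body.
request = scrapy.Request(url, method='POST',
                         body=json.dumps(my_data),
                         headers={'Content-Type': 'application/json'})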
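To make the distinction in this answer concrete, here is a small sketch using only the standard library. It encodes the same payload both ways so the two request bodies can be compared. The payload reuses the values from the question; the variable names are illustrative assumptions, and no request is actually sent.

```python
import json
from urllib.parse import urlencode

# Payload taken from the question; all values are strings.
payload = {"id": "com.supercell.boombeach", "reviewType": "0",
           "reviewSortOrder": "0", "pageNum": "0"}

# Form-encoded body: key=value pairs joined by "&". This is the style of
# body that goes with Content-Type: application/x-www-form-urlencoded.
form_body = urlencode(payload)

# JSON body: what a server expects when the request declares
# Content-Type: application/json.
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```

Which of the two a server accepts depends on the endpoint, so the `Content-Type` header should match the encoding actually used for the body.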
