Python: Sending a POST Request in Scrapy
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/30342243/
Send Post Request in Scrapy
Asked by Amit Tripathi
I am trying to crawl the latest reviews from the Google Play store, and to get them I need to make a POST request.
With Postman it works and I get the desired response.
But a POST request from the terminal gives me a server error.
For example, for this page: https://play.google.com/store/apps/details?id=com.supercell.boombeach
curl -H "Content-Type: application/json" -X POST -d '{"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum":'0'}' https://play.google.com/store/getreviews
gives a server error, and Scrapy just ignores this line:
frmdata = {"id": "com.supercell.boombeach", "reviewType": 0, "reviewSortOrder": 0, "pageNum":0}
url = "https://play.google.com/store/getreviews"
yield Request(url, callback=self.parse, method="POST", body=urllib.urlencode(frmdata))
Accepted answer by Jithin
Make sure that each element in your formdata is of type string/unicode:
frmdata = {"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum":'0'}
url = "https://play.google.com/store/getreviews"
yield FormRequest(url, callback=self.parse, formdata=frmdata)
I think this will do:
In [1]: from scrapy.http import FormRequest
In [2]: frmdata = {"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum":'0'}
In [3]: url = "https://play.google.com/store/getreviews"
In [4]: r = FormRequest(url, formdata=frmdata)
In [5]: fetch(r)
2015-05-20 14:40:09+0530 [default] DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: None)
[s] Available Scrapy objects:
[s] crawler <scrapy.crawler.Crawler object at 0x7f3ea4258890>
[s] item {}
[s] r <POST https://play.google.com/store/getreviews>
[s] request <POST https://play.google.com/store/getreviews>
[s] response <200 https://play.google.com/store/getreviews>
[s] settings <scrapy.settings.Settings object at 0x7f3eaa205450>
[s] spider <Spider 'default' at 0x7f3ea3449cd0>
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser
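The string-only requirement can be handled generically. Below is a minimal sketch that is not part of the original answer: `stringify_formdata` is a hypothetical helper name, and the code uses only the standard library to show roughly what the form-encoded body looks like.

```python
from urllib.parse import urlencode


def stringify_formdata(data):
    """Return a copy of `data` with every key and value converted to str."""
    return {str(key): str(value) for key, value in data.items()}


# The question's original dict, which mixes str and int values.
raw = {"id": "com.supercell.boombeach", "reviewType": 0,
       "reviewSortOrder": 0, "pageNum": 0}

frmdata = stringify_formdata(raw)

# Roughly the body a form-encoded POST carries for this dict.
encoded_body = urlencode(frmdata)
print(encoded_body)
```

The resulting `frmdata` can then be passed as `formdata=frmdata` to `FormRequest`, as in the answer above.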
Answer by Manoj Sahu
Sample page traversal using POST in Scrapy:
from urllib.parse import urljoin  # Python 3; on Python 2: from urlparse import urljoin
from scrapy import FormRequest, Request

# Spider callback: follow each profile link, then POST for the next page.
def directory_page(self, response):
    if response:
        profiles = response.xpath("//div[@class='heading-h']/h3/a/@href").extract()
        for profile in profiles:
            yield Request(urljoin(response.url, profile), callback=self.profile_collector)
        page = response.meta['page'] + 1
        if page:
            yield FormRequest('https://rotmanconnect.com/AlumniDirectory/getmorerecentjoineduser',
                              formdata={'isSortByName': 'false', 'pageNumber': str(page)},
                              callback=self.directory_page,
                              meta={'page': page})
    else:
        print("No more page available")
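In the callback above, the page counter travels in `meta` as an integer while the form field carries it as a string. A minimal sketch of that bookkeeping, with `next_page_args` as a hypothetical helper name that is not part of the original answer:

```python
def next_page_args(current_page):
    """Return formdata and meta for the page after `current_page`."""
    page = current_page + 1
    return {
        # Form values must be strings, so the counter is converted here.
        "formdata": {"isSortByName": "false", "pageNumber": str(page)},
        # meta keeps the integer so the next callback can increment it.
        "meta": {"page": page},
    }


args = next_page_args(0)
print(args)
```

These could be unpacked into `FormRequest(url, callback=self.directory_page, **args)`. The very first request has to set `meta={'page': ...}` itself, since the callback reads `response.meta['page']`.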
Answer by aitorhh
The answers above do not really solve the problem. They send the data as form parameters instead of sending it as JSON in the body of the request.
From http://bajiecc.cc/questions/1135255/scrapy-formrequest-sending-json:
import json
import scrapy

my_data = {'field1': 'value1', 'field2': 'value2'}
# `url` is the endpoint to POST to; it must be defined by the caller.
request = scrapy.Request(url, method='POST',
                         body=json.dumps(my_data),
                         headers={'Content-Type': 'application/json'})
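To make the difference between the two kinds of body concrete, here is a small standard-library sketch that builds the same payload both ways. The helper name `json_post_kwargs` and the example URL are assumptions for illustration only.

```python
import json
from urllib.parse import urlencode


def json_post_kwargs(url, data):
    """Build keyword arguments for a POST request with a JSON body."""
    return {
        "url": url,
        "method": "POST",
        "body": json.dumps(data),
        "headers": {"Content-Type": "application/json"},
    }


my_data = {"field1": "value1", "field2": "value2"}

# Form-encoded: what formdata-style requests put in the body.
form_body = urlencode(my_data)

# JSON: what the snippet above puts in the body.
kwargs = json_post_kwargs("https://example.com/api", my_data)

print(form_body)
print(kwargs["body"])
```

The dictionary can be unpacked as `scrapy.Request(**kwargs)`, optionally together with a `callback`. Newer Scrapy releases also provide `scrapy.http.JsonRequest`, which takes a `data` argument and serializes it for you; check that your installed version has it before relying on it.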