Python requests POST with headers and parameters

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/51124516/

Time: 2020-08-19 19:44:16  Source: igfitidea


Tags: python, python-3.x, post, request, python-requests

Asked by Gaurav Khe

I have a POST request which I am trying to send using requests in Python, but I get a 403 error. The request works fine through the browser.

POST /ajax-load-system HTTP/1.1
Host: xyz.website.com
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-GB,en;q=0.5
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
Referer: http://xyz.website.com/help-me/ZYc5Yn
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Content-Length: 56
Cookie: csrf_cookie_name=a3f8adecbf11e29c006d9817be96e8d4; ci_session=ba92hlh6o0ns7f20t4bsgjt0uqfdmdtl; _ga=GA1.2.1535910352.1530452604; _gid=GA1.2.1416631165.1530452604; _gat_gtag_UA_21820217_30=1
Connection: close

csrf_test_name=a3f8adecbf11e29c006d9817be96e8d4&vID=9999

What I am trying in Python is:

import requests
import json

url = 'http://xyz.website.com/ajax-load-system'

payload = {
'Host': 'xyz.website.com',
'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0',
'Accept': 'application/json, text/javascript, */*; q=0.01',
'Accept-Language': 'en-GB,en;q=0.5',
'Referer': 'http://xyz.website.com/help-me/ZYc5Yn',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'X-Requested-With': 'XMLHttpRequest',
'Content-Length': '56',
'Cookie': 'csrf_cookie_name=a3f8adecbf11e29c006d9817be96e8d4; ci_session=ba92hlh6o0ns7f20t4bsgjt0uqfdmdtl; _ga=GA1.2.1535910352.1530452604; _gid=GA1.2.1416631165.1530452604; _gat_gtag_UA_21820217_30=1',
'Connection': 'close',
'csrf_test_name': 'a3f8adecbf11e29c006d9817be96e8d4',
'vID': '9999',
}    

headers = {}

r = requests.post(url, headers=headers, data=json.dumps(payload))
print(r.status_code)  

But this prints a 403 error code. What am I doing wrong here?

I am expecting a JSON response in return:

{"status_message":"Thanks for help.","help_count":"141","status":true}


Answered by Martijn Pieters

You are confusing headers and payload, and the payload is not JSON encoded.
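To make the encoding difference concrete, here is a minimal sketch that uses only the standard library to encode the same two fields both ways. `urllib.parse.urlencode` is used as a stand-in for the form encoding the browser performed, and the variable names are illustrative only.

```python
import json
from urllib.parse import urlencode

# The two form fields taken from the body of the browser request.
fields = {
    'csrf_test_name': 'a3f8adecbf11e29c006d9817be96e8d4',
    'vID': '9999',
}

# What the browser sent: application/x-www-form-urlencoded.
form_body = urlencode(fields)

# What json.dumps() produces: a JSON document, which a server that
# expects form fields will most likely not be able to read.
json_body = json.dumps(fields)

print(form_body)
print(json_body)
```

The form-encoded string is 56 characters long, which matches the Content-Length the browser reported.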

These are all headers:

Host: xyz.website.com
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-GB,en;q=0.5
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
Referer: http://xyz.website.com/help-me/ZYc5Yn
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Content-Length: 56
Cookie: csrf_cookie_name=a3f8adecbf11e29c006d9817be96e8d4; ci_session=ba92hlh6o0ns7f20t4bsgjt0uqfdmdtl; _ga=GA1.2.1535910352.1530452604; _gid=GA1.2.1416631165.1530452604; _gat_gtag_UA_21820217_30=1
Connection: close

Most of these are handled automatically and don't need to be set manually. requests sets Host for you based on the URL. Accept is set to an acceptable default, and Accept-Language is rarely needed in these situations. Referer, unless HTTPS is used, is often not set at all or is filtered out for privacy reasons, so sites no longer rely on it being set. Content-Type must reflect the actual contents of your POST (which is not JSON!), so requests sets it for you depending on how you call it. Content-Length must reflect the actual content length, so requests sets it, as the library is in the best position to calculate it. Connection should definitely be handled by the library, as you don't want to prevent it from efficiently re-using connections when it can.
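As a quick way to see some of these defaults, the sketch below prints the headers that requests attaches on its own. It assumes a reasonably recent requests 2.x release; the exact values (the User-Agent version string in particular) depend on the installed version.

```python
import requests

# Headers that requests adds to every request unless you override them.
defaults = requests.utils.default_headers()

for name, value in defaults.items():
    print(f'{name}: {value}')
```

Host, Content-Type and Content-Length are not part of this list because they depend on the individual request and are filled in later, when the request is prepared and sent.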

At best you could set X-Requested-With and User-Agent, but only if the server would not otherwise accept the request. The Cookie header reflects the values of the cookies the browser holds. Your script can get its own set of cookies from the server by using a requests Session object to make an initial GET request to the URL named in the Referer header (or another suitable URL on the same site). At that point the server should set cookies on the response, and those are stored in the session for reuse on the POST request. Use that mechanism to get your own CSRF cookie value.
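The following sketch shows, without touching the network, how a cookie held by a Session is attached to a later request. The cookie is placed in the jar by hand as a stand-in for what the server would set on the initial GET, and the value 'abc123' is made up for illustration.

```python
import requests

url = 'http://xyz.website.com/ajax-load-system'

with requests.Session() as session:
    # Stand-in for the cookie the server would set on an initial GET;
    # 'abc123' is a made-up value.
    session.cookies.set('csrf_cookie_name', 'abc123',
                        domain='xyz.website.com', path='/')

    # Read the value back, as you would after a real GET.
    token = session.cookies['csrf_cookie_name']

    # Build (but do not send) a POST to inspect what would be sent.
    request = requests.Request('POST', url, data={'csrf_test_name': token})
    prepared = session.prepare_request(request)

print(prepared.headers['Cookie'])
```

The session adds the Cookie header itself, so there is no need to copy the browser's Cookie header into your own headers dictionary.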

Note the Content-Type header:

Content-Type: application/x-www-form-urlencoded; charset=UTF-8

When you pass a dictionary to the data keyword of the requests.post() function, the library encodes the data to exactly that content type for you.
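This can be checked without sending anything by preparing the request and inspecting it. The sketch below reuses the field values from the browser request; it only builds the request object and makes no network call.

```python
import requests

payload = {
    'csrf_test_name': 'a3f8adecbf11e29c006d9817be96e8d4',
    'vID': 9999,
}

# Prepare the request without sending it, so the result can be inspected.
prepared = requests.Request(
    'POST', 'http://xyz.website.com/ajax-load-system', data=payload,
).prepare()

print(prepared.headers['Content-Type'])
print(prepared.headers['Content-Length'])
print(prepared.body)
```

Both Content-Type and Content-Length are derived from the dictionary, which is why neither needs to be set by hand.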

The actual payload is:

csrf_test_name=a3f8adecbf11e29c006d9817be96e8d4&vID=9999

These are two fields, csrf_test_name and vID, that need to be part of your payload dictionary.

Note that the csrf_test_name value matches the csrf_cookie_name value in the cookies. This is how the site protects itself from cross-site request forgery (CSRF) attacks, where a third party may try to post to the same URL on your behalf. Such a third party would not have access to the same cookies, so its request would be rejected. Your code needs to obtain a new cookie; a proper CSRF implementation would limit how long any CSRF cookie can be re-used.
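A rough sketch of the kind of check such a site might perform is shown below. The function csrf_ok is hypothetical and written purely for illustration; how the site actually validates the token is not known.

```python
import hmac


def csrf_ok(cookies: dict, form: dict) -> bool:
    """Hypothetical server-side check: the form token must equal the cookie."""
    cookie_token = cookies.get('csrf_cookie_name', '')
    form_token = form.get('csrf_test_name', '')
    # Constant-time comparison; reject when the cookie is missing.
    return bool(cookie_token) and hmac.compare_digest(cookie_token, form_token)


token = 'a3f8adecbf11e29c006d9817be96e8d4'

# Browser request: cookie and form field carry the same token.
browser_ok = csrf_ok({'csrf_cookie_name': token},
                     {'csrf_test_name': token, 'vID': '9999'})

# Third party or naive script: form field present, but no matching cookie.
script_ok = csrf_ok({}, {'csrf_test_name': token, 'vID': '9999'})

print(browser_ok, script_ok)
```

Under this assumption, a request that copies the token into the form but does not send the matching cookie is rejected, which is consistent with the 403 seen in the question.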

So what would be needed, at the least, to make it all work is:

import requests

# *optional*, the site may not care about these. If they *do* care, then
# they care about keeping out automated scripts and could in future 
# raise the stakes and require more 'browser-like' markers. Ask yourself
# if you want to anger the site owners and get into an arms race.
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0',
    'X-Requested-With': 'XMLHttpRequest',
}

payload = {
    'vID': 9999,
}

url = 'http://xyz.website.com/ajax-load-system'
# the URL from the Referer header, but others at the site would probably
# also work
initial_url = 'http://xyz.website.com/help-me/ZYc5Yn'

with requests.Session() as session:
    # obtain CSRF cookie
    initial_response = session.get(initial_url)
    payload['csrf_test_name'] = session.cookies['csrf_cookie_name']

    # Now actually post with the correct CSRF cookie
    response = session.post(url, headers=headers, data=payload)

If this still causes issues, you'll need to try two additional headers, Accept and Accept-Language. Bear in mind that this would mean the site has already thought long and hard about how to keep automated site scrapers out. Consider contacting them and asking if they offer an API option instead.