Original source: http://stackoverflow.com/questions/1349367/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Parse an HTTP request Authorization header with Python
Asked by Kris Walker
I need to take a header like this:
Authorization: Digest qop="chap",
realm="testrealm@host.com",
username="Foobear",
response="6629fae49393a05397450978507c4ef1",
cnonce="5ccc069c403ebaf9f0171e9517f40e41"
And parse it into this using Python:
{'protocol':'Digest',
'qop':'chap',
'realm':'testrealm@host.com',
'username':'Foobear',
'response':'6629fae49393a05397450978507c4ef1',
'cnonce':'5ccc069c403ebaf9f0171e9517f40e41'}
Is there a library to do this, or something I could look at for inspiration?
I'm doing this on Google App Engine, and I'm not sure if the Pyparsing library is available, but maybe I could include it with my app if it is the best solution.
Currently I'm creating my own MyHeaderParser object and using it with reduce() on the header string. It's working, but very fragile.
Brilliant solution by nadia below:
import re
reg = re.compile(r'(\w+)[=] ?"?(\w+)"?')
s = """Digest
realm="stackoverflow.com", username="kixx"
"""
print(dict(reg.findall(s)))
Accepted answer by Nadia Alramli
A little regex:
import re
reg = re.compile(r'(\w+)[:=] ?"?(\w+)"?')
# headers is the Authorization header string from the question
>>> dict(reg.findall(headers))
{'username': 'Foobear', 'realm': 'testrealm', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'response': '6629fae49393a05397450978507c4ef1', 'Authorization': 'Digest'}
Answer by Piotr Czapla
You can also use urllib2, as CherryPy does.
Here is the snippet:
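input = """
Authorization: Digest qop="chap",
    realm="testrealm@host.com",
    username="Foobear",
    response="6629fae49393a05397450978507c4ef1",
    cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""
import urllib2  # Python 2 module; in Python 3 these helpers live in urllib.request
# everything after the scheme name is the comma-separated parameter list
field, sep, value = input.partition("Authorization: Digest ")
if value:
    items = urllib2.parse_http_list(value)   # split on commas outside quotes
    opts = urllib2.parse_keqv_list(items)    # turn key="value" items into a dict
    opts['protocol'] = 'Digest'
    print(opts)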
It outputs:
{'username': 'Foobear', 'protocol': 'Digest', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'realm': 'testrealm@host.com', 'response': '6629fae49393a05397450978507c4ef1'}
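The urllib2 module only exists in Python 2. A minimal sketch of the same idea for Python 3 follows, assuming the two helpers are imported from urllib.request. The function name parse_authorization and the shortened sample header are illustrative and not part of the original answer.

```python
from urllib.request import parse_http_list, parse_keqv_list


def parse_authorization(header):
    """Split 'Authorization: <scheme> k="v", ...' into a flat dict."""
    # Drop the field name, then separate the scheme from its parameters.
    _, _, value = header.partition(":")
    scheme, _, params = value.strip().partition(" ")
    # parse_http_list splits on commas outside quotes,
    # parse_keqv_list turns each key="value" item into a dict entry.
    opts = parse_keqv_list(parse_http_list(params))
    opts["protocol"] = scheme
    return opts


# Shortened sample header, for illustration only.
header = ('Authorization: Digest qop="chap", realm="testrealm@host.com", '
          'username="Foobear"')
print(parse_authorization(header))
```

Unlike the snippet above, this sketch reads the scheme name from the header instead of hard-coding Digest. It still assumes key=value parameters, so a token-style header such as Basic would need separate handling.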
Answer by PaulMcG
Here's my pyparsing attempt:
text = """Authorization: Digest qop="chap",
    realm="testrealm@host.com",
    username="Foobear",
    response="6629fae49393a05397450978507c4ef1",
    cnonce="5ccc069c403ebaf9f0171e9517f40e41" """
from pyparsing import *
AUTH = Keyword("Authorization")
ident = Word(alphas,alphanums)
EQ = Suppress("=")
quotedString.setParseAction(removeQuotes)
# each name="value" pair becomes a named entry in the results
valueDict = Dict(delimitedList(Group(ident + EQ + quotedString)))
authentry = AUTH + ":" + ident("protocol") + valueDict
print(authentry.parseString(text).dump())
which prints:
['Authorization', ':', 'Digest', ['qop', 'chap'], ['realm', 'testrealm@host.com'],
['username', 'Foobear'], ['response', '6629fae49393a05397450978507c4ef1'],
['cnonce', '5ccc069c403ebaf9f0171e9517f40e41']]
- cnonce: 5ccc069c403ebaf9f0171e9517f40e41
- protocol: Digest
- qop: chap
- realm: testrealm@host.com
- response: 6629fae49393a05397450978507c4ef1
- username: Foobear
I'm not familiar with the RFC, but I hope this gets you rolling.
Answer by ?s??o?
The HTTP Digest Authorization header field is a bit of an odd beast. Its format is similar to that of RFC 2616's Cache-Control and Content-Type header fields, but just different enough to be incompatible. If you're still looking for a library that's a little smarter and more readable than the regex, you might try removing the Authorization: Digest part with str.split() and parsing the rest with parse_dict_header() from Werkzeug's http module. (Werkzeug can be installed on App Engine.)
Answer by Brian McFarland
Nadia's regex only matches alphanumeric characters for the value of a parameter. That means it fails to parse at least two fields, namely uri and qop. According to RFC 2617, the uri field is a duplicate of the string in the request line (i.e. the first line of the HTTP request). And qop fails to parse correctly if the value is "auth-int" due to the non-alphanumeric '-'.
This modified regex allows the URI (or any other value) to contain anything but ' ' (space), '"' (quote), or ',' (comma). That's probably more permissive than it needs to be, but shouldn't cause any problems with correctly formed HTTP requests.
reg = re.compile(r'(\w+)[:=] ?"?([^" ,]+)"?')
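To make the difference concrete, here is a small comparison of the two patterns. The header string is made up for this illustration, and the variable names reg_word and reg_loose are not from the original answers.

```python
import re

# Pattern from the accepted answer: values are limited to \w characters.
reg_word = re.compile(r'(\w+)[:=] ?"?(\w+)"?')
# Modified pattern: values may contain anything except '"', ' ' and ','.
reg_loose = re.compile(r'(\w+)[:=] ?"?([^" ,]+)"?')

# Hypothetical header, made up for this comparison.
header = ('Authorization: Digest username="Foobear", '
          'uri="/dir/index.html", qop="auth-int"')

word_result = dict(reg_word.findall(header))
loose_result = dict(reg_loose.findall(header))

print(word_result.get('uri'), word_result.get('qop'))    # None auth
print(loose_result.get('uri'), loose_result.get('qop'))  # /dir/index.html auth-int
```

The looser pattern still stops at the first space or comma, so a quoted value that contains either one (for example username="Foob,ear") would be truncated.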
Bonus tip: From there, it's fairly straightforward to convert the example code in RFC 2617 to Python. Using Python's md5 API, "MD5Init()" becomes "m = md5.new()", "MD5Update()" becomes "m.update()" and "MD5Final()" becomes "m.digest()".
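The md5 module mentioned here is Python 2 only. The sketch below shows how the same calculation might look with hashlib, assuming algorithm=MD5 and qop=auth as described in RFC 2617. The function names and all sample values are placeholders chosen for this illustration.

```python
import hashlib


def md5_hex(text):
    """Hex MD5 of a string (replaces MD5Init/MD5Update/MD5Final plus hex conversion)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the Digest 'response' value for algorithm=MD5 with qop=auth."""
    ha1 = md5_hex("%s:%s:%s" % (username, realm, password))  # credentials hash
    ha2 = md5_hex("%s:%s" % (method, uri))                   # request hash
    return md5_hex(":".join([ha1, nonce, nc, cnonce, qop, ha2]))


# All values below are placeholders for illustration.
expected = digest_response("Foobear", "testrealm@host.com", "secret",
                           "GET", "/dir/index.html",
                           "abc123nonce", "00000001", "0a4f113b")
print(expected)
```

A server would compare this value with the response field parsed from the header. The auth-int variant additionally hashes the request body into the second hash and is not covered here.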
Answer by biscuit314
An older question but one I found very helpful.
I needed a parser to handle any properly formed Authorization header, as defined by RFC 7235 (raise your hand if you enjoy reading ABNF).
Authorization = credentials
BWS = <BWS, see [RFC7230], Section 3.2.3>
OWS = <OWS, see [RFC7230], Section 3.2.3>
Proxy-Authenticate = *( "," OWS ) challenge *( OWS "," [ OWS challenge ] )
Proxy-Authorization = credentials
WWW-Authenticate = *( "," OWS ) challenge *( OWS "," [ OWS challenge ] )
auth-param = token BWS "=" BWS ( token / quoted-string )
auth-scheme = token
challenge = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param ) *( OWS "," [ OWS auth-param ] ) ] ) ]
credentials = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param ) *( OWS "," [ OWS auth-param ] ) ] ) ]
quoted-string = <quoted-string, see [RFC7230], Section 3.2.6>
token = <token, see [RFC7230], Section 3.2.6>
token68 = 1*( ALPHA / DIGIT / "-" / "." / "_" / "~" / "+" / "/" ) *"="
Starting with PaulMcG's answer, I came up with this:
import pyparsing as pp
tchar = '!#$%&\'*+-.^_`|~' + pp.nums + pp.alphas
t68char = '-._~+/' + pp.nums + pp.alphas
token = pp.Word(tchar)
token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))
scheme = token('scheme')
header = pp.Keyword('Authorization')
name = pp.Word(pp.alphas, pp.alphanums)
value = pp.quotedString.setParseAction(pp.removeQuotes)
name_value_pair = name + pp.Suppress('=') + value
params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))
credentials = scheme + (token68('token') ^ params('params'))
auth_parser = header + pp.Suppress(':') + credentials
This allows for parsing any Authorization header:
parsed = auth_parser.parseString('Authorization: Basic Zm9vOmJhcg==')
print('Authenticating with {0} scheme, token: {1}'.format(parsed['scheme'], parsed['token']))
which outputs:
Authenticating with Basic scheme, token: Zm9vOmJhcg==
Bringing it all together into an Authenticator class:
import pyparsing as pp
from base64 import b64decode
import re

class DecodeError(Exception):
    """Raised when a Basic token cannot be decoded into username:password."""

class Authenticator:
    def __init__(self):
        """
        Use pyparsing to create a parser for Authentication headers
        """
        tchar = "!#$%&'*+-.^_`|~" + pp.nums + pp.alphas
        t68char = '-._~+/' + pp.nums + pp.alphas
        token = pp.Word(tchar)
        token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))
        scheme = token('scheme')
        auth_header = pp.Keyword('Authorization')
        name = pp.Word(pp.alphas, pp.alphanums)
        value = pp.quotedString.setParseAction(pp.removeQuotes)
        name_value_pair = name + pp.Suppress('=') + value
        params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))
        # ^ picks the longest match: a bare token68 or a list of name="value" pairs
        credentials = scheme + (token68('token') ^ params('params'))
        # the moment of truth...
        self.auth_parser = auth_header + pp.Suppress(':') + credentials

    def authenticate(self, auth_header):
        """
        Parse auth_header and call the correct authentication handler
        """
        authenticated = False
        try:
            parsed = self.auth_parser.parseString(auth_header)
            scheme = parsed['scheme']
            details = parsed['token'] if 'token' in parsed.keys() else parsed['params']
            print('Authenticating using {0} scheme'.format(scheme))
            try:
                # map the scheme name to a valid method name, e.g. AWS4-HMAC-SHA256 -> aws4_hmac_sha256
                safe_scheme = re.sub(r"[!#$%&'*+\-.^_`|~]", '_', scheme.lower())
                handler = getattr(self, 'auth_handle_' + safe_scheme)
                authenticated = handler(details)
            except AttributeError:
                print('This is a valid Authorization header, but we do not handle this scheme yet.')
        except pp.ParseException as ex:
            print('Not a valid Authorization header')
            print(ex)
        return authenticated

    # The following methods are fake, of course. They should use what's passed
    # to them to actually authenticate, and return True/False if successful.
    # For this demo I'll just print some of the values used to authenticate.
    @staticmethod
    def auth_handle_basic(token):
        print('- token is {0}'.format(token))
        try:
            username, password = b64decode(token).decode().split(':', 1)
        except Exception:
            raise DecodeError
        print('- username is {0}'.format(username))
        print('- password is {0}'.format(password))
        return True

    @staticmethod
    def auth_handle_bearer(token):
        print('- token is {0}'.format(token))
        return True

    @staticmethod
    def auth_handle_digest(params):
        print('- username is {0}'.format(params['username']))
        print('- cnonce is {0}'.format(params['cnonce']))
        return True

    @staticmethod
    def auth_handle_aws4_hmac_sha256(params):
        print('- Signature is {0}'.format(params['Signature']))
        return True
To test this class:
tests = [
    'Authorization: Digest qop="chap", realm="testrealm@host.com", username="Foobar", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"',
    'Authorization: Bearer cn389ncoiwuencr',
    'Authorization: Basic Zm9vOmJhcg==',
    'Authorization: AWS4-HMAC-SHA256 Credential="AKIAIOSFODNN7EXAMPLE/20130524/us-east-1/s3/aws4_request", SignedHeaders="host;range;x-amz-date", Signature="fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024"',
    'Authorization: CrazyCustom foo="bar", fizz="buzz"',
]
authenticator = Authenticator()
for test in tests:
    authenticator.authenticate(test)
    print()
Which outputs:
Authenticating using Digest scheme
- username is Foobar
- cnonce is 5ccc069c403ebaf9f0171e9517f40e41
Authenticating using Bearer scheme
- token is cn389ncoiwuencr
Authenticating using Basic scheme
- token is Zm9vOmJhcg==
- username is foo
- password is bar
Authenticating using AWS4-HMAC-SHA256 scheme
- Signature is fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024
Authenticating using CrazyCustom scheme
This is a valid Authorization header, but we do not handle this scheme yet.
In the future, if we wish to handle CrazyCustom, we'll just add:
def auth_handle_crazycustom(params):
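For environments where pyparsing is not available, the credentials rule from the ABNF earlier in this answer can be approximated with the standard library alone. This is a rough sketch, not a full RFC 7235 implementation: it ignores the empty list elements (stray commas) the grammar allows, and the function name parse_credentials is made up for this example.

```python
import re

TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
TOKEN68 = r"[A-Za-z0-9\-._~+/]+=*"
QUOTED = r'"(?:[^"\\]|\\.)*"'
# One auth-param: token BWS "=" BWS ( token / quoted-string ), then "," or end.
AUTH_PARAM = re.compile(
    r"\s*(" + TOKEN + r")\s*=\s*(" + TOKEN + r"|" + QUOTED + r")\s*(?:,|$)"
)


def parse_credentials(value):
    """Return (scheme, token68 or None, params dict) for a credentials string."""
    scheme, _, rest = value.strip().partition(" ")
    rest = rest.strip()
    if not re.fullmatch(TOKEN, scheme):
        raise ValueError("invalid auth-scheme")
    if re.fullmatch(TOKEN68, rest):
        return scheme, rest, {}          # e.g. Basic or Bearer
    params = {}
    pos = 0
    while pos < len(rest):
        m = AUTH_PARAM.match(rest, pos)
        if not m:
            raise ValueError("invalid auth-param at offset %d" % pos)
        name, raw = m.group(1), m.group(2)
        if raw.startswith('"'):
            # strip the quotes and undo backslash escapes
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        params[name] = raw
        pos = m.end()
    return scheme, None, params


print(parse_credentials('Basic Zm9vOmJhcg=='))
print(parse_credentials('Digest qop="chap", username="Foob,ear"'))
```

The value passed in is the header content after the "Authorization:" field name. The resulting scheme can be fed into the same getattr-based handler lookup used by the Authenticator class.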
Answer by Ned Batchelder
If those components will always be there, then a regex will do the trick:
test = '''Authorization: Digest qop="chap", realm="testrealm@host.com", username="Foobear", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"'''
import re
# re.VERBOSE lets the pattern span several lines; the parameters must appear in this exact order
re_auth = re.compile(r"""
    Authorization:\s*(?P<protocol>[^ ]+)\s+
    qop="(?P<qop>[^"]+)",\s+
    realm="(?P<realm>[^"]+)",\s+
    username="(?P<username>[^"]+)",\s+
    response="(?P<response>[^"]+)",\s+
    cnonce="(?P<cnonce>[^"]+)"
    """, re.VERBOSE)
m = re_auth.match(test)
print(m.groupdict())
produces:
{ 'username': 'Foobear',
'protocol': 'Digest',
'qop': 'chap',
'cnonce': '5ccc069c403ebaf9f0171e9517f40e41',
'realm': 'testrealm@host.com',
'response': '6629fae49393a05397450978507c4ef1'
}
Answer by Piotr Czapla
I would recommend finding a proper library for parsing HTTP headers; unfortunately, I can't recall any. :(
In the meantime, check the snippet below (it should mostly work):
input = """
Authorization: Digest qop="chap",
    realm="testrealm@host.com",
    username="Foob,ear",
    response="6629fae49393a05397450978507c4ef1",
    cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""
field, sep, value = input.partition(":")
if field.endswith('Authorization'):
    protocol, sep, opts_str = value.strip().partition(" ")
    opts = {}
    # splitting on ",\n" keeps commas inside values (e.g. "Foob,ear") intact,
    # but only works when every parameter sits on its own line
    for opt in opts_str.split(",\n"):
        # split on the first '=' only, so values containing '=' survive
        key, value = opt.strip().split('=', 1)
        key = key.strip(" ")
        value = value.strip(' "')
        opts[key] = value
    opts['protocol'] = protocol
    print(opts)
Answer by Jason R. Coombs
Your original concept of using PyParsing would be the best approach. What you've implicitly asked for is something that requires a grammar... that is, a regular expression or simple parsing routine is always going to be brittle, and that sounds like it's something you're trying to avoid.
It appears that getting pyparsing on Google App Engine is easy; see the question "How do I get PyParsing set up on the Google App Engine?"
So I'd go with that, and then implement the full HTTP authentication/authorization header support from rfc2617.
Answer by Pinochle
If your response comes in a single string that never varies and has as many lines as there are expressions to match, you can split it on the newlines into an array called authentication_array and use regexps:
import re
pattern_array = ['qop', 'realm', 'username', 'response', 'cnonce']
i = 0
parsed_dict = {}
# authentication_array holds one header line per expected parameter, in the same order
for line in authentication_array:
    pattern = "(" + pattern_array[i] + ")" + "=(\".*\")" # build a matching pattern
    match = re.search(re.compile(pattern), line) # make the match
    if match:
        parsed_dict[match.group(1)] = match.group(2)
    i += 1