Parsing an HTTP request Authorization header with Python

Question

Asked by Kris Walker

I need to take a header like this:

 Authorization: Digest qop="chap",
     realm="[email protected]",
     username="Foobear",
     response="6629fae49393a05397450978507c4ef1",
     cnonce="5ccc069c403ebaf9f0171e9517f40e41"

And parse it into this using Python:

{'protocol':'Digest',
  'qop':'chap',
  'realm':'[email protected]',
  'username':'Foobear',
  'response':'6629fae49393a05397450978507c4ef1',
  'cnonce':'5ccc069c403ebaf9f0171e9517f40e41'}

Is there a library to do this, or something I could look at for inspiration?

I'm doing this on Google App Engine, and I'm not sure if the Pyparsing library is available, but maybe I could include it with my app if it is the best solution.

Currently I'm creating my own MyHeaderParser object and using it with reduce() on the header string. It's working, but very fragile.

Brilliant solution by nadia below:

import re

reg = re.compile(r'(\w+)[=] ?"?(\w+)"?')

s = """Digest
realm="stackoverflow.com", username="kixx"
"""

print(dict(reg.findall(s)))

Answer 1

Accepted answer by Nadia Alramli

A little regex:

import re
reg = re.compile(r'(\w+)[:=] ?"?(\w+)"?')

>>> dict(reg.findall(headers))  # headers: the Authorization header string

{'username': 'Foobear', 'realm': 'testrealm', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'response': '6629fae49393a05397450978507c4ef1', 'Authorization': 'Digest'}

Answer 2

Answer by Piotr Czapla

You can also use urllib2, as CherryPy does.

Here is the snippet:

input= """
 Authorization: Digest qop="chap",
     realm="[email protected]",
     username="Foobear",
     response="6629fae49393a05397450978507c4ef1",
     cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""
import urllib2
field, sep, value = input.partition("Authorization: Digest ")
if value:
    items = urllib2.parse_http_list(value)
    opts = urllib2.parse_keqv_list(items)
    opts['protocol'] = 'Digest'
    print(opts)

It outputs:

{'username': 'Foobear', 'protocol': 'Digest', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'realm': '[email protected]', 'response': '6629fae49393a05397450978507c4ef1'}
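
On Python 3 the urllib2 module no longer exists; the same two helpers live in urllib.request. The following is a minimal sketch of the approach above ported to Python 3. The function name parse_authorization and the realm and username values in the sample header are assumptions made for illustration, and the sketch only makes sense for schemes that carry key=value parameters (not token-style schemes such as Basic).

```python
from urllib.request import parse_http_list, parse_keqv_list


def parse_authorization(header):
    """Return the auth parameters as a dict with an extra 'protocol' key.

    Returns None when the line is not an Authorization header.
    (Hypothetical helper, not part of any library.)
    """
    name, _, value = header.strip().partition(':')
    if name.strip().lower() != 'authorization':
        return None
    # The first word after the colon is the scheme, e.g. "Digest".
    protocol, _, rest = value.strip().partition(' ')
    # parse_http_list splits on commas outside quoted strings;
    # parse_keqv_list turns each key="value" item into a dict entry.
    opts = parse_keqv_list(parse_http_list(rest))
    opts['protocol'] = protocol
    return opts


# Sample header; the realm and username values are illustrative assumptions.
example = '''Authorization: Digest qop="chap",
    realm="testrealm@host.com",
    username="Foob,ear",
    response="6629fae49393a05397450978507c4ef1"'''

parsed = parse_authorization(example)
print(parsed)
```

Because parse_http_list tracks quoting, the comma inside the quoted username does not split the value.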

Answer 3

Answer by PaulMcG

Here's my pyparsing attempt:

text = """Authorization: Digest qop="chap",
    realm="[email protected]",     
    username="Foobear",     
    response="6629fae49393a05397450978507c4ef1",     
    cnonce="5ccc069c403ebaf9f0171e9517f40e41" """

from pyparsing import *

AUTH = Keyword("Authorization")
ident = Word(alphas,alphanums)
EQ = Suppress("=")
quotedString.setParseAction(removeQuotes)

valueDict = Dict(delimitedList(Group(ident + EQ + quotedString)))
authentry = AUTH + ":" + ident("protocol") + valueDict

print(authentry.parseString(text).dump())

which prints:

['Authorization', ':', 'Digest', ['qop', 'chap'], ['realm', '[email protected]'],
 ['username', 'Foobear'], ['response', '6629fae49393a05397450978507c4ef1'], 
 ['cnonce', '5ccc069c403ebaf9f0171e9517f40e41']]
- cnonce: 5ccc069c403ebaf9f0171e9517f40e41
- protocol: Digest
- qop: chap
- realm: [email protected]
- response: 6629fae49393a05397450978507c4ef1
- username: Foobear

I'm not familiar with the RFC, but I hope this gets you rolling.

Answer 4

Answer by ?s??o?

The HTTP Digest Authorization header field is a bit of an odd beast. Its format is similar to that of RFC 2616's Cache-Control and Content-Type header fields, but just different enough to be incompatible. If you're still looking for a library that's a little smarter and more readable than the regex, you might try removing the Authorization: Digest part with str.split() and parsing the rest with parse_dict_header() from Werkzeug's http module. (Werkzeug can be installed on App Engine.)

Answer 5

Answer by Brian McFarland

Nadia's regex only matches alphanumeric characters for the value of a parameter. That means it fails to parse at least two fields, namely uri and qop. According to RFC 2617, the uri field is a duplicate of the string in the request line (i.e. the first line of the HTTP request). And qop fails to parse correctly if the value is "auth-int", due to the non-alphanumeric '-'.

This modified regex allows the URI (or any other value) to contain anything but ' ' (space), '"' (quote), or ',' (comma). That's probably more permissive than it needs to be, but shouldn't cause any problems with correctly formed HTTP requests.

reg = re.compile(r'(\w+)[:=] ?"?([^" ,]+)"?')
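
To make the difference concrete, here is a small side-by-side sketch of the two patterns. The sample header and the variable names are assumptions chosen for illustration; only the two regular expressions come from the answers above.

```python
import re

# Illustrative header containing a uri and a qop value with a hyphen.
header = ('Authorization: Digest username="Mufasa", qop="auth-int", '
          'uri="/dir/index.html", nc=00000001')

# Original pattern: values may only contain \w characters.
original = re.compile(r'(\w+)[:=] ?"?(\w+)"?')
# Modified pattern: values may contain anything but quote, space, comma.
modified = re.compile(r'(\w+)[:=] ?"?([^" ,]+)"?')

original_fields = dict(original.findall(header))
modified_fields = dict(modified.findall(header))

print(original_fields)  # qop is cut off at the hyphen, uri is missing
print(modified_fields)  # qop and uri are captured in full
```

Note that neither pattern copes with a quoted value that itself contains a space or a comma.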

Bonus tip: From there, it's fairly straightforward to convert the example code in RFC 2617 to Python. Using Python's md5 API, "MD5Init()" becomes "m = md5.new()", "MD5Update()" becomes "m.update()" and "MD5Final()" becomes "m.digest()".
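
On Python 3 the standalone md5 module is gone and hashlib.md5() plays the same role. The sketch below shows how the response value could be computed that way. It assumes the plain MD5 algorithm with qop="auth" only (no "auth-int", no MD5-sess); the helper names are hypothetical, and the sample credentials are the ones from the worked example in RFC 2617, section 3.5.

```python
import hashlib


def md5_hex(text):
    """Hex-encoded MD5 of a string (the RFC sample code also hex-encodes)."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop):
    """Compute the Digest 'response' value for algorithm MD5 and qop=auth."""
    ha1 = md5_hex('{0}:{1}:{2}'.format(username, realm, password))
    ha2 = md5_hex('{0}:{1}'.format(method, uri))
    return md5_hex(':'.join([ha1, nonce, nc, cnonce, qop, ha2]))


# Values taken from the example exchange in RFC 2617, section 3.5.
response = digest_response(
    username='Mufasa',
    realm='testrealm@host.com',
    password='Circle Of Life',
    method='GET',
    uri='/dir/index.html',
    nonce='dcd98b7102dd2f0e8b11d0f600bfb0c093',
    nc='00000001',
    cnonce='0a4f113b',
    qop='auth',
)
print(response)
```

A server would compare this value with the response field parsed from the client's Authorization header.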

Answer 6

Answer by biscuit314

An older question but one I found very helpful.

I needed a parser to handle any properly formed Authorization header, as defined by RFC 7235 (raise your hand if you enjoy reading ABNF).

Authorization = credentials

BWS = <BWS, see [RFC7230], Section 3.2.3>

OWS = <OWS, see [RFC7230], Section 3.2.3>

Proxy-Authenticate = *( "," OWS ) challenge *( OWS "," [ OWS
 challenge ] )
Proxy-Authorization = credentials

WWW-Authenticate = *( "," OWS ) challenge *( OWS "," [ OWS challenge
 ] )

auth-param = token BWS "=" BWS ( token / quoted-string )
auth-scheme = token

challenge = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param ) *(
 OWS "," [ OWS auth-param ] ) ] ) ]
credentials = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param )
 *( OWS "," [ OWS auth-param ] ) ] ) ]

quoted-string = <quoted-string, see [RFC7230], Section 3.2.6>

token = <token, see [RFC7230], Section 3.2.6>
token68 = 1*( ALPHA / DIGIT / "-" / "." / "_" / "~" / "+" / "/" )
 *"="

Starting with PaulMcG's answer, I came up with this:

import pyparsing as pp

tchar = '!#$%&\'*+-.^_`|~' + pp.nums + pp.alphas
t68char = '-._~+/' + pp.nums + pp.alphas

token = pp.Word(tchar)
token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))

scheme = token('scheme')

header = pp.Keyword('Authorization')
name = pp.Word(pp.alphas, pp.alphanums)
value = pp.quotedString.setParseAction(pp.removeQuotes)
name_value_pair = name + pp.Suppress('=') + value
params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))

credentials = scheme + (token68('token') ^ params('params'))

auth_parser = header + pp.Suppress(':') + credentials

This allows for parsing any Authorization header:

parsed = auth_parser.parseString('Authorization: Basic Zm9vOmJhcg==')
print('Authenticating with {0} scheme, token: {1}'.format(parsed['scheme'], parsed['token']))

which outputs:

Authenticating with Basic scheme, token: Zm9vOmJhcg==

Bringing it all together into an Authenticator class:

import pyparsing as pp
from base64 import b64decode
import re

class Authenticator:
    def __init__(self):
        """
        Use pyparsing to create a parser for Authentication headers
        """
        tchar = "!#$%&'*+-.^_`|~" + pp.nums + pp.alphas
        t68char = '-._~+/' + pp.nums + pp.alphas

        token = pp.Word(tchar)
        token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))

        scheme = token('scheme')

        auth_header = pp.Keyword('Authorization')
        name = pp.Word(pp.alphas, pp.alphanums)
        value = pp.quotedString.setParseAction(pp.removeQuotes)
        name_value_pair = name + pp.Suppress('=') + value
        params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))

        credentials = scheme + (token68('token') ^ params('params'))

        # the moment of truth...
        self.auth_parser = auth_header + pp.Suppress(':') + credentials


    def authenticate(self, auth_header):
        """
        Parse auth_header and call the correct authentication handler
        """
        authenticated = False
        try:
            parsed = self.auth_parser.parseString(auth_header)
            scheme = parsed['scheme']
            details = parsed['token'] if 'token' in parsed.keys() else parsed['params']

            print('Authenticating using {0} scheme'.format(scheme))
            try:
                safe_scheme = re.sub(r"[!#$%&'*+\-.^_`|~]", '_', scheme.lower())
                handler = getattr(self, 'auth_handle_' + safe_scheme)
                authenticated = handler(details)
            except AttributeError:
                print('This is a valid Authorization header, but we do not handle this scheme yet.')

        except pp.ParseException as ex:
            print('Not a valid Authorization header')
            print(ex)

        return authenticated


    # The following methods are fake, of course.  They should use what's passed
    # to them to actually authenticate, and return True/False if successful.
    # For this demo I'll just print some of the values used to authenticate.
    @staticmethod
    def auth_handle_basic(token):
        print('- token is {0}'.format(token))
        try:
            username, password = b64decode(token).decode().split(':', 1)
        except Exception:
            raise ValueError('Malformed Basic credentials')
        print('- username is {0}'.format(username))
        print('- password is {0}'.format(password))
        return True

    @staticmethod
    def auth_handle_bearer(token):
        print('- token is {0}'.format(token))
        return True

    @staticmethod
    def auth_handle_digest(params):
        print('- username is {0}'.format(params['username']))
        print('- cnonce is {0}'.format(params['cnonce']))
        return True

    @staticmethod
    def auth_handle_aws4_hmac_sha256(params):
        print('- Signature is {0}'.format(params['Signature']))
        return True

To test this class:

tests = [
    'Authorization: Digest qop="chap", realm="[email protected]", username="Foobar", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"',
    'Authorization: Bearer cn389ncoiwuencr',
    'Authorization: Basic Zm9vOmJhcg==',
    'Authorization: AWS4-HMAC-SHA256 Credential="AKIAIOSFODNN7EXAMPLE/20130524/us-east-1/s3/aws4_request", SignedHeaders="host;range;x-amz-date", Signature="fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024"',
    'Authorization: CrazyCustom foo="bar", fizz="buzz"',
]

authenticator = Authenticator()

for test in tests:
    authenticator.authenticate(test)
    print()

Which outputs:

Authenticating using Digest scheme
- username is Foobar
- cnonce is 5ccc069c403ebaf9f0171e9517f40e41

Authenticating using Bearer scheme
- token is cn389ncoiwuencr

Authenticating using Basic scheme
- token is Zm9vOmJhcg==
- username is foo
- password is bar

Authenticating using AWS4-HMAC-SHA256 scheme
- Signature is fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024

Authenticating using CrazyCustom scheme 
This is a valid Authorization header, but we do not handle this scheme yet.

In future if we wish to handle CrazyCustom we'll just add

def auth_handle_crazycustom(params):
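
For readers who cannot install pyparsing, the credentials rule from the RFC 7235 grammar quoted above can be approximated with the standard library's re module. This is a rough sketch that swaps pyparsing for regular expressions, not the approach used in this answer. The name parse_credentials is hypothetical, the function expects only the header value (the part after "Authorization:"), and it treats optional whitespace more loosely than the RFC does.

```python
import re

# token68 = 1*( ALPHA / DIGIT / "-" / "." / "_" / "~" / "+" / "/" ) *"="
TOKEN68 = re.compile(r'[A-Za-z0-9\-._~+/]+=*\Z')

# tchar as listed in RFC 7230, section 3.2.6
TCHAR = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]"

# auth-param = token BWS "=" BWS ( token / quoted-string ), followed by
# either a comma or the end of the string
AUTH_PARAM = re.compile(
    r'\s*(' + TCHAR + r'+)\s*=\s*'
    r'(?:"((?:[^"\\]|\\.)*)"|(' + TCHAR + r'+))\s*(?:,|\Z)'
)


def parse_credentials(value):
    """Return (scheme, token68 or None, params dict) for a header value."""
    scheme, _, rest = value.strip().partition(' ')
    rest = rest.strip()
    if TOKEN68.match(rest):
        return scheme, rest, {}
    params = {}
    pos = 0
    while pos < len(rest):
        m = AUTH_PARAM.match(rest, pos)
        if m is None:
            raise ValueError('malformed auth-param at offset %d' % pos)
        name, quoted, bare = m.groups()
        if quoted is not None:
            # Undo backslash escapes (quoted-pair) inside quoted strings.
            params[name] = re.sub(r'\\(.)', r'\1', quoted)
        else:
            params[name] = bare
        pos = m.end()
    return scheme, None, params


basic = parse_credentials('Basic Zm9vOmJhcg==')
digest = parse_credentials('Digest qop="auth-int", username="Foo,bar", nc=00000001')
print(basic)
print(digest)
```

The returned scheme could then be dispatched to handler methods in the same way as the Authenticator class above.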

Answer 7

Answer by Ned Batchelder

If those components will always be there, then a regex will do the trick:

test = '''Authorization: Digest qop="chap", realm="[email protected]", username="Foobear", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"'''

import re

re_auth = re.compile(r"""
    Authorization:\s*(?P<protocol>[^ ]+)\s+
    qop="(?P<qop>[^"]+)",\s+
    realm="(?P<realm>[^"]+)",\s+
    username="(?P<username>[^"]+)",\s+
    response="(?P<response>[^"]+)",\s+
    cnonce="(?P<cnonce>[^"]+)"
    """, re.VERBOSE)

m = re_auth.match(test)
print(m.groupdict())

produces:

{ 'username': 'Foobear', 
  'protocol': 'Digest', 
  'qop': 'chap', 
  'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 
  'realm': '[email protected]', 
  'response': '6629fae49393a05397450978507c4ef1'
}

Answer 8

Answer by Piotr Czapla

I would recommend finding a proper library for parsing HTTP headers; unfortunately, I can't recall any. :(

In the meantime, check the snippet below (it should mostly work):

input= """
 Authorization: Digest qop="chap",
     realm="[email protected]",
     username="Foob,ear",
     response="6629fae49393a05397450978507c4ef1",
     cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""

field, sep, value = input.partition(":")
if field.endswith('Authorization'):
    protocol, sep, opts_str = value.strip().partition(" ")

    opts = {}
    for opt in opts_str.split(",\n"):
        # split on the first '=' only, so values containing '=' survive
        key, value = opt.strip().split('=', 1)
        key = key.strip(" ")
        value = value.strip(' "')
        opts[key] = value

    opts['protocol'] = protocol

    print(opts)

Answer 9

Answer by Jason R. Coombs

Your original concept of using PyParsing would be the best approach. What you've implicitly asked for is something that requires a grammar... that is, a regular expression or simple parsing routine is always going to be brittle, and that sounds like it's something you're trying to avoid.

It appears that getting pyparsing on Google App Engine is easy: How do I get PyParsing set up on the Google App Engine?

So I'd go with that, and then implement the full HTTP authentication/authorization header support from RFC 2617.

Answer 10

Answer by Pinochle

If your response comes in a single string that never varies and has as many lines as there are expressions to match, you can split it on the newlines into an array called authentication_array and use regexps:

import re

pattern_array = ['qop', 'realm', 'username', 'response', 'cnonce']
parsed_dict = {}

# authentication_array holds the header split on newlines, one parameter per line
for name, line in zip(pattern_array, authentication_array):
    pattern = '(' + name + ')="(.*)"'      # build a matching pattern
    match = re.search(pattern, line)       # make the match
    if match:
        # group(2) is the value without its surrounding quotes
        parsed_dict[match.group(1)] = match.group(2)
