Log all requests from the python-requests module

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/16337511/


Log all requests from the python-requests module

Tags: python, logging, python-requests

Asked by blueFast

I am using python Requests. I need to debug some OAuth activity, and for that I would like it to log all requests being performed. I could get this information with ngrep, but unfortunately it is not possible to grep https connections (which are needed for OAuth).

How can I activate logging of all URLs (+ parameters) that Requests is accessing?

Accepted answer by Martijn Pieters

The underlying urllib3 library logs all new connections and URLs with the logging module, but not POST bodies. For GET requests this should be enough:

import logging

logging.basicConfig(level=logging.DEBUG)

which gives you the most verbose logging option; see the logging HOWTO for more details on how to configure logging levels and destinations.
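
If the fully verbose root-level DEBUG output is too noisy, a minimal sketch (the file name here is just an example) is to keep the root logger at INFO and route urllib3's DEBUG records to a file instead:

import logging

# root logger stays at INFO; only the urllib3 logger emits DEBUG records,
# and everything is written to the configured file
logging.basicConfig(filename="requests-debug.log", level=logging.INFO)
logging.getLogger("urllib3").setLevel(logging.DEBUG)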

Short demo:

>>> import requests
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> r = requests.get('http://httpbin.org/get?foo=bar&baz=python')
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org:80
DEBUG:urllib3.connectionpool:http://httpbin.org:80 "GET /get?foo=bar&baz=python HTTP/1.1" 200 366

Depending on the exact version of urllib3, the following messages are logged (a small sketch for tuning which of these you see follows the list):

  • INFO: Redirects
  • WARN: Connection pool full (if this happens often, increase the connection pool size)
  • WARN: Failed to parse headers (response headers with invalid format)
  • WARN: Retrying the connection
  • WARN: Certificate did not match expected hostname
  • WARN: Received response with both Content-Length and Transfer-Encoding, when processing a chunked response
  • DEBUG: New connections (HTTP or HTTPS)
  • DEBUG: Dropped connections
  • DEBUG: Connection details: method, path, HTTP version, status code and response length
  • DEBUG: Retry count increments
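
For instance, a small sketch (assuming a non-vendored urllib3 whose logger is named "urllib3", as shown above): raising that logger to INFO keeps the redirect and warning messages while dropping the per-connection DEBUG chatter:

import logging

logging.basicConfig(level=logging.INFO)
# INFO keeps redirects and all WARN-level messages, drops the DEBUG connection noise
logging.getLogger("urllib3").setLevel(logging.INFO)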

This doesn't include headers or bodies. urllib3 uses the http.client.HTTPConnection class to do the grunt work, but that class doesn't support logging; it can normally only be configured to print to stdout. However, you can rig it to send all debug information to logging instead by introducing an alternative print name into that module:

import logging
import http.client

httpclient_logger = logging.getLogger("http.client")

def httpclient_logging_patch(level=logging.DEBUG):
    """Enable HTTPConnection debug logging to the logging framework"""

    def httpclient_log(*args):
        httpclient_logger.log(level, " ".join(args))

    # mask the print() built-in in the http.client module to use
    # logging instead
    http.client.print = httpclient_log
    # enable debugging
    http.client.HTTPConnection.debuglevel = 1

Calling httpclient_logging_patch() causes http.client connections to output all debug information to a standard logger, so it is picked up by logging.basicConfig():

>>> httpclient_logging_patch()
>>> r = requests.get('http://httpbin.org/get?foo=bar&baz=python')
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org:80
DEBUG:http.client:send: b'GET /get?foo=bar&baz=python HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
DEBUG:http.client:reply: 'HTTP/1.1 200 OK\r\n'
DEBUG:http.client:header: Date: Tue, 04 Feb 2020 13:36:53 GMT
DEBUG:http.client:header: Content-Type: application/json
DEBUG:http.client:header: Content-Length: 366
DEBUG:http.client:header: Connection: keep-alive
DEBUG:http.client:header: Server: gunicorn/19.9.0
DEBUG:http.client:header: Access-Control-Allow-Origin: *
DEBUG:http.client:header: Access-Control-Allow-Credentials: true
DEBUG:urllib3.connectionpool:http://httpbin.org:80 "GET /get?foo=bar&baz=python HTTP/1.1" 200 366

Answer by Yohann

You need to enable debugging at the httplib level (requests → urllib3 → httplib).

Here are some functions to toggle it (..._on() and ..._off()) or temporarily have it on:

import logging
import contextlib
try:
    from http.client import HTTPConnection # py3
except ImportError:
    from httplib import HTTPConnection # py2

def debug_requests_on():
    '''Switches on logging of the requests module.'''
    HTTPConnection.debuglevel = 1

    logging.basicConfig()
    logging.getLogger().setLevel(logging.DEBUG)
    requests_log = logging.getLogger("requests.packages.urllib3")
    requests_log.setLevel(logging.DEBUG)
    requests_log.propagate = True

def debug_requests_off():
    '''Switches off logging of the requests module, might be some side-effects'''
    HTTPConnection.debuglevel = 0

    root_logger = logging.getLogger()
    root_logger.setLevel(logging.WARNING)
    root_logger.handlers = []
    requests_log = logging.getLogger("requests.packages.urllib3")
    requests_log.setLevel(logging.WARNING)
    requests_log.propagate = False

@contextlib.contextmanager
def debug_requests():
    '''Use with 'with'!'''
    debug_requests_on()
    yield
    debug_requests_off()

Demo use:

>>> requests.get('http://httpbin.org/')
<Response [200]>

>>> debug_requests_on()
>>> requests.get('http://httpbin.org/')
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 12150
send: 'GET / HTTP/1.1\r\nHost: httpbin.org\r\nConnection: keep-alive\r\nAccept-
Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: python-requests/2.11.1\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx
...
<Response [200]>

>>> debug_requests_off()
>>> requests.get('http://httpbin.org/')
<Response [200]>

>>> with debug_requests():
...     requests.get('http://httpbin.org/')
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org
...
<Response [200]>

You will see the REQUEST, including HEADERS and DATA, and the RESPONSE with HEADERS but without DATA. The only thing missing will be the response.body, which is not logged.
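
If you also need the response body, one option (a rough sketch using a requests response hook, which a later answer covers in more detail; the logger name here is arbitrary) is to log it yourself:

import logging
import requests

body_log = logging.getLogger('response.body')

def log_response_body(response, *args, **kwargs):
    # the hook runs once the response is fully built, so the body is available
    body_log.debug('body of %s: %s', response.url, response.text)

session = requests.Session()
session.hooks['response'].append(log_response_body)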

Source

Answer by forrestj

For those using Python 3+:

import requests
import logging
import http.client

http.client.HTTPConnection.debuglevel = 1

logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

Answer by Mike Smith

I'm using python 3.4, requests 2.19.1:

'urllib3' is the logger to get now (no longer 'requests.packages.urllib3'). Basic logging will still happen without setting http.client.HTTPConnection.debuglevel.
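
A minimal sketch of what that looks like (assuming a requests version such as 2.19.1, where urllib3 is no longer vendored):

import logging
import requests

logging.basicConfig(level=logging.DEBUG)
# 'urllib3' is the logger that emits the records now, not 'requests.packages.urllib3'
logging.getLogger('urllib3').setLevel(logging.DEBUG)

requests.get('http://httpbin.org/get')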

Answer by abulka

When trying to get the Python logging system (import logging) to emit low-level debug log messages, it surprised me to discover that, given:

requests --> urllib3 --> http.client.HTTPConnection

only urllib3 actually uses the Python logging system:

  • requests: no
  • http.client.HTTPConnection: no
  • urllib3: yes

Sure, you can extract debug messages from HTTPConnection by setting:

HTTPConnection.debuglevel = 1

but these outputs are merely emitted via the print statement. To prove this, simply grep the Python 3.7 client.py source code and view the print statements yourself (thanks @Yohann):

curl https://raw.githubusercontent.com/python/cpython/3.7/Lib/http/client.py | grep -A1 debuglevel

Presumably redirecting stdout in some way might work to shoehorn the print output into the logging system and potentially capture it to e.g. a log file.
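
A rough sketch of that idea (assuming Python 3, and that nothing else writes to stdout during the request; the file name is just an example): capture the print() output in a buffer and re-emit it through logging:

import contextlib
import io
import logging

import requests
from http.client import HTTPConnection

logging.basicConfig(filename='requests.log', level=logging.DEBUG)
HTTPConnection.debuglevel = 1

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    # the HTTPConnection print() calls land in the buffer instead of the console
    requests.get('http://httpbin.org/')
logging.getLogger('http.client').debug(buffer.getvalue())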

Choose the 'urllib3' logger, not 'requests.packages.urllib3'

To capture urllib3 debug information through the Python 3 logging system, contrary to much advice on the internet, and as @MikeSmith points out, you won't have much luck intercepting:

log = logging.getLogger('requests.packages.urllib3')

instead you need to:

log = logging.getLogger('urllib3')

Debugging urllib3 to a log file

Here is some code which logs urllib3 workings to a log file using the Python logging system:

import requests
import logging
from http.client import HTTPConnection  # py3

# log = logging.getLogger('requests.packages.urllib3')  # useless
log = logging.getLogger('urllib3')  # works

log.setLevel(logging.DEBUG)  # needed
fh = logging.FileHandler("requests.log")
log.addHandler(fh)

requests.get('http://httpbin.org/')

the result:

Starting new HTTP connection (1): httpbin.org:80
http://httpbin.org:80 "GET / HTTP/1.1" 200 3168

Enabling the HTTPConnection.debuglevel print() statements

If you set HTTPConnection.debuglevel = 1

from http.client import HTTPConnection  # py3
HTTPConnection.debuglevel = 1
requests.get('http://httpbin.org/')

you'll get the print statement output of additional juicy low-level info:

send: b'GET / HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python- 
requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Access-Control-Allow-Credentials header: Access-Control-Allow-Origin 
header: Content-Encoding header: Content-Type header: Date header: ...

Remember this output uses print and not the Python logging system, and thus cannot be captured using a traditional logging stream or file handler (though it may be possible to capture the output to a file by redirecting stdout).

Combine the two above - maximise all possible logging to console

To maximise all possible logging, you must settle for console/stdout output with this:

import requests
import logging
from http.client import HTTPConnection  # py3

log = logging.getLogger('urllib3')
log.setLevel(logging.DEBUG)

# logging from urllib3 to console
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
log.addHandler(ch)

# print statements from `http.client.HTTPConnection` to console/stdout
HTTPConnection.debuglevel = 1

requests.get('http://httpbin.org/')

giving the full range of output:

Starting new HTTP connection (1): httpbin.org:80
send: b'GET / HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
http://httpbin.org:80 "GET / HTTP/1.1" 200 3168
header: Access-Control-Allow-Credentials header: Access-Control-Allow-Origin 
header: Content-Encoding header: ...

Answer by saaj

When you have a script, or even an application subsystem, that needs network-protocol debugging, you want to see exactly what the request-response pairs are, including effective URLs, headers, payloads and statuses. It's typically impractical to instrument individual requests all over the place. At the same time, there are performance considerations that suggest using a single (or a few specialised) requests.Session, so the following assumes that suggestion is followed.

requests supports so-called event hooks (as of 2.23 there's actually only the response hook). It's basically an event listener, and the event is emitted before control is returned from requests.request. At this moment both the request and the response are fully defined, hence they can be logged.

import logging

import requests


logger = logging.getLogger('httplogger')

def logRoundtrip(response, *args, **kwargs):
    extra = {'req': response.request, 'res': response}
    logger.debug('HTTP roundtrip', extra=extra)

session = requests.Session()
session.hooks['response'].append(logRoundtrip)

That's basically how to log all HTTP round-trips of a session.

Formatting HTTP round-trip log records

For the logging above to be useful, there can be a specialised logging formatter that understands the req and res extras on logging records. It can look like this:

import textwrap

class HttpFormatter(logging.Formatter):   

    def _formatHeaders(self, d):
        return '\n'.join(f'{k}: {v}' for k, v in d.items())

    def formatMessage(self, record):
        result = super().formatMessage(record)
        if record.name == 'httplogger':
            result += textwrap.dedent('''
                ---------------- request ----------------
                {req.method} {req.url}
                {reqhdrs}

                {req.body}
                ---------------- response ----------------
                {res.status_code} {res.reason} {res.url}
                {reshdrs}

                {res.text}
            ''').format(
                req=record.req,
                res=record.res,
                reqhdrs=self._formatHeaders(record.req.headers),
                reshdrs=self._formatHeaders(record.res.headers),
            )

        return result

formatter = HttpFormatter('{asctime} {levelname} {name} {message}', style='{')
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logging.basicConfig(level=logging.DEBUG, handlers=[handler])

Now if you do some requests using the session, like:

session.get('https://httpbin.org/user-agent')
session.get('https://httpbin.org/status/200')

The output to stderr will look as follows.

2020-05-14 22:10:13,224 DEBUG urllib3.connectionpool Starting new HTTPS connection (1): httpbin.org:443
2020-05-14 22:10:13,695 DEBUG urllib3.connectionpool https://httpbin.org:443 "GET /user-agent HTTP/1.1" 200 45
2020-05-14 22:10:13,698 DEBUG httplogger HTTP roundtrip
---------------- request ----------------
GET https://httpbin.org/user-agent
User-Agent: python-requests/2.23.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive

None
---------------- response ----------------
200 OK https://httpbin.org/user-agent
Date: Thu, 14 May 2020 20:10:13 GMT
Content-Type: application/json
Content-Length: 45
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

{
  "user-agent": "python-requests/2.23.0"
}


2020-05-14 22:10:13,814 DEBUG urllib3.connectionpool https://httpbin.org:443 "GET /status/200 HTTP/1.1" 200 0
2020-05-14 22:10:13,818 DEBUG httplogger HTTP roundtrip
---------------- request ----------------
GET https://httpbin.org/status/200
User-Agent: python-requests/2.23.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive

None
---------------- response ----------------
200 OK https://httpbin.org/status/200
Date: Thu, 14 May 2020 20:10:13 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 0
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

A GUI way

When you have a lot of queries, having a simple UI and a way to filter records comes in handy. I'll show how to use Chronologer for that (which I'm the author of).

First, the hook has to be rewritten to produce records that logging can serialise when sending them over the wire. It can look like this:

def logRoundtrip(response, *args, **kwargs): 
    extra = {
        'req': {
            'method': response.request.method,
            'url': response.request.url,
            'headers': response.request.headers,
            'body': response.request.body,
        }, 
        'res': {
            'code': response.status_code,
            'reason': response.reason,
            'url': response.url,
            'headers': response.headers,
            'body': response.text
        },
    }
    logger.debug('HTTP roundtrip', extra=extra)

session = requests.Session()
session.hooks['response'].append(logRoundtrip)

Second, the logging configuration has to be adapted to use logging.handlers.HTTPHandler (which Chronologer understands).

import logging.handlers

chrono = logging.handlers.HTTPHandler(
  'localhost:8080', '/api/v1/record', 'POST', credentials=('logger', ''))
handlers = [logging.StreamHandler(), chrono]
logging.basicConfig(level=logging.DEBUG, handlers=handlers)

Finally, run a Chronologer instance, e.g. using Docker:

docker run --rm -it -p 8080:8080 -v /tmp/db \
    -e CHRONOLOGER_STORAGE_DSN=sqlite:////tmp/db/chrono.sqlite \
    -e CHRONOLOGER_SECRET=example \
    -e CHRONOLOGER_ROLES="basic-reader query-reader writer" \
    saaj/chronologer \
    python -m chronologer -e production serve -u www-data -g www-data -m

And run the requests again:

session.get('https://httpbin.org/user-agent')
session.get('https://httpbin.org/status/200')

The stream handler will produce:

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): httpbin.org:443
DEBUG:urllib3.connectionpool:https://httpbin.org:443 "GET /user-agent HTTP/1.1" 200 45
DEBUG:httplogger:HTTP roundtrip
DEBUG:urllib3.connectionpool:https://httpbin.org:443 "GET /status/200 HTTP/1.1" 200 0
DEBUG:httplogger:HTTP roundtrip

Now if you open http://localhost:8080/ (use "logger" for the username and an empty password for the basic auth popup) and click the "Open" button, you should see something like:

Screenshot of Chronologer
