Disclaimer: This page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/52969381/


How can I capture all network requests and full response data when loading a page in Chrome?

Tags: javascript, google-chrome, puppeteer

Asked by Matt Zeunert

Using Puppeteer, I'd like to load a URL in Chrome and capture the following information:

  • request URL
  • request headers
  • request post data
  • response headers text (including duplicate headers like set-cookie)
  • transferred response size (i.e. compressed size)
  • full response body

Capturing the full response body is what causes the problems for me.

Things I've tried:

  • Getting response content with response.buffer - this does not work if there are redirects at any point, since buffers are wiped on navigation (see the sketch after this list)
  • Intercepting requests and using getResponseBodyForInterception - this means I can no longer access the encodedLength, and I also had problems getting the correct request and response headers in some cases
  • Using a local proxy works, but this slowed down page load times significantly (and also changed some behavior, e.g. for certificate errors)
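
For reference, here is a minimal sketch of the response.buffer approach from the first bullet (the URL is a placeholder); on pages that redirect, the buffer() call below can reject because the buffers are wiped on navigation:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  page.on('response', async (response) => {
    try {
      // buffer() rejects if the response buffer was already wiped by a navigation
      const body = await response.buffer();
      console.log(response.url(), body.length);
    } catch (error) {
      console.error('Could not read body for', response.url(), error.message);
    }
  });

  await page.goto('https://example.com/', { waitUntil: 'networkidle0' });
  await browser.close();
})();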

Ideally the solution should only have a minor performance impact and have no functional differences from loading a page normally. I would also like to avoid forking Chrome.

Answered by Grant Miller

You can enable request interception with page.setRequestInterception() for each request, and then, inside page.on('request'), use the request-promise-native module as a middleman to gather the response data before continuing the request with request.continue() in Puppeteer.

Here's a full working example:

'use strict';

const puppeteer = require('puppeteer');
const request_client = require('request-promise-native');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const result = [];

  await page.setRequestInterception(true);

  page.on('request', request => {
    // Re-issue the request from Node to capture the full response data
    request_client({
      uri: request.url(),
      resolveWithFullResponse: true,
    }).then(response => {
      const request_url = request.url();
      const request_headers = request.headers();
      const request_post_data = request.postData();
      const response_headers = response.headers;
      const response_size = response_headers['content-length'];
      const response_body = response.body;

      result.push({
        request_url,
        request_headers,
        request_post_data,
        response_headers,
        response_size,
        response_body,
      });

      console.log(result); // logs the accumulated results once per request
      request.continue(); // let the browser's own request proceed
    }).catch(error => {
      console.error(error);
      request.abort(); // the mirror request failed, so cancel the browser request
    });
  });

  await page.goto('https://example.com/', {
    waitUntil: 'networkidle0',
  });

  await browser.close();
})();
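
Note that this approach issues a second, separate request from Node instead of reading the browser's own response, so the captured response can differ from what the page actually received, and the code above does not forward the method, headers, or POST data. A possible refinement, not part of the original answer, is to forward those so the mirror request matches the browser's more closely:

    // Hypothetical variant of the request_client call above
    request_client({
      uri: request.url(),
      method: request.method(),    // forward the HTTP method
      headers: request.headers(),  // forward the original request headers
      body: request.postData(),    // forward any POST data
      resolveWithFullResponse: true,
    })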

Answered by Thomas Dondorf

Puppeteer-only solution

This can be done with Puppeteer alone. The problem you are describing, that response.buffer is cleared on navigation, can be circumvented by processing each request one after another.

How it works

The code below uses page.setRequestInterception to intercept all requests. If a request is currently being processed or waited for, new requests are put into a queue. Then, response.buffer() can be used without the risk that other requests asynchronously wipe the buffer, as there are no parallel requests. As soon as the currently processed request/response is handled, the next request is processed.

Code

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const [page] = await browser.pages();

    const results = []; // collects all results

    let paused = false;
    let pausedRequests = [];

    const nextRequest = () => { // continue the next request or "unpause"
        if (pausedRequests.length === 0) {
            paused = false;
        } else {
            // continue first request in "queue"
            (pausedRequests.shift())(); // calls the request.continue function
        }
    };

    await page.setRequestInterception(true);
    page.on('request', request => {
        if (paused) {
            pausedRequests.push(() => request.continue());
        } else {
            paused = true; // pause, as we are processing a request now
            request.continue();
        }
    });

    page.on('requestfinished', async (request) => {
        const response = await request.response();

        const responseHeaders = response.headers();
        let responseBody;
        if (request.redirectChain().length === 0) {
            // body can only be accessed for non-redirect responses
            responseBody = await response.buffer();
        }

        const information = {
            url: request.url(),
            requestHeaders: request.headers(),
            requestPostData: request.postData(),
            responseHeaders: responseHeaders,
            responseSize: responseHeaders['content-length'],
            responseBody,
        };
        results.push(information);

        nextRequest(); // continue with next request
    });
    page.on('requestfailed', (request) => {
        // handle failed request
        nextRequest();
    });

    await page.goto('...', { waitUntil: 'networkidle0' });
    console.log(results);

    await browser.close();
})();
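
One caveat, not covered in the original answer: responseHeaders['content-length'] only reflects the transferred size when the server actually sends that header. If the true on-the-wire (compressed) size is needed, a raw DevTools protocol session reports encodedDataLength when each response finishes loading. A minimal sketch, assuming it is added before page.goto in the script above:

    // Hypothetical supplement: track compressed transfer sizes per request
    const client = await page.target().createCDPSession();
    await client.send('Network.enable');

    const transferSizes = new Map(); // requestId -> bytes on the wire
    client.on('Network.loadingFinished', (event) => {
        transferSizes.set(event.requestId, event.encodedDataLength);
    });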

Answered by Andrii Muzalevskyi

I would suggest looking for a fast proxy server that can write request logs together with the actual content.

The target setup is to have the proxy server simply write a log file, and then to analyze the log, searching for the information you need.

Don't intercept requests while the proxy is working (this will slow things down).

The performance issues you may encounter (with the proxy-as-logger setup) are mostly related to TLS support; make sure the proxy setup allows a quick TLS handshake and the HTTP/2 protocol.

For example, Squid benchmarks show that it can process hundreds of requests per second, which should be enough for testing purposes.

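As a concrete starting point, not part of the original answer, Chrome can be pointed at such a logging proxy with a launch flag; the address below is a placeholder and assumes the proxy (e.g. Squid or mitmproxy) is already running:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
  // Assumes a logging proxy is listening on localhost:3128 (hypothetical address)
  const browser = await puppeteer.launch({
    args: ['--proxy-server=http://localhost:3128'],
  });
  const page = await browser.newPage();
  await page.goto('https://example.com/', { waitUntil: 'networkidle0' });
  await browser.close();
  // The captured requests and responses are then read from the proxy's log files
})();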

Answered by ScrapCode

I would suggest using a tool called Fiddler. It will capture all of the information you mentioned when you load a URL.

Answered by Jose Rodriguez

Go to Chrome and press F12, then open the "Network" tab. There you can see all the HTTP requests that the website sends, and you'll be able to see the details you mentioned.
