Disclaimer: This page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/52969381/


How can I capture all network requests and full response data when loading a page in Chrome?

Tags: javascript, google-chrome, puppeteer

Asked by Matt Zeunert

Using Puppeteer, I'd like to load a URL in Chrome and capture the following information:

  • request URL
  • request headers
  • request post data
  • response headers text (including duplicate headers like set-cookie)
  • transferred response size (i.e. compressed size)
  • full response body

Capturing the full response body is what causes the problems for me.

Things I've tried:

  • Getting response content with response.buffer - this does not work if there are redirects at any point, since buffers are wiped on navigation (see the sketch after this list)
  • Intercepting requests and using getResponseBodyForInterception - this means I can no longer access the encodedLength, and I also had problems getting the correct request and response headers in some cases
  • Using a local proxy works, but this slowed down page load times significantly (and also changed some behavior, e.g. for certificate errors)
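
For reference, here is a minimal sketch of the response.buffer approach from the first bullet (the URL is a placeholder); on pages that redirect, the buffer() call below can reject because the buffers are wiped on navigation:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  page.on('response', async (response) => {
    try {
      // buffer() rejects if the response buffer was already wiped by a navigation
      const body = await response.buffer();
      console.log(response.url(), body.length);
    } catch (error) {
      console.error('Could not read body for', response.url(), error.message);
    }
  });

  await page.goto('https://example.com/', { waitUntil: 'networkidle0' });
  await browser.close();
})();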

Ideally the solution should only have a minor performance impact and have no functional differences from loading a page normally. I would also like to avoid forking Chrome.

Answered by Grant Miller

You can enable request interception with page.setRequestInterception() for each request, and then, inside page.on('request'), use the request-promise-native module as a middleman to gather the response data before continuing the request with request.continue() in Puppeteer.

Here's a full working example:

'use strict';

const puppeteer = require('puppeteer');
const request_client = require('request-promise-native');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const result = [];

  await page.setRequestInterception(true);

  page.on('request', request => {
    // Re-issue the request from Node to capture the full response data
    request_client({
      uri: request.url(),
      resolveWithFullResponse: true,
    }).then(response => {
      const request_url = request.url();
      const request_headers = request.headers();
      const request_post_data = request.postData();
      const response_headers = response.headers;
      const response_size = response_headers['content-length'];
      const response_body = response.body;

      result.push({
        request_url,
        request_headers,
        request_post_data,
        response_headers,
        response_size,
        response_body,
      });

      console.log(result); // logs the accumulated results once per request
      request.continue(); // let the browser's own request proceed
    }).catch(error => {
      console.error(error);
      request.abort(); // the mirror request failed, so cancel the browser request
    });
  });

  await page.goto('https://example.com/', {
    waitUntil: 'networkidle0',
  });

  await browser.close();
})();
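
Note that this approach issues a second, separate request from Node instead of reading the browser's own response, so the captured response can differ from what the page actually received, and the code above does not forward the method, headers, or POST data. A possible refinement, not part of the original answer, is to forward those so the mirror request matches the browser's more closely:

    // Hypothetical variant of the request_client call above
    request_client({
      uri: request.url(),
      method: request.method(),    // forward the HTTP method
      headers: request.headers(),  // forward the original request headers
      body: request.postData(),    // forward any POST data
      resolveWithFullResponse: true,
    })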

Answered by Thomas Dondorf

Puppeteer-only solution

This can be done with Puppeteer alone. The problem you are describing, that response.buffer is cleared on navigation, can be circumvented by processing each request one after another.

How it works

The code below uses page.setRequestInterception to intercept all requests. If a request is currently being processed or waited for, new requests are put into a queue. Then, response.buffer() can be used without the risk that other requests asynchronously wipe the buffer, as there are no parallel requests. As soon as the currently processed request/response is handled, the next request is processed.

Code

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const [page] = await browser.pages();

    const results = []; // collects all results

    let paused = false;
    let pausedRequests = [];

    const nextRequest = () => { // continue the next request or "unpause"
        if (pausedRequests.length === 0) {
            paused = false;
        } else {
            // continue first request in "queue"
            (pausedRequests.shift())(); // calls the request.continue function
        }
    };

    await page.setRequestInterception(true);
    page.on('request', request => {
        if (paused) {
            pausedRequests.push(() => request.continue());
        } else {
            paused = true; // pause, as we are processing a request now
            request.continue();
        }
    });

    page.on('requestfinished', async (request) => {
        const response = await request.response();

        const responseHeaders = response.headers();
        let responseBody;
        if (request.redirectChain().length === 0) {
            // body can only be accessed for non-redirect responses
            responseBody = await response.buffer();
        }

        const information = {
            url: request.url(),
            requestHeaders: request.headers(),
            requestPostData: request.postData(),
            responseHeaders: responseHeaders,
            responseSize: responseHeaders['content-length'],
            responseBody,
        };
        results.push(information);

        nextRequest(); // continue with next request
    });
    page.on('requestfailed', (request) => {
        // handle failed request
        nextRequest();
    });

    await page.goto('...', { waitUntil: 'networkidle0' });
    console.log(results);

    await browser.close();
})();
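
One caveat, not covered in the original answer: responseHeaders['content-length'] only reflects the transferred size when the server actually sends that header. If the true on-the-wire (compressed) size is needed, a raw DevTools protocol session reports encodedDataLength when each response finishes loading. A minimal sketch, assuming it is added before page.goto in the script above:

    // Hypothetical supplement: track compressed transfer sizes per request
    const client = await page.target().createCDPSession();
    await client.send('Network.enable');

    const transferSizes = new Map(); // requestId -> bytes on the wire
    client.on('Network.loadingFinished', (event) => {
        transferSizes.set(event.requestId, event.encodedDataLength);
    });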

Answered by Andrii Muzalevskyi

I would suggest looking for a fast proxy server that can write request logs together with the actual content.

The target setup is to have the proxy server simply write a log file, and then to analyze the log, searching for the information you need.

Don't intercept requests while the proxy is working (this will slow things down).

The performance issues you may encounter (with the proxy-as-logger setup) are mostly related to TLS support; make sure the proxy setup allows a quick TLS handshake and the HTTP/2 protocol.

For example, Squid benchmarks show that it can process hundreds of requests per second, which should be enough for testing purposes.

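As a concrete starting point, not part of the original answer, Chrome can be pointed at such a logging proxy with a launch flag; the address below is a placeholder and assumes the proxy (e.g. Squid or mitmproxy) is already running:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
  // Assumes a logging proxy is listening on localhost:3128 (hypothetical address)
  const browser = await puppeteer.launch({
    args: ['--proxy-server=http://localhost:3128'],
  });
  const page = await browser.newPage();
  await page.goto('https://example.com/', { waitUntil: 'networkidle0' });
  await browser.close();
  // The captured requests and responses are then read from the proxy's log files
})();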

Answered by ScrapCode

I would suggest using a tool called Fiddler. It will capture all of the information you mentioned when you load a URL.

Answered by Jose Rodriguez

Go to Chrome and press F12, then open the "Network" tab. There you can see all the HTTP requests that the website sends, and you'll be able to see the details you mentioned.
