Node.js request module getting ETIMEDOUT and ESOCKETTIMEDOUT

Question

Asked by Jorayen

I'm crawling a lot of links in parallel with the request module in combination with the async module.
I'm noticing a lot of ETIMEDOUT and ESOCKETTIMEDOUT errors, although the links are reachable and respond fairly quickly in Chrome.

I've limited maxSockets to 2 and set the timeout to 10000 in the request options. I'm using async.filterLimit() with a limit of 2 to cut the parallelism down to 2 requests at a time. So I have 2 sockets, 2 requests, and a timeout of 10 seconds to wait for the response headers from the server, yet I still get these errors.

Here's the request configuration I use:

{
    ...
    pool: {
        maxSockets: 2
    },
    timeout: 10000,
    time: true
    ...
}

Here's the snippet of code I use to fetch links:

var self = this;
async.filterLimit(resources, 2, function(resource, callback) {
    request({
        uri: resource.uri
    }, function (error, response, body) {
        if (!error && response.statusCode === 200) {
            ...
        } else {
            self.emit('error', resource, error);
        }
        callback(...);
    })
}, function(result) {
    callback(null, result);
});

I listened to the error event, and I see that whenever the error code is ETIMEDOUT, the connect property is either true or false, so sometimes it's a connection timeout and sometimes it's not (according to the request docs).
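
As a minimal sketch of that distinction, assuming the error shape described in the request docs (code is a string, and connect is true only when the timeout happened while establishing the connection), a helper like the hypothetical classifyTimeout below could separate the two cases. The category names are illustrative and not part of the request module.

```javascript
// Classify the error object passed to a request() callback.
// Assumption: `error.connect === true` marks a connection timeout, as the
// request docs describe; anything else with a timeout code is treated as a
// read timeout (the connection was made, but data stopped arriving in time).
function classifyTimeout(error) {
    if (!error) {
        return 'none';
    }
    if (error.code === 'ETIMEDOUT' && error.connect === true) {
        return 'connect-timeout';
    }
    if (error.code === 'ETIMEDOUT' || error.code === 'ESOCKETTIMEDOUT') {
        return 'read-timeout';
    }
    return 'other';
}
```

Logging this category next to each failed URI would show whether the crawler is failing before the connection is established or while waiting for data.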

UPDATE: I decided to raise maxSockets to Infinity so that no connection hangs for lack of available sockets:

pool: {
    maxSockets: Infinity
}

In order to control the bandwidth, I implemented a requestLoop method that handles the request, with maxAttempts and retryDelay parameters to control the retries:

async.filterLimit(resources, 10, function(resource, callback) {
    self.requestLoop({
        uri: resource.uri
    }, 100, 5000, function (error, response, body) {
        var fetched = false;
        if (!error) {
            ...
        } else {
            ....
        }
        callback(...);
    });
}, function(result) {
    callback(null, result);
});

Implementation of requestLoop:

requestLoop = function(options, attemptsLeft, retryDelay, callback, lastError) {
    var self = this;
    if (attemptsLeft <= 0) {
        callback((lastError != null ? lastError : new Error('...')));
    } else {
        request(options, function (error, response, body) {
            var recoverableErrors = ['ESOCKETTIMEDOUT', 'ETIMEDOUT', 'ECONNRESET', 'ECONNREFUSED'];
            var e;
            if ((error && _.contains(recoverableErrors, error.code)) || (response && (500 <= response.statusCode && response.statusCode < 600))) {
                e = error ? new Error('...') : new Error('...');
                e.code = error ? error.code : response.statusCode;
                setTimeout((function () {
                    self.requestLoop(options, --attemptsLeft, retryDelay, callback, e);
                }), retryDelay);
            } else if (!error && (200 <= response.statusCode && response.statusCode < 300)) {
                callback(null, response, body);
            } else if (error) {
                e = new Error('...');
                e.code = error.code;
                callback(e);
            } else {
                e = new Error('...');
                e.code = response.statusCode;
                callback(e);
            }
        });
    }
};
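
The retry logic above can be sketched in a self-contained form. This is not the author's code: requestWithRetry and isRetryable are hypothetical names, doRequest stands in for any function with the request() callback shape, and the schedule parameter (defaulting to setTimeout) exists only so the delay can be swapped out. Unlike the original, this sketch reports the failure as soon as the last attempt fails instead of waiting one more retryDelay.

```javascript
// Error codes treated as transient, taken from the list in the question.
const RECOVERABLE = ['ESOCKETTIMEDOUT', 'ETIMEDOUT', 'ECONNRESET', 'ECONNREFUSED'];

// An attempt is retried on a transient error code or on a 5xx response.
function isRetryable(error, response) {
    if (error) {
        return RECOVERABLE.includes(error.code);
    }
    return Boolean(response) && response.statusCode >= 500 && response.statusCode < 600;
}

// doRequest(options, cb): assumed to call cb(error, response, body) once.
// schedule(fn, ms): hypothetical hook, defaults to setTimeout.
function requestWithRetry(doRequest, options, attemptsLeft, retryDelay, callback, schedule) {
    schedule = schedule || setTimeout;
    doRequest(options, function (error, response, body) {
        if (isRetryable(error, response) && attemptsLeft > 1) {
            schedule(function () {
                requestWithRetry(doRequest, options, attemptsLeft - 1, retryDelay, callback, schedule);
            }, retryDelay);
            return;
        }
        if (error || !response) {
            callback(error || new Error('No response received'));
            return;
        }
        if (response.statusCode >= 200 && response.statusCode < 300) {
            callback(null, response, body);
        } else {
            const e = new Error('Unexpected status ' + response.statusCode);
            e.code = response.statusCode;
            callback(e);
        }
    });
}
```

Passing the attempt count by value (attemptsLeft - 1) instead of mutating it keeps each call's state independent, which makes the loop easier to reason about when many requests retry at once.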

To sum it up:
- Raised maxSockets to Infinity to try to overcome socket connection timeout errors
- Implemented a requestLoop method to handle failed requests, with maxAttempts and retryDelay for such requests
- There is also a maximum number of concurrent requests, set by the number passed to async.filterLimit

I want to note that I've also played with all of the settings here in order to get error-free crawling, but so far those attempts have failed as well.

Still looking for help about solving this problem.

UPDATE 2: I've decided to drop async.filterLimit and build my own limiting mechanism. I just have 3 variables to help me achieve this:
- pendingRequests: an array that holds all the requests (explained below)
- activeRequests: the number of active requests
- maxConcurrentRequests: the maximum number of allowed concurrent requests

Into the pendingRequests array, I push a complex object containing a reference to the requestLoop function as well as an arguments array containing the arguments to be passed to the loop function:

self.pendingRequests.push({
    "arguments": [{
        uri: resource.uri.toString()
    }, self.maxAttempts, function (error, response, body) {
        if (!error) {
            if (self.policyChecker.isMimeTypeAllowed((response.headers['content-type'] || '').split(';')[0]) &&
                self.policyChecker.isFileSizeAllowed(body)) {
                self.totalBytesFetched += body.length;
                resource.content = self.decodeBuffer(body, response.headers["content-type"] || '', resource);
                callback(null, resource);
            } else {
                self.fetchedUris.splice(self.fetchedUris.indexOf(resource.uri.toString()), 1);
                callback(new Error('Fetch failed because a mime-type is not allowed or file size is bigger than permitted'));
            }
        } else {
            self.fetchedUris.splice(self.fetchedUris.indexOf(resource.uri.toString()), 1);
            callback(error);
        }
        self.activeRequests--;
        self.runRequest();
    }],
    "function": self.requestLoop
});
self.runRequest();

You'll notice the call to runRequest() at the end. This function's job is to manage the requests and fire them when it can, while keeping activeRequests under the limit of maxConcurrentRequests:

var self = this;
process.nextTick(function() {
    var next;
    if (!self.pendingRequests.length || self.activeRequests >= self.maxConcurrentRequests) {
        return;
    }
    self.activeRequests++;
    next = self.pendingRequests.shift();
    next["function"].apply(self, next["arguments"]);
    self.runRequest();
});
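
The same idea can be sketched as a small standalone limiter. This is an illustration under stated assumptions, not the author's implementation: createLimiter, push, activeCount, and pendingCount are hypothetical names, each task is assumed to be a function that receives a done callback and calls it when its work has finished, and tasks are started synchronously instead of being deferred with process.nextTick.

```javascript
// Minimal concurrency limiter: at most `maxConcurrent` tasks run at once,
// the rest wait in a FIFO queue.
function createLimiter(maxConcurrent) {
    const pending = [];
    let active = 0;

    function runNext() {
        // Start queued tasks until the limit is reached or the queue is empty.
        while (active < maxConcurrent && pending.length > 0) {
            const task = pending.shift();
            let finished = false;
            active++;
            task(function done() {
                if (finished) {
                    return; // ignore duplicate calls so the counter stays correct
                }
                finished = true;
                active--;
                runNext();
            });
        }
    }

    return {
        push: function (task) {
            pending.push(task);
            runNext();
        },
        activeCount: function () { return active; },
        pendingCount: function () { return pending.length; }
    };
}
```

Keeping the counter update inside done, guarded against duplicate calls, means a callback that fires twice cannot push the active count below its true value and let extra requests through.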

This should solve any timeout errors. In my testing, though, I've still noticed some timeouts on specific websites I've tested this on. I can't be 100% sure about this, but I think it's due to the HTTP server behind the website limiting each user's requests to a maximum by checking IP addresses, and as a result returning some HTTP 400 responses to prevent a possible 'attack' on the server.

Answer 1

Answered by Motiejus Jakštys

Edit: duplicate of https://stackoverflow.com/a/37946324/744276

By default, Node has 4 workers to resolve DNS queries. If a DNS query takes a long-ish time, requests will block on the DNS phase, and the symptom is exactly ESOCKETTIMEDOUT or ETIMEDOUT.

Try increasing your uv thread pool size:

export UV_THREADPOOL_SIZE=128
node ...

or in index.js (or wherever your entry point is):

#!/usr/bin/env node
process.env.UV_THREADPOOL_SIZE = 128;

function main() {
   ...
}
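
The workers in question are the threads of the libuv thread pool, which reads UV_THREADPOOL_SIZE when the pool is first created, so the assignment needs to run before anything uses the pool. As a hedged sketch, a guard like the hypothetical ensureThreadPoolSize below sets a default only when the shell has not already provided one. The env parameter is an assumption added so the helper can be exercised without touching the real environment, and setting the variable from inside the process may not take effect on every platform, so exporting it in the shell remains the safer route.

```javascript
// Set UV_THREADPOOL_SIZE only if it is not already defined.
// `env` defaults to process.env; passing a plain object is for illustration.
function ensureThreadPoolSize(size, env) {
    env = env || process.env;
    if (!env.UV_THREADPOOL_SIZE) {
        // Environment variables are strings, so convert explicitly.
        env.UV_THREADPOOL_SIZE = String(size);
    }
    return Number(env.UV_THREADPOOL_SIZE);
}

// Intended use: call this at the very top of the entry point, before
// requiring modules that perform DNS lookups or other pooled work, e.g.
// ensureThreadPoolSize(128);
```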

Edit: I also wrote a blog post about it.

Answer 2

Answered by cancerbero

I found that if there are too many async requests, an ESOCKETTIMEDOUT exception happens on Linux. The workaround I've found is this:

Set these options on request(): agent: false, pool: {maxSockets: 100}. Notice that after that, the reported timeout can be misleading, so you might need to increase it.
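
A minimal sketch of assembling those options follows. buildRequestOptions is a hypothetical helper, and the timeout value is left to the caller because the answer does not say how much to increase it by; agent, pool, and timeout are the option names used elsewhere in this thread.

```javascript
// Build an options object for request() using the workaround's settings.
// `timeoutMs` is chosen by the caller; no particular value is recommended here.
function buildRequestOptions(uri, timeoutMs) {
    return {
        uri: uri,
        agent: false,               // as suggested in the answer
        pool: { maxSockets: 100 },  // as suggested in the answer
        timeout: timeoutMs
    };
}
```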
