javascript - Async parallel requests are running sequentially

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me) and link the original: http://stackoverflow.com/questions/32442426/

Date: 2020-10-28 15:17:08  Source: igfitidea

Async parallel requests are running sequentially

Tags: javascript, node.js, asynchronous, parallel-processing

Question by oliversm

I am running a server using Node.js and need to request data from another server that I am running (localhost:3001). I need to make many requests (~200) to the data server and collect the data (response sizes vary from ~20Kb to ~20Mb). Each request is independent, and I would like to save the responses as one giant array of the form:


[{"urlAAA": responseAAA}, {"urlCCC": responseCCC}, {"urlBBB": responseBBB}, etc ]

Notice that the order of the items is unimportant; ideally they should fill the array in the order that the data becomes available.

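That accumulation step can be sketched on its own, independent of any request library (the URLs and response bodies below are placeholders, not real data):

```javascript
// Sketch: collect responses in whatever order they arrive.
var results = [];

function recordResponse(url, body) {
    var entry = {};
    entry[url] = body; // one {"url": response} object per request
    results.push(entry);
}

// Simulated arrival order -- placeholders only:
recordResponse("urlAAA", "responseAAA");
recordResponse("urlCCC", "responseCCC");
recordResponse("urlBBB", "responseBBB");

console.log(results);
```

Each callback just appends its own single-key object, so the array ends up in completion order regardless of which request was started first.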

var express = require('express');
var router = express.Router();
var async = require("async");
var papa = require("papaparse");
var sync_request = require('sync-request');
var request = require("request");

var pinnacle_data = {};
var lookup_list = [];
for (var i = 0; i < 20; i++) {
    lookup_list.push(i);
}

function write_delayed_files(object, key, value) {
    object[key] = value;
    return;
}

var show_file = function (file_number) {
    var file_index = Math.round(Math.random() * 495) + 1;
    var pinnacle_file_index = 'http://localhost:3001/generate?file=' + file_index.toString();
    var response_json = sync_request('GET', pinnacle_file_index);
    var pinnacle_json = JSON.parse(response_json.getBody('utf8'));
    var object_key = "file_" + file_number.toString();
    pinnacle_data[object_key] = pinnacle_json;
    console.log("We've handled file:    " + file_number);
    return;
};

async.each(lookup_list, show_file, function (err) {});



console.log(pinnacle_data);

/* GET contact us page. */
router.get('/', function (req, res, next) {
    res.render('predictionsWtaLinks', {title: 'Async Trial'});
});

module.exports = router;

Now when this program is run it displays:


We've handled file:    0
We've handled file:    1
We've handled file:    2
We've handled file:    3
We've handled file:    4
We've handled file:    5
etc

Now, as the files vary so much in size, I was expecting the requests to run "in parallel", but they appear to run sequentially, which is exactly what I was trying to avoid by using async.each(). Currently it takes about 1-2 s to connect to the data server, so performing this over many files takes far too long.


I realise I am using synchronous requesting, and so would like to ideally replace:


var response_json = sync_request('GET', pinnacle_file_index);

with something similar to


request(pinnacle_file_index, function (error, response, body) {
    if (!error && response.statusCode == 200) {
        pinnacle_data[object_key] = JSON.parse(body);
    }
});

Any help would be much appreciated.


Additionally I have looked at trying:


  • Converting the list of URLs into a list of anonymous functions and using async.parallel(function_list, function (err, results) { /* add results to pinnacle_data */ });. (I have encountered problems trying to define unique functions for each element in the array.)
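For what it's worth, the "unique function per element" problem is usually solved with Array.prototype.map, so that each task closes over its own URL. A minimal sketch, with doRequest standing in for the real request call:

```javascript
// Sketch: turn a URL list into a list of task functions for async.parallel.
// Each returned function captures its own `url` via closure.
function makeTasks(urls, doRequest) {
    return urls.map(function (url) {
        return function (callback) {
            doRequest(url, callback);
        };
    });
}

// Usage with a synchronous stand-in for request():
var called = [];
var tasks = makeTasks(["urlA", "urlB"], function (url, cb) {
    called.push(url);
    cb(null, "body of " + url);
});
tasks.forEach(function (task) { task(function () {}); });
console.log(called);
```

Because each inner function is created inside the map callback, there is no shared loop variable to go stale, which is the usual pitfall when building such a list with a plain for loop.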

Similarly, I have looked at other related topics.


EDIT - WORKING SOLUTION




The following code now does the task, taking ~80 ms per request (including having to make repeated requests using npm requestretry). It also scales very well, averaging ~80 ms per request for anywhere from 5 to 1000 total requests.


var performance = require("performance-now");
var time_start = performance();
var async = require("async");
var request_retry = require('requestretry');

var lookup_list = [];
var total_requests = 50;
for (var i = 0; i < total_requests; i++) {
    lookup_list.push(i);
}

var pinnacle_data = {};
async.map(lookup_list, function (item, callback) {
        var file_index = Math.round(Math.random() * 495) + 1;
        var pinnacle_file_index = 'http://localhost:3001/generate?file=' + file_index;
        request_retry({
                url: pinnacle_file_index,
                maxAttempts: 20,
                retryDelay: 20,
                retryStrategy: request_retry.RetryStrategies.HTTPOrNetworkError
            },
            function (error, response, body) {
                if (!error && response.statusCode == 200) {
                    body = JSON.parse(body);
                    var data_array = {};
                    data_array[file_index.toString()] = body;
                    callback(null, data_array);
                } else {
                    console.log(error);
                    callback(error || response.statusCode);
                }
            });
    },
    function (err, results) {
        var time_finish = performance();
        console.log("It took " + (time_finish - time_start).toFixed(3) + "ms to complete " + total_requests + " requests.");
        console.log("This gives an average rate of " + ((time_finish - time_start) / total_requests).toFixed(3) + " ms/request");
        if (!err) {
            for (var i = 0; i < results.length; i++) {
                for (var key in results[i]) {
                    pinnacle_data[key] = results[i][key];
                }
            }
            var length_array = Object.keys(pinnacle_data).length.toString();
            console.log("We've got all the data, totalling " + length_array + " unique entries.");
        } else {
            console.log("We had an error somewhere.");
        }
    });

Thanks for the help.


Answer by jfriend00

As you have discovered, async.parallel() can only parallelize operations that are themselves asynchronous. If the operations are synchronous, then because of the single-threaded nature of node.js, the operations will run one after another, not in parallel. But if the operations are themselves asynchronous, then async.parallel() (or other async methods) will start them all at once and coordinate the results for you.

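That difference can be demonstrated with nothing but Node built-ins; the 100 ms delays below are stand-ins for asynchronous requests:

```javascript
// Sketch: three non-blocking "requests" started together finish in
// roughly the time of one, because each yields to the event loop.
function delay(ms) {
    return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

var start = Date.now();
var parallelElapsed = Promise.all([delay(100), delay(100), delay(100)])
    .then(function () { return Date.now() - start; });

parallelElapsed.then(function (ms) {
    // ~100 ms total, not the ~300 ms that sequential execution would take
    console.log("All three finished after about " + ms + " ms");
});
```

A synchronous call like sync_request() never yields, so three of them necessarily take the sum of their individual times.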

Here's a general idea using async.map(). I used async.map() because it takes an array as input and produces an array of results in the same order as the original, but runs all the requests in parallel, which seems to line up with what you want:


var async = require("async");
var request = require("request");

// create list of URLs
var lookup_list = [];
for (var i = 0; i < 20; i++) {
    var index = Math.round(Math.random() * 495) + 1;
    var url = 'http://localhost:3001/generate?file=' + index;
    lookup_list.push(url);
}

async.map(lookup_list, function(url, callback) {
    // iterator function
    request(url, function (error, response, body) {
        if (!error && response.statusCode == 200) {
            body = JSON.parse(body);
            // do any further processing of the data here
            callback(null, body);
        } else {
            callback(error || response.statusCode);
        }
    });
}, function(err, results) {
    // completion function
    if (!err) {
        // process all results in the array here
        console.log(results);
        for (var i = 0; i < results.length; i++) {
            // do something with results[i]
        }
    } else {
        // handle error here
    }
});


And here's a version using Bluebird promises, which somewhat similarly uses Promise.map() to iterate the initial array:


var Promise = require("bluebird");
var request = Promise.promisifyAll(require("request"), {multiArgs: true});

// create list of URLs
var lookup_list = [];
for (var i = 0; i < 20; i++) {
    var index = Math.round(Math.random() * 495) + 1;
    var url = 'http://localhost:3001/generate?file=' + index;
    lookup_list.push(url);
}

Promise.map(lookup_list, function(url) {
    return request.getAsync(url).spread(function(response, body) {
        if (response.statusCode !== 200) {
            throw response.statusCode;
        }
        return JSON.parse(body);
    });
}).then(function(results) {
    console.log(results);
    for (var i = 0; i < results.length; i++) {
        // process results[i] here
    }
}, function(err) {
    // process error here
});

Answer by caasjj

Sounds like you're just trying to download a bunch of URLs in parallel. This will do that:


var request = require('request');
var async = require('async');

var urls = ['http://microsoft.com', 'http://yahoo.com', 'http://google.com', 'http://amazon.com'];

var loaders = urls.map( function(url) {
  return function(callback) {
        request(url, callback);
  }
});

async.parallel(loaders, function(err, results) {
        if (err) throw(err); // ... handle appropriately
        // results will be an array of the results, in 
        // the same order as 'urls', even though the operation
        // was done in parallel
        console.log(results.length); // == urls.length
});

or even simpler, using async.map:


var request = require('request');
var async = require('async');

var urls = ['http://microsoft.com', 'http://yahoo.com', 'http://google.com', 'http://amazon.com'];

async.map(urls, request, function(err, results) {
        if (err) throw(err);          // handle error 
        console.log(results.length);  // == urls.length
});

Answer by cshion

Try this:


var async = require("async");
var request = require("request");

// Assumes the same setup as in the question:
var pinnacle_data = {};
var lookup_list = [];
for (var i = 0; i < 20; i++) {
    lookup_list.push(i);
}

var show_file = function (file_number, cb) {
    // ..sync ops
    var file_index = Math.round(Math.random() * 495) + 1;
    var pinnacle_file_index = 'http://localhost:3001/generate?file=' + file_index.toString();
    // request instance from the request npm module
    // ..async op --> this is what makes async.each asynchronous
    request(pinnacle_file_index, function (error, response, body) {
        if (error)
            return cb(error);
        var object_key = "file_" + file_number.toString();
        pinnacle_data[object_key] = JSON.parse(body);
        return cb();
    });
};

async.each(
    lookup_list,
    show_file,
    function (err) {
        if (err) {
            console.log("Error", err);
        } else {
            console.log("Its ok");
            console.log(pinnacle_data);
        }
    });