Note: This page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/13574158/

Date: 2020-10-26 19:11:47  Source: igfitidea

Number of Web Workers Limit

javascript

Asked by Bill

PROBLEM

I've discovered that there is a limit on the number of Web Workers that can be spawned by a browser.

Example

main HTML / JavaScript

<script type="text/javascript">
$(document).ready(function(){
    var workers = new Array();
    var worker_index = 0;
    for (var i=0; i < 25; i++) {
        workers[worker_index] = new Worker('test.worker.js');
        workers[worker_index].onmessage = function(event) {
            $("#debug").append('worker.onmessage i = ' + event.data + "<br>");
        };
        workers[worker_index].postMessage(i); // start the worker.      

        worker_index++;
    }   
});
</script>
</head>
<body>
<div id="debug">
</div>

test.worker.js

self.onmessage = function(event) {
    var i = event.data; 

    self.postMessage(i);
};

This will generate only 20 output lines in the container when using Firefox (version 14.0.1, Windows 7).

QUESTION

Is there a way around this? The only two ideas I can think of are:

1) Daisy chaining the web workers, i.e., making each web worker spawn the next one

Example:

<script type="text/javascript">
$(document).ready(function(){
    createWorker(0);
});

function createWorker(i) {

    var worker = new Worker('test.worker.js');
    worker.onmessage = function(event) {
        var index = event.data;

        $("#debug").append('worker.onmessage i = ' + index + "<br>");

        if ( index < 25) {
            index++;
            createWorker(index);
        } 
    };
    worker.postMessage(i); // start the worker.
}
</script>
</head>
<body>
<div id="debug"></div>

2) Limit the number of web workers to a finite number and modify my code to work with that limit (i.e., share the work load across a finite number of web workers) - something like this: http://www.smartjava.org/content/html5-easily-parallelize-jobs-using-web-workers-and-threadpool

Unfortunately #1 doesn't seem to work (only a finite number of web workers will get spawned on a page load). Are there any other solutions I should consider?

Answered by Evan Kennedy

Old question, let's revive it! readies epinephrine

I've been looking into using Web Workers to isolate 3rd party plugins since web workers can't access the host page. I'll help you out with your methods which I'm sure you've solved by now, but this is for teh internetz. Then I'll give some relevant information from my research.

Disclaimer: In the examples where I used your code, I've modified and cleaned it up to provide full source code without jQuery, so that you and others can run it easily. I've also added a timer that alerts the time in ms taken to execute the code.

In all examples, we reference the following genericWorker.js file.

genericWorker.js

self.onmessage = function(event) {
    self.postMessage(event.data);
};

Method 1 (Linear Execution)

Your first method is nearly working. The reason it still fails is that you aren't deleting any workers once you finish with them. This means the same result (crashing) will happen, just more slowly. All you need to do to fix it is add worker.terminate(); before creating a new worker, to remove the old one from memory. Note that this will cause the application to run much slower, as each worker must be created, run, and destroyed before the next can run.

Linear.html

<!DOCTYPE html>
<html>
<head>
    <title>Linear</title>
</head>
<body>
    <pre id="debug"></pre>
    <script type="text/javascript">
        var debug = document.getElementById('debug');
        var totalWorkers = 250;
        var index = 0;
        var start = (new Date).getTime();

        function createWorker() {
            var worker = new Worker('genericWorker.js');
            worker.onmessage = function(event) {
                debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
                worker.terminate();
                if (index < totalWorkers) createWorker(index);
                else alert((new Date).getTime() - start);
            };
            worker.postMessage(index++); // start the worker.
        }

        createWorker();
    </script>
</body>
</html>

Method 2 (Thread Pool)

Using a thread pool should greatly increase running speed. Instead of using some library with complex lingo, let's simplify it. All a thread pool means is having a set number of workers running simultaneously. We can actually just modify a few lines of code from the linear example to get a multi-threaded example. The code below will find how many cores you have (if your browser supports navigator.hardwareConcurrency), or default to 4. I found that this code ran about 6x faster than the linear version on my machine with 8 cores.

ThreadPool.html

<!DOCTYPE html>
<html>
<head>
    <title>Thread Pool</title>
</head>
<body>
    <pre id="debug"></pre>
    <script type="text/javascript">
        var debug = document.getElementById('debug');
        var maxWorkers = navigator.hardwareConcurrency || 4;
        var totalWorkers = 250;
        var index = 0;
        var start = (new Date).getTime();

        function createWorker() {
            var worker = new Worker('genericWorker.js');
            worker.onmessage = function(event) {
                debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
                worker.terminate();
                if (index < totalWorkers) createWorker();
                else if(--maxWorkers === 0) alert((new Date).getTime() - start);
            };
            worker.postMessage(index++); // start the worker.
        }

        for(var i = 0; i < maxWorkers; i++) createWorker();
    </script>
</body>
</html>

Other Methods

Method 3 (Single worker, repeated task)

In your example, you're using the same worker over and over again. I know you're simplifying a probably more complex use case, but some people viewing will see this and apply this method when they could be using just one worker for all the tasks.

Essentially, we'll instantiate a worker, send data, wait for data, then repeat the send/wait steps until all data has been processed.

On my computer, this runs at about twice the speed of the thread pool. That actually surprised me. I expected the overhead of creating and terminating workers in the thread pool to slow it down by more than a factor of two.

RepeatedWorker.html

<!DOCTYPE html>
<html>
<head>
    <title>Repeated Worker</title>
</head>
<body>
    <pre id="debug"></pre>
    <script type="text/javascript">
        var debug = document.getElementById('debug');
        var totalWorkers = 250;
        var index = 0;
        var start = (new Date).getTime();
        var worker = new Worker('genericWorker.js');

        function runWorker() {
            worker.onmessage = function(event) {
                debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
                if (index < totalWorkers) runWorker();
                else {
                    alert((new Date).getTime() - start);
                    worker.terminate();
                }
            };
            worker.postMessage(index++); // start the worker.
        }

        runWorker();
    </script>
</body>
</html>

Method 4 (Repeated Worker w/ Thread Pool)

Now, what if we combine the previous method with the thread pool method? Theoretically, it should run quicker than the previous. Interestingly, it runs at just about the same speed as the previous on my machine.

Maybe it's the extra overhead of passing the worker reference each time runWorker is called. Maybe it's the extra workers being terminated during execution (all but one worker will have been terminated before we take the time). Who knows. Finding this out is a job for another time.

RepeatedThreadPool.html

<!DOCTYPE html>
<html>
<head>
    <title>Repeated Thread Pool</title>
</head>
<body>
    <pre id="debug"></pre>
    <script type="text/javascript">
        var debug = document.getElementById('debug');
        var maxWorkers = navigator.hardwareConcurrency || 4;
        var totalWorkers = 250;
        var index = 0;
        var start = (new Date).getTime();

        function runWorker(worker) {
            worker.onmessage = function(event) {
                debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
                if (index < totalWorkers) runWorker(worker);
                else {
                    if(--maxWorkers === 0) alert((new Date).getTime() - start);
                    worker.terminate();
                }
            };
            worker.postMessage(index++); // start the worker.
        }

        for(var i = 0; i < maxWorkers; i++) runWorker(new Worker('genericWorker.js'));
    </script>
</body>
</html>

Now for some real world shtuff

Remember how I said I was using workers to integrate 3rd party plugins into my code? These plugins have a state to keep track of. I could start the plugins and hope they don't load so many that the application crashes, or I could keep track of the plugin state within my main thread and send that state back to the plugin if the plugin needs to be reloaded. I like the second one better.

I had written out several more examples of stateful, stateless, and state-restore workers, but I'll spare you the agony and just do some brief explaining and some shorter snippets.

First-off, a simple stateful worker looks like this:

StatefulWorker.js

var i = 0;

self.onmessage = function(e) {
    switch(e.data) {
        case 'increment':
            self.postMessage(++i);
            break;
        case 'decrement':
            self.postMessage(--i);
            break;
    }
};

It does some action based on the message it receives and holds data internally. This is great. It allows for mah plugin devs to have full control over their plugins. The main app instantiates their plugin once, then will send messages for them to do some action.

The problem comes in when we want to load several plugins at once. We can't do that, so what can we do?

Let's think about a few solutions.

Solution 1 (Stateless)

Let's make these plugins stateless. Essentially, every time we want to have the plugin do something, our application should instantiate the plugin then send it data based on its old state.

data sent

{
    action: 'increment',
    value: 7
}

StatelessWorker.js

self.onmessage = function(e) {
    switch(e.data.action) {
        case 'increment':
            e.data.value++;
            break;
        case 'decrement':
            e.data.value--;
            break;
    }
    self.postMessage({
        value: e.data.value
    });
};

This could work, but if we're dealing with a good amount of data this will start to seem like a less-than-perfect solution. Another similar solution could be to have several smaller workers for each plugin and send only a small amount of data to and from each, but I'm uneasy with that too.
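
For illustration, the main-thread side of the stateless round trip might look something like the sketch below. The helper name runStateless and the createWorker factory are hypothetical (they are not part of the answer above); the factory is assumed to return something like new Worker('StatelessWorker.js').

```javascript
// Sketch of the main-thread side of the stateless approach.
// `runStateless` and `createWorker` are hypothetical names; createWorker is
// assumed to be e.g. function () { return new Worker('StatelessWorker.js'); }
function runStateless(createWorker, action, value, callback) {
    var worker = createWorker();
    worker.onmessage = function (event) {
        worker.terminate();          // nothing to keep: the state travels in the messages
        callback(event.data.value);  // the caller stores the new state
    };
    worker.postMessage({ action: action, value: value });
}

// Possible usage in a page:
//   var state = 7;
//   runStateless(createWorker, 'increment', state, function (v) { state = v; });
```

Every call pays for creating and terminating a worker, which is the cost this solution trades for not holding any worker in memory.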

Solution 2 (State Restore)

What if we try to keep the worker in memory as long as possible, but if we do lose it, we can restore its state? We can use some sort of scheduler to see what plugins the user has been using (and maybe some fancy algorithms to guess what the user will use in the future) and keep those in memory.

The cool part about this is that we aren't looking at one worker per core anymore. Since a worker will be idle most of the time it is alive, we just need to worry about the memory it takes up. For a good number of workers (10 to 20 or so), this won't be substantial at all. We can keep the primary plugins loaded while the ones not used as often get swapped out as needed. All the plugins will still need some sort of state restore.
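
One simple way to decide which plugins stay loaded is a least-recently-used (LRU) policy. That is my assumption for this sketch, not necessarily the scheduler the answer has in mind, and PluginScheduler, loadPlugin and unloadPlugin are all hypothetical names.

```javascript
// LRU sketch: keep at most `limit` plugin workers loaded; when one more is
// needed, unload the least recently used one. All names are hypothetical.
function PluginScheduler(limit, loadPlugin, unloadPlugin) {
    this.limit = limit;
    this.loadPlugin = loadPlugin;     // function (name) -> worker-like handle
    this.unloadPlugin = unloadPlugin; // function (name, handle), e.g. handle.terminate()
    this.loaded = new Map();          // a Map iterates in insertion order: oldest first
}

// Return the handle for `name`, loading it (and evicting another) if needed.
PluginScheduler.prototype.use = function (name) {
    var handle = this.loaded.get(name);
    if (handle !== undefined) {
        this.loaded.delete(name);     // re-inserted below as most recently used
    } else {
        if (this.loaded.size >= this.limit) {
            var oldest = this.loaded.keys().next().value;
            this.unloadPlugin(oldest, this.loaded.get(oldest));
            this.loaded.delete(oldest);
        }
        handle = this.loadPlugin(name);
    }
    this.loaded.set(name, handle);
    return handle;
};
```

An evicted plugin loses its in-worker state, so this only makes sense together with some form of state restore.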

Let's use the following worker and assume we either send 'increment', 'decrement', or an integer containing the state it's supposed to be at.

StateRestoreWorker.js

var i = 0;

self.onmessage = function(e) {
    switch(e.data) {
        case 'increment':
            self.postMessage(++i);
            break;
        case 'decrement':
            self.postMessage(--i);
            break;
        default:
            i = e.data;
    }
};
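
On the main-thread side, tracking and restoring that state could be sketched roughly as follows. StatefulPlugin and its methods are hypothetical names I made up for illustration, and createWorker is assumed to return something like new Worker('StateRestoreWorker.js').

```javascript
// Main-thread wrapper sketch for StateRestoreWorker.js. All names here are
// hypothetical; createWorker is assumed to be
// e.g. function () { return new Worker('StateRestoreWorker.js'); }
function StatefulPlugin(createWorker) {
    this.createWorker = createWorker;
    this.worker = null;
    this.state = 0; // last value reported by the worker
}

// Load the worker if needed and restore the last known state.
StatefulPlugin.prototype.load = function () {
    if (this.worker) return;
    var plugin = this;
    this.worker = this.createWorker();
    this.worker.onmessage = function (event) {
        plugin.state = event.data; // remember every reported state
    };
    this.worker.postMessage(this.state); // an integer message means "restore"
};

// Free the worker; the state stays behind on the main thread.
StatefulPlugin.prototype.unload = function () {
    if (this.worker) this.worker.terminate();
    this.worker = null;
};

// Send 'increment' or 'decrement', reloading the worker first if necessary.
StatefulPlugin.prototype.send = function (action) {
    this.load();
    this.worker.postMessage(action);
};
```

The worker never has to know it was reloaded: the first message it receives after a reload is simply the integer it should continue from.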

These are all pretty simple examples, but I hope I helped you understand ways of using multiple workers efficiently! I'll most likely be writing a scheduler and optimizer for this stuff, but who knows when I'll get to that point.

Good luck, and happy coding!

Answered by Björn Weinbrenner

My experience is that too many workers (> 100) decrease performance. In my case Firefox became very slow and Chrome even crashed. I compared variants with different numbers of workers (1, 2, 4, 8, 16, 32). Each worker encrypted a string. It turned out that 8 was the optimal number of workers, but that may differ depending on the problem the workers have to solve.

I built a small framework to abstract away the number of workers. Calls to the workers are created as tasks. If the maximum allowed number of workers are busy, a new task is queued and executed later.

It turned out that it's very important to recycle the workers in such an approach. You should hold them in a pool when they are idle, and avoid calling new Worker(...) too often. Even if workers are terminated with worker.terminate(), there seems to be a big performance difference between creating/terminating workers and recycling them.
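
The framework itself isn't shown here, so the following is only a minimal sketch of the idea under my own assumptions: tasks are queued, at most a fixed number of workers are ever created, and a worker that finishes goes back to an idle list instead of being terminated. WorkerPool and its method names are hypothetical.

```javascript
// Minimal worker pool sketch: at most `size` workers exist, idle workers are
// reused, and extra tasks wait in a queue. All names here are hypothetical.
// createWorker is assumed to be e.g. function () { return new Worker('genericWorker.js'); }
function WorkerPool(createWorker, size) {
    this.createWorker = createWorker;
    this.size = size;
    this.idle = [];    // workers waiting for work
    this.created = 0;  // number of workers currently alive
    this.queue = [];   // pending { data, callback } tasks
}

// Queue a task; callback receives the worker's reply (event.data).
WorkerPool.prototype.run = function (data, callback) {
    this.queue.push({ data: data, callback: callback });
    this.dispatch();
};

// Hand queued tasks to idle workers, creating workers lazily up to `size`.
WorkerPool.prototype.dispatch = function () {
    while (this.queue.length > 0) {
        var worker = this.idle.pop();
        if (!worker) {
            if (this.created >= this.size) return; // everyone is busy
            worker = this.createWorker();
            this.created++;
        }
        this.start(worker, this.queue.shift());
    }
};

WorkerPool.prototype.start = function (worker, task) {
    var pool = this;
    worker.onmessage = function (event) {
        pool.idle.push(worker); // recycle instead of terminate
        task.callback(event.data);
        pool.dispatch();
    };
    worker.postMessage(task.data);
};

// Terminate idle workers once the pool is no longer needed.
// (Busy workers are not tracked in this sketch.)
WorkerPool.prototype.close = function () {
    this.idle.forEach(function (w) { w.terminate(); });
    this.created -= this.idle.length;
    this.idle = [];
};
```

A page might use it as pool.run(i, function (result) { ... }) for each of its 250 tasks; only `size` calls to the worker factory would ever happen.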

Answered by nfroidure

The way you're chaining your Workers in solution #1 prevents the garbage collector from terminating Worker instances, because you still have a reference to them in the scope of your onmessage callback function.

Give this code a try:

<script type="text/javascript">
var worker;
$(document).ready(function(){
    createWorker(0);
});
function createWorker(i) {
   worker = new Worker('test.worker.js');
   worker.onmessage = handleMessage;
   worker.postMessage(i); // start the worker.
}
function handleMessage(event) {
    var index = event.data;
    $("#debug").append('worker.onmessage i = ' + index + "<br>");

    if (index < 25) {
        index++;
        createWorker(index);
    }
}
</script>
</head>
<body>
<div id="debug"></div>

Answered by N Kearns Mills

Old question, but it comes up in searches, so... there is a configurable limit in Firefox. If you look in about:config (enter it as the address in Firefox's address bar) and search for 'worker', you will see several settings, including this one:

dom.workers.maxPerDomain

It is set to 20 by default. Double-click the line and change the setting. You will need to restart the browser.