multithreading 多少线程太多?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/481970/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow


How many threads is too many?

multithreading, performance, threadpool

提问by ryeguy

I am writing a server, and I send each action off into a separate thread when the request is received. I do this because almost every request makes a database query. I am using a threadpool library to cut down on construction/destruction of threads.

我正在编写一个服务器,当收到请求时,我将每个动作发送到一个单独的线程中。我这样做是因为几乎每个请求都会进行数据库查询。我正在使用线程池库来减少线程的构建/销毁。

My question is: what is a good cutoff point for I/O threads like these? I know it would just be a rough estimate, but are we talking hundreds? Thousands?

我的问题是:像这样的 I/O 线程的一个好的截止点是什么?我知道这只是一个粗略的估计,但我们说的是数百个吗?几千?

How would I go about figuring out what this cutoff would be?

我将如何弄清楚这个截止点是什么?



EDIT:

编辑:

Thank you all for your responses; it seems like I am just going to have to test it to find out my thread count ceiling. The question, though, is: how do I know I've hit that ceiling? What exactly should I measure?

谢谢大家的回复,看来我只需要测试一下就可以找出我的线程数上限了。但问题是:我怎么知道我已经达到了上限?我到底应该测量什么?

采纳答案by paxdiablo

Some people would say that two threads is too many - I'm not quite in that camp :-)

有些人会说两个线程太多了 - 我不太在那个阵营:-)

Here's my advice: measure, don't guess. One suggestion is to make it configurable and initially set it to 100, then release your software into the wild and monitor what happens.

这是我的建议:衡量,不要猜测。一个建议是使其可配置并最初将其设置为 100,然后将您的软件发布到野外并监控发生的情况。

If your thread usage peaks at 3, then 100 is too much. If it remains at 100 for most of the day, bump it up to 200 and see what happens.

如果您的线程使用率峰值为 3,那么 100 就太多了。如果一天中的大部分时间它都保持在 100,则将其提高到 200,看看会发生什么。

You could actually have your code itself monitor usage and adjust the configuration for the next time it starts, but that's probably overkill.

你甚至可以让代码自己监控使用情况,并为下一次启动调整配置,但这可能有点矫枉过正。



For clarification and elaboration:

澄清和详细说明:

I'm not advocating rolling your own thread pooling subsystem, by all means use the one you have. But, since you were asking about a good cut-off point for threads, I assume your thread pool implementation has the ability to limit the maximum number of threads created (which is a good thing).

我并不是提倡自己从头实现线程池子系统——尽管用你现有的那个。但是,既然您问的是线程数的一个合理上限,我假设您的线程池实现能够限制创建的最大线程数(这是一件好事)。

I've written thread and database connection pooling code and they have the following features (which I believe are essential for performance):

我编写了线程和数据库连接池代码,它们具有以下功能(我认为这对性能至关重要):

  • a minimum number of active threads.
  • a maximum number of threads.
  • shutting down threads that haven't been used for a while.
  • 最小活动线程数。
  • 最大线程数。
  • 关闭一段时间未使用的线程。

The first sets a baseline for minimum performance in terms of the thread pool client (this number of threads is always available for use). The second sets a restriction on resource usage by active threads. The third returns you to the baseline in quiet times so as to minimise resource use.

第一个为线程池客户端方面的最低性能设置基线(此线程数始终可用)。第二个设置对活动线程的资源使用的限制。第三个让您在安静时间返回基线,以最大限度地减少资源使用。
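As an illustration of those three features only, here is a minimal Python sketch; the class name, defaults and structure are invented for this example, not taken from the poster's code:

```python
import queue
import threading

class ElasticPool:
    """Illustrative pool with a floor, a ceiling, and idle shutdown."""

    def __init__(self, min_threads=2, max_threads=50, idle_timeout=30.0):
        self.min_threads = min_threads      # always keep this many workers alive
        self.max_threads = max_threads      # never create more than this
        self.idle_timeout = idle_timeout    # seconds an extra worker may sit idle
        self.tasks = queue.Queue()
        self.lock = threading.Lock()
        self.workers = 0
        for _ in range(min_threads):
            self._spawn()

    def _spawn(self):
        with self.lock:
            if self.workers >= self.max_threads:
                return
            self.workers += 1
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            try:
                func, args = self.tasks.get(timeout=self.idle_timeout)
            except queue.Empty:
                # Idle too long: shut this worker down unless we are at the floor.
                with self.lock:
                    if self.workers > self.min_threads:
                        self.workers -= 1
                        return
                continue
            try:
                func(*args)
            finally:
                self.tasks.task_done()

    def submit(self, func, *args):
        self.tasks.put((func, args))
        # Grow towards the ceiling while work is queued (a real pool would
        # only grow when every existing worker is busy).
        if not self.tasks.empty():
            self._spawn()
```

Most real pool libraries expose the same three settings (for example Java's ThreadPoolExecutor calls them corePoolSize, maximumPoolSize and keepAliveTime); the sketch is only to show what each one controls.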

You need to balance the resource usage of having unused threads (A) against the resource usage of not having enough threads to do the work (B).

您需要平衡拥有未使用线程 (A) 的资源使用和没有足够线程来完成工作 (B) 的资源使用。

(A) is generally memory usage (stacks and so on) since a thread doing no work will not be using much of the CPU. (B) will generally be a delay in the processing of requests as they arrive as you need to wait for a thread to become available.

(A) 通常是内存使用量(堆栈等),因为不工作的线程不会使用大量 CPU。(B) 通常会在请求到达时延迟处理,因为您需要等待线程可用。

That's why you measure. As you state, the vast majority of your threads will be waiting for a response from the database so they won't be running. There are two factors that affect how many threads you should allow for.

这就是你测量的原因。正如您所说,绝大多数线程将等待来自数据库的响应,因此它们不会运行。有两个因素会影响您应该允许的线程数。

The first is the number of DB connections available. This may be a hard limit unless you can increase it at the DBMS - I'm going to assume your DBMS can take an unlimited number of connections in this case (although you should ideally be measuring that as well).

第一个是可用的数据库连接数。这可能是一个硬限制,除非您可以在 DBMS 上增加它 - 我将假设您的 DBMS 在这种情况下可以采用无限数量的连接(尽管理想情况下您也应该测量它)。

Then, the number of threads you should have depends on your historical use. The minimum you should have running is the minimum number that you've ever had running + A%, with an absolute minimum of, say, 5 (and make it configurable, just like A).

然后,您应该拥有的线程数取决于您的历史使用情况。您应该保持运行的最小线程数是您曾经运行过的最小数量 + A%,并设一个绝对下限,比如 5(并且像 A 一样使其可配置)。

The maximum number of threads should be your historical maximum + B%.

最大线程数应该是您的历史最大值 + B%。
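Read as arithmetic, the rule above might look like this; the numbers are invented purely to show the calculation:

```python
# Hypothetical historical data and headroom percentages (all configurable).
historical_min = 12        # fewest threads ever observed in concurrent use
historical_max = 80        # most threads ever observed in concurrent use
A, B = 20, 20              # headroom percentages
absolute_floor = 5

min_threads = max(absolute_floor, int(historical_min * (1 + A / 100)))   # -> 14
max_threads = int(historical_max * (1 + B / 100))                        # -> 96
```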

You should also be monitoring for behaviour changes. If, for some reason, your usage goes to 100% of available for a significant time (so that it would affect the performance of clients), you should bump up the maximum allowed until it's once again B% higher.

您还应该监视行为变化。如果由于某种原因,您的使用量在相当长的时间内达到了可用线程的 100%(以至于影响到客户端的性能),您就应该提高允许的最大值,直到它再次比实际用量高出 B%。



In response to the "what exactly should I measure?" question:

针对“我到底应该测量什么?”这个问题:

What you should measure specifically is the maximum number of threads in concurrent use (e.g., waiting on a return from the DB call) under load. Then add a safety factor of 10%, for example (emphasised, since other posters seem to take my examples as fixed recommendations).

您应该具体测量的是负载下并发使用(例如正在等待数据库调用返回)的最大线程数。然后再加上一个安全系数,例如 10%(特意强调“例如”,因为其他回帖者似乎把我的示例数字当成了固定建议)。

In addition, this should be done in the production environment for tuning. It's okay to get an estimate beforehand but you never know what production will throw your way (which is why all these things should be configurable at runtime). This is to catch a situation such as unexpected doubling of the client calls coming in.

此外,这应该在生产环境中进行,以便调优。事先做一个估计是可以的,但你永远不知道生产环境会给你带来什么(这就是为什么所有这些参数都应该可以在运行时配置)。这是为了应对诸如传入的客户端调用量意外翻倍之类的情况。

回答by Jay D

This question has been discussed quite thoroughly and I didn't get a chance to read all the responses. But here's few things to take into consideration while looking at the upper limit on number of simultaneous threads that can co-exist peacefully in a given system.

这个问题已经讨论得很彻底了,我没有机会阅读所有的回复。但是,在查看可以在给定系统中和平共存的并发线程数上限时,需要考虑以下几点。

  1. Thread stack size: In Linux the default thread stack size is 8 MB (you can use ulimit -a to find it out).
  2. Max virtual memory that a given OS variant supports. Linux Kernel 2.4 supports a memory address space of 2 GB; with Kernel 2.6 it is a bit bigger (3 GB).
  3. [1] shows the calculations for the max number of threads per given max VM supported. For 2.4 it turns out to be about 255 threads; for 2.6 the number is a bit larger. (A rough version of this arithmetic is sketched after this list.)
  4. What kind of kernel scheduler you have. Comparing the Linux 2.4 kernel scheduler with 2.6, the latter gives you O(1) scheduling with no dependence on the number of tasks in the system, while the former is more like O(n). The SMP capabilities of the kernel scheduler also play a big role in the maximum number of sustainable threads in a system.
  1. 线程堆栈大小:在 Linux 中,默认线程堆栈大小为 8MB(您可以使用 ulimit -a 来查看)。
  2. 给定操作系统变体支持的最大虚拟内存。Linux Kernel 2.4 支持 2 GB 的内存地址空间;Kernel 2.6 则大一些(3GB)。
  3. [1] 给出了在给定最大虚拟内存下最大线程数的计算。对于 2.4,结果大约是 255 个线程;对于 2.6,这个数字会大一些。(该算术的粗略示例见此列表之后。)
  4. 你使用的是哪种内核调度程序。将 Linux 2.4 内核调度程序与 2.6 进行比较,后者提供 O(1) 调度,不依赖于系统中存在的任务数量,而前者更接近 O(n)。内核调度器的 SMP 能力也对系统中可持续线程的最大数量有很大影响。
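A back-of-the-envelope version of the stack-size arithmetic referenced in point 3, assuming the 8 MB default stack and the address-space figures above:

```python
import threading

MB = 1024 * 1024
default_stack = 8 * MB                     # Linux default per-thread stack (ulimit -s)

for user_address_space in (2 * 1024 * MB, 3 * 1024 * MB):   # kernel 2.4 vs 2.6
    print(user_address_space // default_stack, "threads, roughly")
# -> 256 and 384, before subtracting what the rest of the process needs

# Shrinking the per-thread stack raises that ceiling; it must be set before
# the threads are created (512 KB here is just an example value).
threading.stack_size(512 * 1024)
```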

Now you can tune your stack size to accommodate more threads, but then you have to take into account the overheads of thread management (creation/destruction and scheduling). You can enforce CPU affinity for a given process as well as for a given thread, tying them down to specific CPUs to avoid thread-migration overheads between the CPUs and avoid cold cache issues.

现在您可以调整堆栈大小以容纳更多线程,但您必须考虑线程管理(创建/销毁和调度)的开销。您可以对给定进程以及给定线程强制设置 CPU 亲和性(CPU affinity),将它们绑定到特定 CPU,以避免 CPU 之间的线程迁移开销并避免冷缓存(cold cache)问题。
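On Linux this can be done from Python with os.sched_setaffinity; a minimal sketch (the CPU numbers are arbitrary examples):

```python
import os
import threading

# Pin the whole current process to CPUs 0 and 1 (Linux-only; pid 0 means "self").
os.sched_setaffinity(0, {0, 1})
print(os.sched_getaffinity(0))        # -> {0, 1}

def pinned_worker(cpu):
    # A thread can also pin just itself by passing its native id (Python 3.8+).
    os.sched_setaffinity(threading.get_native_id(), {cpu})
    # ... run this thread's work on that CPU ...

threading.Thread(target=pinned_worker, args=(1,)).start()
```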

Note that one can create thousands of threads if one wishes, but when Linux runs out of VM it just randomly starts killing processes (and thus threads). This is to keep the utility profile from being maxed out. (The utility function describes system-wide utility for a given amount of resources. With constant resources, in this case CPU cycles and memory, the utility curve flattens out as the number of tasks grows.)

请注意,您可以随意创建数千个线程,但是当 Linux 耗尽虚拟内存时,它会随机开始杀死进程(从而杀死线程)。这是为了避免系统效用被耗尽。(效用函数描述了在给定资源量下系统整体的效用。在资源恒定的情况下——这里指 CPU 周期和内存——效用曲线会随着任务数量越来越多而趋于平缓。)

I am sure the Windows kernel scheduler also does something of this sort to deal with over-utilization of resources.

我确信 Windows 内核调度程序也会做这样的事情来处理资源的过度利用

[1] http://adywicaksono.wordpress.com/2007/07/10/i-can-not-create-more-than-255-threads-on-linux-what-is-the-solutions/

[1] http://adywicaksono.wordpress.com/2007/07/10/i-can-not-create-more-than-255-threads-on-linux-what-is-the-solutions/

回答by Andrew Grant

If your threads are performing any kind of resource-intensive work (CPU/Disk) then you'll rarely see benefits beyond one or two, and too many will kill performance very quickly.

如果您的线程正在执行任何类型的资源密集型工作(CPU/磁盘),那么您很少会看到超过一两个的好处,而且太多会很快降低性能。

The 'best-case' is that your later threads will stall while the first ones complete, or some will have low-overhead blocks on resources with low contention. Worst-case is that you start thrashing the cache/disk/network and your overall throughput drops through the floor.

“最好的情况”是您的后续线程会停滞下来等待先前的线程完成,或者某些线程只在争用较少的资源上发生低开销的阻塞。最坏的情况是,您开始让缓存/磁盘/网络发生颠簸,整体吞吐量一落千丈。

A good solution is to place requests in a pool that are then dispatched to worker threads from a thread-pool (and yes, avoiding continuous thread creation/destruction is a great first step).

一个好的解决方案是将请求放在一个池中,然后从线程池中分派给工作线程(是的,避免连续的线程创建/销毁是一个很好的第一步)。

The number of active threads in this pool can then be tweaked and scaled based on the findings of your profiling, the hardware you are running on, and other things that may be occurring on the machine.

然后可以根据分析结果、运行的硬件以及机器上可能发生的其他情况调整和扩展此池中的活动线程数。
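A minimal version of that request-queue-plus-pool pattern with Python's standard library; handle_request and the worker count are placeholders to be filled in from your own profiling:

```python
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 32                          # placeholder: set from profiling/config
pool = ThreadPoolExecutor(max_workers=POOL_SIZE)

def handle_request(request):
    ...                                 # the blocking, DB-bound work goes here

def on_request(request):
    # Accepting a request stays cheap; the blocking work runs on a pooled thread.
    pool.submit(handle_request, request)
```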

回答by Chad Okere

One thing you should keep in mind is that Python (at least the C-based version) uses what's called a global interpreter lock that can have a huge impact on performance on multi-core machines.

您应该记住的一件事是,python(至少是基于 C 的版本)使用所谓的全局解释器锁,它可以对多核机器的性能产生巨大影响。

If you really need the most out of multithreaded python, you might want to consider using Jython or something.

如果你真的需要多线程 python 的最大好处,你可能需要考虑使用 Jython 或其他东西。

回答by bortzmeyer

As Pax rightly said, measure, don't guess. That's what I did for DNSwitness and the results were surprising: the ideal number of threads was much higher than I thought, something like 15,000 threads to get the fastest results.

正如 Pax 所说的那样:测量,不要猜测。这就是我为 DNSwitness 所做的,结果令人惊讶:理想的线程数比我想象的要高得多,大约 15,000 个线程才能获得最快的结果。

Of course, it depends on many things; that's why you must measure it yourself.

当然,这取决于很多因素,这就是为什么你必须自己进行测量。

Complete measurements (in French only) in Combien de fils d'exécution ?.

完整的测量结果(仅有法语版)见 Combien de fils d'exécution ?。

回答by Matthew Lund

I've written a number of heavily multi-threaded apps. I generally allow the number of potential threads to be specified by a configuration file. When I've tuned for specific customers, I've set the number high enough that my utilization of all the CPU cores was pretty high, but not so high that I ran into memory problems (these were 32-bit operating systems at the time).

我编写过许多重度多线程的应用程序。我通常允许通过配置文件指定潜在的线程数量。当我针对特定客户进行调优时,我会把这个数字设得足够高,使所有 CPU 内核的利用率都相当高,但又不至于高到遇到内存问题(当时这些还是 32 位操作系统)。
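For example (the file name, section and option below are invented for this sketch):

```python
import configparser
from concurrent.futures import ThreadPoolExecutor

# server.ini (hypothetical):
#   [pool]
#   max_workers = 64
config = configparser.ConfigParser()
config.read("server.ini")

workers = config.getint("pool", "max_workers", fallback=16)
pool = ThreadPoolExecutor(max_workers=workers)
```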

Put differently, once you reach some bottleneck be it CPU, database throughput, disk throughput, etc, adding more threads won't increase the overall performance. But until you hit that point, add more threads!

换句话说,一旦遇到 CPU、数据库吞吐量、磁盘吞吐量等瓶颈,添加更多线程不会提高整体性能。但在达到这一点之前,请添加更多线程!

Note that this assumes the system(s) in question are dedicated to your app, and you don't have to play nicely with (avoid starving) other apps.

请注意,这假设相关系统专用于您的应用程序,您不必与其他应用程序和平共处(即不必担心把它们饿死)。

回答by Hot Licks

The "big iron" answer is generally one thread per limited resource -- processor (CPU bound), arm (I/O bound), etc -- but that only works if you can route the work to the correct thread for the resource to be accessed.

“大铁”的答案通常是每个有限资源的一个线程——处理器(CPU 绑定)、手臂(I/O 绑定)等——但只有当您可以将工作路由到正确的线程以获取资源时,这才有效被访问。

Where that's not possible, consider that you have fungible resources (CPUs) and non-fungible resources (disk arms). For CPUs it's not critical to assign each thread to a specific CPU (though it helps with cache management), but for disk arms, if you can't assign a thread to the arm, you get into queuing theory and the question of what number of threads is optimal to keep the arms busy. Generally I'm thinking that if you can't route requests based on the arm used, then having 2-3 threads per arm is going to be about right.

如果做不到这一点,那就要考虑您同时拥有可替代的资源(CPU)和不可替代的资源(磁盘臂)。对于 CPU,将每个线程分配给特定的 CPU 并不关键(尽管这有助于缓存管理);但对于磁盘臂,如果您不能把线程分配给某个磁盘臂,就会进入排队论的范畴,需要考虑让磁盘臂保持忙碌的最佳线程数。一般来说,我认为如果您无法根据所使用的磁盘臂来路由请求,那么每个磁盘臂 2-3 个线程大致是合适的。

A complication comes about when the unit of work passed to the thread doesn't execute a reasonably atomic unit of work. Eg, you may have the thread at one point access the disk, at another point wait on a network. This increases the number of "cracks" where additional threads can get in and do useful work, but it also increases the opportunity for additional threads to pollute each other's caches, etc, and bog the system down.

当传递给线程的工作单元没有执行合理的原子工作单元时,就会出现复杂情况。例如,您可能让线程在某一点访问磁盘,在另一点等待网络。这增加了额外线程可以进入并执行有用工作的“裂缝”数量,但它也增加了额外线程污染彼此缓存等的机会,并使系统陷入困境。

Of course, you must weigh all this against the "weight" of a thread. Unfortunately, most systems have very heavyweight threads (and what they call "lightweight threads" often aren't threads at all), so it's better to err on the low side.

当然,您必须将所有这些与线程本身的“重量”进行权衡。不幸的是,大多数系统的线程都非常重量级(它们所谓的“轻量级线程”往往根本不是线程),所以最好宁可把线程数设得偏低一些。

What I've seen in practice is that very subtle differences can make an enormous difference in how many threads are optimal. In particular, cache issues and lock conflicts can greatly limit the amount of practical concurrency.

我在实践中看到的是,非常细微的差异会对最佳线程的数量产生巨大的影响。特别是缓存问题和锁冲突会极大地限制实际并发量。

回答by mmr

I think this is a bit of a dodge to your question, but why not fork them into processes? My understanding of networking (from the hazy days of yore, I don't really code networks at all) was that each incoming connection can be handled as a separate process, because then if someone does something nasty in your process, it doesn't nuke the entire program.

我认为这有点回避您的问题,但为什么不把它们分叉(fork)成进程呢?我对网络编程的理解(来自很久以前的模糊记忆,我现在基本不写网络代码)是,每个传入的连接都可以作为一个单独的进程来处理,因为这样即使有人在你的某个进程里搞了破坏,也不会把整个程序搞垮。

回答by newdayrising

One thing to consider is how many cores exist on the machine that will be executing the code. That represents a hard limit on how many threads can be proceeding at any given time. However, if, as in your case, threads are expected to be frequently waiting for a database to execute a query, you will probably want to tune your threads based on how many concurrent queries the database can process.

需要考虑的一件事是将执行代码的机器上存在多少个内核。这代表了在任何给定时间可以处理的线程数量的硬限制。但是,如果在您的情况下,线程预计会经常等待数据库执行查询,则您可能希望根据数据库可以处理的并发查询数量来调整线程。
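A starting-point heuristic along those lines; the multiplier and the connection limit are assumptions to tune, not fixed rules:

```python
import os

cores = os.cpu_count() or 4            # rough ceiling for CPU-bound threads
db_max_connections = 50                # hypothetical limit from your DBMS / connection pool

cpu_bound_workers = cores
io_bound_workers = min(db_max_connections, cores * 8)   # these threads mostly wait on the DB
```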

回答by hyperboreean

ryeguy, I am currently developing a similar application and my thread count is set to 15. Unfortunately, if I increase it to 20, it crashes. So, yes, I think the best way to handle this is to measure whether your current configuration allows more or fewer than some number X of threads.

ryeguy,我目前正在开发一个类似的应用程序,我的线程数设置为 15。不幸的是,如果我把它增加到 20,它就会崩溃。所以,是的,我认为处理这个问题的最好方法是实际测量你当前的配置能支持的线程数是多于还是少于某个数 X。