Java: How to determine the optimal number of threads for high-latency network requests?

Note: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/19562060/

Time: 2020-08-12 18:21:21  Source: igfitidea

How to determine optimal number of threads for high latency network requests?

Tags: java, multithreading, networking, akka

Asked by seawolf

I am writing a utility that must make thousands of network requests. Each request receives only a single, small packet in response (similar to ping), but may take upwards of several seconds to complete. Processing each response completes in one (simple) line of code.

The net effect of this is that the computer is not IO-bound, file-system-bound, or CPU-bound; it is bound only by the latency of the responses.

This is similar to, but not the same as, "Is there a way to determine the ideal number of threads?" and "Java best way to determine the optimal number of threads [duplicate]". The primary difference is that I am bound only by latency.

I am using an ExecutorService object to run the threads and a Queue<Future<Integer>> to track threads that need to have results retrieved:

ExecutorService executorService = Executors.newFixedThreadPool(threadPoolSize);
Queue<Future<Integer>> futures = new LinkedList<Future<Integer>>();

for (int quad3 = 0 ; quad3 < 256 ; ++quad3) {
    for (int quad4 = 0 ; quad4 < 256 ; ++quad4) {
        byte[] quads = { quad1, quad2, (byte)quad3, (byte)quad4 };
        futures.add(executorService.submit(new RetrieverCallable(quads)));
    }
}

... I then dequeue all the elements in the queue and put the results in the required data structure:

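int[] results = new int[65536];
int i = 0;
while (!futures.isEmpty()) {
    try {
        // get() blocks until this particular task has finished
        results[i] = futures.remove().get();
    } catch (Exception e) {
        // mark a failed or interrupted request
        results[i] = -1;
    }
    ++i;
}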

My first question is: Is this a reasonable way to track all the threads? If thread X takes a while to complete, many other threads might finish before X does. Will the thread pool exhaust itself waiting for open slots, or will the ExecutorService object manage the pool in such a way that threads that have completed but not yet been processed are moved out of available slots so that other threads may begin?
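A small experiment can help with the first question. The sketch below is only an illustration (the class and method names are made up): it submits many quick tasks to a small fixed pool, never calls get(), and counts how many futures report isDone() once the pool has drained. The result of a finished task lives in its Future, so a worker thread that has returned from call() is free to pick up the next queued task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class PoolSlotDemo {

    /**
     * Submits taskCount quick tasks to a pool of poolSize threads and returns
     * how many of them finished without any Future.get() call being made.
     */
    static int completedBeforeGet(int poolSize, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int n = 0; n < taskCount; n++) {
            final int id = n;
            futures.add(pool.submit(() -> id)); // Callable<Integer>
        }
        pool.shutdown(); // no new tasks; queued tasks still run
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not drain in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int done = 0;
        for (Future<Integer> future : futures) {
            if (future.isDone()) {
                done++;
            }
        }
        return done;
    }

    public static void main(String[] args) {
        // 100 tasks on 2 threads: all finish although nobody retrieved a result.
        System.out.println(completedBeforeGet(2, 100));
    }
}
```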

My second question is what guidelines can I use for finding the optimal number of threads to make these calls? I don't even know order-of-magnitude guidance here. I know it works pretty well with 256 threads, but seems to take roughly the same overall time with 1024 threads. CPU utilization is hovering around 5%, so that doesn't appear to be an issue. With that large a number of threads, what are all the metrics I should be looking at to compare different numbers? Obviously overall time to process the batch, average time per thread... what else? Is memory an issue here?

Accepted answer by Val

It may shock you, but you do not need any threads for I/O (quantitatively, this means 0 threads). It is good that you have learned that multithreading does not multiply your network bandwidth. Now it is time to learn that threads do computation; they do not do the (high-latency) communication. The communication is performed by the network adapter, which is separate hardware running truly in parallel with the CPU. It is stupid to allocate a thread (see the resources allocated, as listed by the gentleman who claims that you need 1 thread) just to sleep until the network adapter finishes its job. You need no threads for I/O, i.e. you need 0 threads.

It makes sense to allocate threads for computation that runs in parallel with the I/O request(s). The number of threads will depend on the computation-to-communication ratio and is limited by the number of cores in your CPU.

Sorry, I had to say this even though you have clearly committed to blocking I/O, because so many people do not understand this basic point. Take the advice: use asynchronous I/O and you'll see that the issue does not exist.
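As a rough illustration of that point, here is a minimal sketch in which the number of in-flight requests is decoupled from the number of threads. Everything in it is an assumption made for the demo: fakeRequest is a hypothetical stand-in for a non-blocking network call, and a single timer thread plays the role of the network adapter that signals completion. No thread sleeps per request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class AsyncLatencyDemo {

    /** Hypothetical non-blocking request: the future completes after latencyMs. */
    static CompletableFuture<Integer> fakeRequest(ScheduledExecutorService timer,
                                                  int id, long latencyMs) {
        CompletableFuture<Integer> response = new CompletableFuture<>();
        timer.schedule(() -> { response.complete(id); }, latencyMs, TimeUnit.MILLISECONDS);
        return response;
    }

    /** Issues all requests at once and returns how many responses arrived. */
    static int runAll(int requestCount, long latencyMs) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            List<CompletableFuture<Integer>> pending = new ArrayList<>();
            for (int id = 0; id < requestCount; id++) {
                pending.add(fakeRequest(timer, id, latencyMs));
            }
            // Only the calling thread waits here; there is no thread per request.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
            int arrived = 0;
            for (CompletableFuture<Integer> future : pending) {
                if (future.isDone() && !future.isCompletedExceptionally()) {
                    arrived++;
                }
            }
            return arrived;
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int arrived = runAll(65536, 500);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(arrived + " responses in about " + elapsedMs + " ms");
    }
}
```

With a real network the completion signal would come from a non-blocking client library rather than a timer, but the shape of the calling code would be similar.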

Answer by Thomas

A partial answer, but I hope it helps. Yes, memory can be an issue: Java reserves 1 MB of thread stack by default (at least on Linux amd64). So with a few GB of RAM in your box, that limits your thread count to a few thousand.

You can tune this with a flag like -XX:ThreadStackSize=64. That would give you 64 kB, which is plenty in most situations.
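To put numbers on this, the sketch below does the stack arithmetic and also shows the per-thread alternative to a global flag: the Thread constructor that takes a stackSize argument. The class and method names are invented for the example, and the Javadoc describes stackSize as a hint that the JVM may round or ignore, so treat this as something to experiment with and not as a guarantee.

```java
final class StackBudgetDemo {

    /** Stack address space reserved by the given number of threads, in MB. */
    static long reservedStackMb(int threads, long stackKb) {
        return threads * stackKb / 1024;
    }

    /**
     * Runs a trivial task on a thread created with a requested stack size.
     * The size is only a suggestion; the platform may raise it to its minimum.
     */
    static boolean runWithStackHint(long stackBytes) {
        final boolean[] ran = { false };
        Thread worker = new Thread(null, () -> { ran[0] = true; },
                "small-stack-worker", stackBytes);
        worker.start();
        try {
            worker.join(); // join() makes the write to ran[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ran[0];
    }

    public static void main(String[] args) {
        // 4096 threads at the 1 MB default versus a 64 kB stack.
        System.out.println(reservedStackMb(4096, 1024) + " MB at 1 MB per thread");
        System.out.println(reservedStackMb(4096, 64) + " MB at 64 kB per thread");
        System.out.println("ran with hint: " + runWithStackHint(256 * 1024));
    }
}
```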

You could also move away from threading entirely and use epoll to respond to incoming responses. This is far more scalable but I have no practical experience with doing this in Java.

Answer by Andrey Chaschev

Have you considered using Actors?

Best practices.

  • Actors should be like nice co-workers: do their job efficiently without bothering everyone else needlessly and avoid hogging resources. Translated to programming this means to process events and generate responses (or more requests) in an event-driven manner. Actors should not block (i.e. passively wait while occupying a Thread) on some external entity—which might be a lock, a network socket, etc.—unless it is unavoidable; in the latter case see below.

Sorry, I can't elaborate, because I haven't used this much.

UPDATE

The answer in "Good use case for Akka" might be helpful.
See also "Scala: Why are Actors lightweight?"

Answer by OldCurmudgeon

As mentioned in one of the linked answers you refer to, Brian Goetz has covered this well in his article.

He seems to imply that in your situation you would be advised to gather metrics before committing to a thread count.

Tuning the pool size

Tuning the size of a thread pool is largely a matter of avoiding two mistakes: having too few threads or too many threads. ...

The optimum size of a thread pool depends on the number of processors available and the nature of the tasks on the work queue. ...

For tasks that may wait for I/O to complete -- for example, a task that reads an HTTP request from a socket -- you will want to increase the pool size beyond the number of available processors, because not all threads will be working at all times. Using profiling, you can estimate the ratio of waiting time (WT) to service time (ST) for a typical request. If we call this ratio WT/ST, for an N-processor system, you'll want to have approximately N*(1+WT/ST) threads to keep the processors fully utilized.

My emphasis.
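To make the quoted formula concrete, here is a small sketch that plugs numbers into N*(1+WT/ST). The wait and service times are assumed figures chosen to resemble the question (about 2 s waiting per request, about 1 ms of processing); they are not measurements, and the class name is made up.

```java
final class PoolSizeEstimate {

    /** Goetz's estimate N * (1 + WT/ST), rounded up to a whole thread. */
    static int threadsFor(int processors, double waitMs, double serviceMs) {
        return (int) Math.ceil(processors * (1.0 + waitMs / serviceMs));
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        // Assumed profile: 2000 ms waiting on the network, 1 ms of CPU work.
        System.out.println(threadsFor(processors, 2000.0, 1.0));
    }
}
```

On a 4-processor machine these assumed figures give 8004 threads. The formula aims at keeping the processors fully utilized, so for a workload that is almost entirely waiting, other limits such as memory or the network are likely to be reached first, which is one more reason to gather metrics before committing to a number.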

Answer by soru

I'm pretty sure that in the described circumstances, the optimal number of threads is 1. In fact, that is surprisingly often the answer to any question of the form "how many threads should I use?"

Each additional thread adds extra overhead in terms of stack (and associated GC roots), context switching and locking. This may or may not be measurable: the effort to meaningfully measure it in all target environments is non-trivial. In return, there is little scope to provide any benefit, as processing is neither CPU-bound nor IO-bound.

So fewer is always better, if only for reasons of risk reduction. And you can't have fewer than 1.

Answer by Alex Suo

In our high-performance systems, we use the actor model as described by @Andrey Chaschev.

The optimal number of threads in your actor model depends on your CPU structure and on how many processes (JVMs) you run per box. Our findings are:

  1. If you have only 1 process, use the total number of CPU cores minus 2.
  2. If you have multiple processes, check your CPU structure. We found it works well to have number of threads = number of cores in a single CPU. For example, if you have a 4-CPU server with each CPU having 4 cores, then using 4 threads per JVM gives you the best performance. After that, always leave at least 1 core to your OS.

Answer by Alexei Kaigorodov

I assume the quantity you want to optimize is the total time to process all requests. You said the number of requests is "thousands". Evidently, the fastest way is to issue all requests at once, but this may overflow the network layer. You should determine how many simultaneous connections the network layer can bear, and make this number a parameter of your program.
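One way to turn "number of simultaneous connections" into a program parameter is a counting semaphore around request submission. The sketch below is an assumption-laden illustration: the requests are simulated with a timer instead of real sockets, and maxInFlight is the hypothetical tuning parameter. It reports the peak number of requests that were in flight at once.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedInFlightDemo {

    /**
     * Issues requestCount simulated requests, never more than maxInFlight at a
     * time, and returns the highest number that were in flight simultaneously.
     */
    static int peakInFlight(int requestCount, int maxInFlight, long latencyMs) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch allDone = new CountDownLatch(requestCount);
        try {
            for (int n = 0; n < requestCount; n++) {
                permits.acquire(); // the issuing thread waits while the limit is reached
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                // Simulated response: arrives after latencyMs and frees a permit.
                timer.schedule(() -> {
                    inFlight.decrementAndGet();
                    permits.release();
                    allDone.countDown();
                }, latencyMs, TimeUnit.MILLISECONDS);
            }
            allDone.await();
            return peak.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("peak in flight: " + peakInFlight(1000, 64, 50));
    }
}
```

Running the batch with several values of maxInFlight and comparing total time is one simple way to find the point where the network layer stops keeping up.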

Also, spending a thread on each request requires a lot of memory. You can avoid this by using non-blocking sockets. In Java, there are 2 options: NIO1 with selectors, and NIO2 with asynchronous channels. NIO1 is complex, so it is better to find a ready-made library and reuse it. NIO2 is simple but has only been available since JDK 1.7.
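For a feel of the selector option, here is a minimal single-threaded sketch. It is only a demonstration under stated assumptions: a loopback server inside the same method stands in for the remote hosts and replies with one byte, and the class and method names are invented. Real code would need proper error handling and partial-read logic, which is why a ready-made library is usually the better choice.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.TimeUnit;

final class SelectorDemo {

    /** Opens clientCount non-blocking connections and collects their one-byte replies on one thread. */
    static int collectReplies(int clientCount) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            // Local stand-in for the remote hosts: loopback address, ephemeral port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            for (int n = 0; n < clientCount; n++) {
                SocketChannel client = SocketChannel.open();
                client.configureBlocking(false);
                // A local connect may complete immediately; otherwise wait for OP_CONNECT.
                boolean connected = client.connect(server.getLocalAddress());
                client.register(selector,
                        connected ? SelectionKey.OP_READ : SelectionKey.OP_CONNECT);
            }

            int replies = 0;
            ByteBuffer buffer = ByteBuffer.allocate(1);
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(10);
            while (replies < clientCount) {
                selector.select(1000);
                if (System.nanoTime() > deadline) {
                    throw new IllegalStateException("timed out waiting for replies");
                }
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        // Server side: accept one connection, send one byte, close.
                        SocketChannel accepted = server.accept();
                        if (accepted != null) {
                            accepted.write(ByteBuffer.wrap(new byte[] { 42 }));
                            accepted.close();
                        }
                    } else if (key.isConnectable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        if (client.finishConnect()) {
                            key.interestOps(SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buffer.clear();
                        int read = client.read(buffer);
                        if (read != 0) { // a byte arrived or the peer closed
                            if (read > 0) {
                                replies++;
                            }
                            client.close();
                        }
                    }
                }
            }
            return replies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collectReplies(20) + " replies handled by one thread");
    }
}
```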

Processing the responses should be done on a thread pool. I don't think the number of threads in the thread pool greatly affects the overall performance in your case. Just tune the thread pool size between 1 and the number of available processors.