Java: Does multithreading always yield better performance than single threading?

Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/27319446/

Date: 2020-11-02 11:36:38  Source: igfitidea

Tags: java, multithreading

Asked by Jaskey

I know the answer is no; here is an example: Why single thread is faster than multithreading in Java?

So when the task processed in each thread is trivial, the cost of creating a thread outweighs the benefit of distributing the task. This is one case where a single thread is faster than multithreading.
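
To see this effect, here is a minimal Java sketch (the class and method names are made up for illustration): it sums a small array once in a plain loop and once by starting a new Thread per element. The sums are identical; the timings depend on the machine and on JVM warm-up, so treat it as an experiment rather than a benchmark.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo: the same trivial work done in one loop vs. one Thread per element.
class TrivialTasks {
    // Single thread: just loop over the array.
    static long sumSequential(int[] data) {
        long sum = 0;
        for (int v : data) sum += v;
        return sum;
    }

    // One new Thread per element: each thread does a single addition,
    // so creating, starting and joining the threads dominates the cost.
    static long sumThreadPerElement(int[] data) {
        AtomicLong sum = new AtomicLong();
        Thread[] threads = new Thread[data.length];
        for (int i = 0; i < data.length; i++) {
            final int v = data[i];
            threads[i] = new Thread(() -> sum.addAndGet(v));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait until this thread has added its value
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sum.get();
    }

    public static void main(String[] args) {
        int[] data = new int[1_000];
        for (int i = 0; i < data.length; i++) data[i] = i;

        long t0 = System.nanoTime();
        long a = sumSequential(data);
        long t1 = System.nanoTime();
        long b = sumThreadPerElement(data);
        long t2 = System.nanoTime();

        // Both sums are equal; only the elapsed times differ (machine dependent).
        System.out.println("single thread:      " + a + " in " + (t1 - t0) / 1_000 + " us");
        System.out.println("thread per element: " + b + " in " + (t2 - t1) / 1_000 + " us");
    }
}
```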

Questions

  • Are there more cases where a single thread will be faster than multithreading?

  • When should we decide to give up multithreading and only use a single thread to accomplish our goal?

Although the question is tagged java, discussion beyond Java is also welcome. It would be great if answers could include a small example to explain.

Answer by Jens-Peter Haack

This is a very good question about threading and its relation to the hardware that does the real work, meaning the available physical CPU(s), their cores, and hyperthreads.

  1. Multiple threads might allow you to do things in parallel if your CPU has more than one core available. So in an ideal world, calculating primes, for example, might be 4 times faster using 4 threads, if your CPU has 4 cores available and your algorithm really works in parallel.
  2. If you start more threads than there are cores available, the thread management of your OS will spend more and more time on thread switches, and your efficiency in using the CPU(s) gets worse.
  3. If the compiler, CPU cache and/or runtime realizes that you run more than one thread accessing the same data area in memory, it operates in a different optimization mode. As long as the compiler/runtime is sure that only one thread accesses the data, it can avoid writing data out to external RAM too often and can use the L1 cache of your CPU efficiently. If not, it has to use semaphores/locks and also flush cached data from the L1/L2 cache to RAM more often.

So the lessons I have learned from highly parallel multithreading are:

  • If possible, use single-threaded, shared-nothing processes; they are more efficient
  • If threads are required, decouple shared data access as much as possible
  • If possible, don't allocate more busy worker threads than there are available cores

Here is a small program (JavaFX) to play with. It:

  • Allocates a byte array of 100,000,000 bytes, filled with random bytes
  • Provides a method that counts the number of bits set in this array
  • The method can count the bits of every n-th byte
  • count(0,1) counts the bits of all bytes
  • count(0,4) counts the bits of bytes 0, 4, 8, ..., which allows parallel, interleaved counting

On a Mac Pro (4 cores), the results are:

  1. Running one thread, count(0,1) needs 1326 ms to count all 399993625 bits
  2. Running two threads, count(0,2) and count(1,2) in parallel, needs 920 ms
  3. Running four threads needs 618 ms
  4. Running eight threads needs 631 ms

Changing the way of counting, e.g. having all threads increment one shared integer (AtomicInteger or synchronized), will dramatically change the performance with many threads.

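import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

import javafx.animation.KeyFrame;
import javafx.animation.Timeline;
import javafx.application.Application;
import javafx.beans.property.BooleanProperty;
import javafx.beans.property.SimpleBooleanProperty;
import javafx.beans.value.ChangeListener;
import javafx.beans.value.ObservableValue;
import javafx.scene.Scene;
import javafx.scene.control.Button;
import javafx.scene.control.Label;
import javafx.scene.control.ProgressBar;
import javafx.scene.layout.HBox;
import javafx.scene.layout.VBox;
import javafx.scene.paint.Color;
import javafx.stage.Stage;
import javafx.util.Duration;

// Counts the set bits of a 100,000,000-byte array with 1, 2, 4 or 8 threads
// and shows the elapsed time in a small JavaFX window.
public class MulithreadingEffects extends Application {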
    static class ParallelProgressBar extends ProgressBar {
        AtomicInteger myDoneCount = new AtomicInteger();
        int           myTotalCount;
        Timeline      myWhatcher = new Timeline(new KeyFrame(Duration.millis(10), e -> update()));
        BooleanProperty running = new SimpleBooleanProperty(false);

        public void update() {
            setProgress(1.0*myDoneCount.get()/myTotalCount);
            if (myDoneCount.get() >= myTotalCount) {
                myWhatcher.stop();
                myTotalCount = 0;
                running.set(false);
            }
        }

        public boolean isRunning() { return myTotalCount > 0; }
        public BooleanProperty runningProperty() { return running; }

        public void start(int totalCount) {
            myDoneCount.set(0);
            myTotalCount = totalCount;
            setProgress(0.0);
            myWhatcher.setCycleCount(Timeline.INDEFINITE);
            myWhatcher.play();
            running.set(true);
        }

        public void add(int n) {
            myDoneCount.addAndGet(n);
        }
    }

    int mySize = 100000000;
    byte[] inData = new byte[mySize];
    ParallelProgressBar globalProgressBar = new ParallelProgressBar();
    BooleanProperty iamReady = new SimpleBooleanProperty(false);
    AtomicInteger myCounter = new AtomicInteger(0);

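    // Starts a thread that counts the set bits of every step-th byte, beginning at index start,
    // and adds its result to myCounter when it is done.
    void count(int start, int step) {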
        new Thread(""+start){
            public void run() {
                int count = 0;
                int loops = 0;
                for (int i = start; i < mySize; i+=step) {
                    for (int m = 0x80; m > 0; m >>=1) {
                        if ((inData[i] & m) > 0) count++;
                    }
                    if (loops++ > 99) {
                        globalProgressBar.add(loops);
                        loops = 0;
                    }
                }
                myCounter.addAndGet(count);
                globalProgressBar.add(loops);
            }
        }.start();
    }

    void pcount(Label result, int n) {
        result.setText("("+n+")");
        globalProgressBar.start(mySize);
        long start = System.currentTimeMillis();
        myCounter.set(0);
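        // Report the elapsed time once the progress bar signals completion.
        // The listener removes itself so repeated runs don't pile up listeners.
        globalProgressBar.runningProperty().addListener(new ChangeListener<Boolean>() {
            @Override
            public void changed(ObservableValue<? extends Boolean> p, Boolean o, Boolean v) {
                if (!v) {
                    p.removeListener(this);
                    long ms = System.currentTimeMillis()-start;
                    result.setText(""+ms+" ms ("+myCounter.get()+")");
                }
            }
        });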
        for (int t = 0; t < n; t++) count(t, n);
    }

    void testParallel(VBox box) {
        HBox hbox = new HBox();

        Label result = new Label("-");
        for (int i : new int[]{1, 2, 4, 8}) {
            Button run = new Button(""+i);
            run.setOnAction( e -> {
                if (globalProgressBar.isRunning()) return;
                pcount(result, i);
            });
            hbox.getChildren().add(run);
        }

        hbox.getChildren().addAll(result);
        box.getChildren().addAll(globalProgressBar, hbox);
    }


    @Override
    public void start(Stage primaryStage) throws Exception {        
        primaryStage.setTitle("ProgressBar's");

        globalProgressBar.start(mySize);
        new Thread("Prepare"){
            public void run() {
                iamReady.set(false);
                Random random = new Random();
                random.setSeed(4711);
                for (int i = 0; i < mySize; i++) {
                    inData[i] = (byte)random.nextInt(256);
                    globalProgressBar.add(1);
                }
                iamReady.set(true);
            }
        }.start();

        VBox box = new VBox();
        Scene scene = new Scene(box,400,80,Color.WHITE);
        primaryStage.setScene(scene);

        testParallel(box);
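        // The original called GUIHelper.allowImageDrag(box) here; GUIHelper is the author's own
        // helper (not part of JavaFX) and is not needed for the measurement.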

        primaryStage.show();   
    }

    public static void main(String[] args) { launch(args); }
}
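
The remark about incrementing a shared AtomicInteger can be tried without JavaFX. The following is a minimal sketch, not the author's code: the class name, the 20,000,000-byte array and the seed are assumptions. It counts bits with interleaved threads like count(start, step), once updating one shared AtomicInteger per set bit and once counting in a local variable that is published once per thread.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

// Hypothetical headless demo: interleaved bit counting with a shared vs. a local counter.
class SharedCounterDemo {
    // Each set bit triggers an update of the one shared counter: heavy contention.
    static int countShared(byte[] data, int nThreads) {
        AtomicInteger total = new AtomicInteger();
        runThreads(nThreads, t -> {
            for (int i = t; i < data.length; i += nThreads) {
                int bits = Integer.bitCount(data[i] & 0xFF);
                for (int b = 0; b < bits; b++) total.incrementAndGet();
            }
        });
        return total.get();
    }

    // Each thread counts locally and touches the shared counter only once.
    static int countLocal(byte[] data, int nThreads) {
        AtomicInteger total = new AtomicInteger();
        runThreads(nThreads, t -> {
            int local = 0;
            for (int i = t; i < data.length; i += nThreads) local += Integer.bitCount(data[i] & 0xFF);
            total.addAndGet(local);
        });
        return total.get();
    }

    // Starts nThreads threads, thread t running body.accept(t), and waits for all of them.
    static void runThreads(int nThreads, IntConsumer body) {
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            final int index = t;
            threads[t] = new Thread(() -> body.accept(index));
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[20_000_000];
        new Random(4711).nextBytes(data);
        for (int n : new int[]{1, 2, 4, 8}) {
            long t0 = System.nanoTime();
            int shared = countShared(data, n);
            long t1 = System.nanoTime();
            int local = countLocal(data, n);
            long t2 = System.nanoTime();
            System.out.printf("%d thread(s): shared %d ms, local %d ms, counts %d/%d%n",
                    n, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, shared, local);
        }
    }
}
```

Both variants return the same count. On a multi-core machine the shared variant typically gets slower as threads are added, because every increment competes for the same cache line, while the local variant should behave more like the measurements listed earlier.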

Answer by kapex

Not all algorithms can be processed in parallel (algorithms that are strictly sequential; where P=0 in Amdahl's law) or at least not efficiently (see P-complete). Other algorithms are more suitable for parallel execution (extreme cases are called "embarrassingly parallel").
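
A minimal illustration of the two extremes (hypothetical names, plain Java): the first loop carries a dependency from one iteration to the next, so it cannot be split across threads as written; the second computes terms that depend only on their own index, so a parallel stream can split the range freely. This only shows the dependency in the loop as written; a different algorithm might still parallelize a particular recurrence.

```java
import java.util.stream.LongStream;

// Hypothetical illustration of a loop-carried dependency vs. independent work.
class DependencyDemo {
    // Each step needs the previous step's result, so step i must wait for step i - 1.
    static long dependentChain(int steps) {
        long x = 17;
        for (int i = 0; i < steps; i++) {
            x = x * x + i; // nonlinear update of the previous x (long overflow wraps around)
        }
        return x;
    }

    // Each term depends only on its own index ("embarrassingly parallel"),
    // so the parallel stream may split the range across threads.
    static long sumOfSquares(int n) {
        return LongStream.range(0, n).parallel().map(i -> i * i).sum();
    }

    public static void main(String[] args) {
        System.out.println(dependentChain(1_000_000));
        System.out.println(sumOfSquares(1_000_000));
    }
}
```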

A naive implementation of a parallel algorithm can be less efficient in terms of complexity or space compared to a similar sequential algorithm. If there is no obvious way to parallelize an algorithm so that it will get a speedup, you may need to choose another similar parallel algorithm that solves the same problem but can be more or less efficient. If you ignore thread/process creation and direct inter-process communication overhead, there can still be other limiting factors when using shared resources like IO bottlenecks or increased paging caused by higher memory consumption.

When should we decide to give up multithreading and only use a single thread to accomplish our goal?

When deciding between single- and multithreading, the time needed to change the implementation and the added complexity for developers should be taken into account. If there is only a small gain from using multiple threads, you could argue that the increased maintenance costs usually caused by multithreaded applications are not worth the speedup.

Answer by alain

As already mentioned in a comment by @Jim Mischel, you can use

Amdahl's law

to calculate this. Amdahl's law states that the speedup gained from adding processors to solve a task is

$$S(N) = \frac{1}{(1 - P) + \frac{P}{N}}$$

where

N is the number of processors, and

P is the fraction of the code that can be executed in parallel (between 0 and 1)

Now, if T is the time it takes to execute the task on a single processor, and O is the total 'overhead' time (creating and setting up a second thread, communication, ...), a single thread is faster if

T < T/S(2) + O

or, after reordering, if

O/T > P/2
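
The reordering step, spelled out with Amdahl's law for N = 2, i.e. S(2) = 1/((1 - P) + P/2), and assuming T > 0:

$$\frac{T}{S(2)} = T\left((1 - P) + \frac{P}{2}\right) = T\left(1 - \frac{P}{2}\right)$$

$$T < T\left(1 - \frac{P}{2}\right) + O \iff \frac{TP}{2} < O \iff \frac{O}{T} > \frac{P}{2}$$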

When the ratio Overhead / Execution Time is greater than P/2, a single thread is faster.
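
A small Java sketch of the formulas in this answer (class and method names are hypothetical): speedup implements S(N), and singleThreadFaster checks T < T/S(2) + O directly.

```java
// Hypothetical helper: Amdahl's law and the single-thread break-even test.
class Amdahl {
    // S(N) = 1 / ((1 - P) + P / N), for 0 <= p <= 1 and n >= 1
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    // A single thread wins if T < T / S(2) + O, which is equivalent to O / T > P / 2
    static boolean singleThreadFaster(double t, double overhead, double p) {
        return t < t / speedup(p, 2) + overhead;
    }

    public static void main(String[] args) {
        System.out.println(speedup(0.9, 4));
        System.out.println(singleThreadFaster(100, 30, 0.5)); // true:  30/100 > 0.5/2
        System.out.println(singleThreadFaster(100, 20, 0.5)); // false: 20/100 < 0.5/2
    }
}
```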

Answer by Brandon

Threading is about taking advantage of idle resources to handle more work. If you have no idle resources, multi-threading has no advantages, so the overhead would actually make your overall runtime longer.

For example, suppose you have a collection of tasks to perform and they are CPU-intensive calculations. If you have a single CPU, multithreading probably wouldn't speed that process up (though you never know until you test). I would expect it to slow down slightly. You are changing how the work is split up, but not the capacity. If you have 4 tasks to do on a single CPU, doing them serially is 1 * 4. If you do them in parallel, you basically come out at 4 * 1, which is the same. On top of that, you pay the overhead of merging results and context switching.

Now, if you have multiple CPUs, running CPU-intensive tasks in multiple threads lets you tap unused resources, so more gets done per unit of time.
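
A minimal sketch of that idea, assuming prime counting as a stand-in CPU-bound workload (all names are hypothetical): the same chunks run on a pool with one thread and on a pool with one thread per core, as reported by Runtime.availableProcessors(). On a single-core machine both runs should take about the same time, which is the 1 * 4 versus 4 * 1 argument.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo: CPU-bound work on a pool of 1 thread vs. one thread per core.
class CpuBoundPool {
    // Stand-in CPU-bound task: count primes in [from, to) by trial division.
    static int countPrimes(int from, int to) {
        int count = 0;
        for (int n = Math.max(from, 2); n < to; n++) {
            boolean prime = true;
            for (int d = 2; (long) d * d <= n; d++) {
                if (n % d == 0) {
                    prime = false;
                    break;
                }
            }
            if (prime) count++;
        }
        return count;
    }

    // Splits [0, limit) into `tasks` chunks and runs them on a pool of `threads` threads.
    static int countPrimesParallel(int limit, int tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> jobs = new ArrayList<>();
            int chunk = (limit + tasks - 1) / tasks; // ceiling division
            for (int start = 0; start < limit; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, limit);
                jobs.add(() -> countPrimes(from, to));
            }
            int total = 0;
            for (Future<Integer> f : pool.invokeAll(jobs)) total += f.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            if (e instanceof InterruptedException) Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int limit = 5_000_000;
        for (int threads : new int[]{1, cores}) {
            long t0 = System.nanoTime();
            int primes = countPrimesParallel(limit, cores, threads);
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.println(threads + " thread(s): " + primes + " primes in " + ms + " ms");
        }
    }
}
```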

Also, think about other resources. If you have 4 tasks that query a database, running them in parallel helps if the database has spare resources to handle them all. However, you are also adding load to the database server, taking resources away from it, so I probably wouldn't do that.

Now, let's say we need to make web service calls to 3 external systems and none of the calls have input dependent on each other. Doing them in parallel with multiple threads means that we don't have to wait for one to end before the other starts. It also means that running them in parallel won't negatively impact each task. This would be a great use case for multi-threading.
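
A minimal sketch of that use case: the three calls are simulated with Thread.sleep because the answer names no concrete services, so fakeCall and the delay are assumptions. Running them on a three-thread pool via CompletableFuture.supplyAsync lets the waits overlap.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: three independent, I/O-bound "calls" made concurrently.
class ParallelCalls {
    // Stand-in for a remote web-service call: waits delayMs (no CPU used), then answers.
    static String fakeCall(String system, long delayMs) {
        try {
            Thread.sleep(delayMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return system + ":ok";
    }

    // Starts all three calls at once and waits for every result.
    static List<String> callAll(long delayMs) {
        ExecutorService pool = Executors.newFixedThreadPool(3); // one thread per outstanding call
        try {
            CompletableFuture<String> a = CompletableFuture.supplyAsync(() -> fakeCall("A", delayMs), pool);
            CompletableFuture<String> b = CompletableFuture.supplyAsync(() -> fakeCall("B", delayMs), pool);
            CompletableFuture<String> c = CompletableFuture.supplyAsync(() -> fakeCall("C", delayMs), pool);
            return Arrays.asList(a.join(), b.join(), c.join());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        List<String> results = callAll(300);
        long ms = (System.nanoTime() - t0) / 1_000_000;
        // Should be close to one delay (about 300 ms) rather than three, since the waits overlap.
        System.out.println(results + " in " + ms + " ms");
    }
}
```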

Answer by loshad vtapkah

The overhead comes not only from thread creation but also from inter-thread communication. Note also that synchronizing threads on, for example, a single object can serialize them, so execution effectively becomes single-threaded.
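
A minimal sketch of that effect (names are hypothetical): every thread does all its work inside synchronized on the same object and records how many threads are inside the block at once. The maximum is always 1, so adding threads adds overhead without adding parallelism.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: threads that all synchronize on one shared object run one at a time.
class SingleLockDemo {
    private static final Object LOCK = new Object();
    private static long sharedWork = 0; // guarded by LOCK

    // Starts nThreads threads that each enter the synchronized block `iterations` times.
    // Returns the largest number of threads that were ever inside the block at once.
    static int maxThreadsInside(int nThreads, int iterations) {
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger maxInside = new AtomicInteger();
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    synchronized (LOCK) {
                        int now = inside.incrementAndGet();
                        maxInside.accumulateAndGet(now, Math::max);
                        sharedWork++; // the "real" work, serialized by the lock
                        inside.decrementAndGet();
                    }
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return maxInside.get();
    }

    public static void main(String[] args) {
        // Prints 1 however many threads are started: the lock serializes them.
        System.out.println(maxThreadsInside(8, 100_000));
    }
}
```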

Answer by Carlos Bribiescas

Are there more cases where a single thread will be faster than multithreading?

So in a GUI application you will benefit from multithreading: at the most basic level, you will be updating the front end while also working on what the front end presents. If you're running something basic like Hello World, then, as you showed, threads only add overhead.

That question is very broad... Do you count Unit Tests as applications? If so then there are probably more applications that use single threads because any complex system will have (hopefully) at least 1 unit test. Do you count every Hello world style program as a different application or the same? If an application is deleted does it still count?

As you can see, I can't give a good response other than that you would have to narrow the scope of your question to get a meaningful answer. That said, there may be a statistic out there that I'm unaware of.

When should we decide to give up multithreading and only use a single thread to accomplish our goal?

Use multithreading when it will perform 'better' by whatever metric you think is important; otherwise, stay with a single thread.

Can your problem be broken into parts that can be processed simultaneously? Not in a contrived way, like splitting Hello World into two threads where one thread waits for the other to print, but in a way that lets 2+ threads accomplish the task more efficiently than one?

Just because a task is easily parallelizable doesn't mean that it should be parallelized. I could multithread an application that constantly trawled thousands of news sites to get me my news. For me personally this would suck, because it would eat up my network connection and I wouldn't be able to get my FPS in. For CNN this might be exactly what they want, and they would build a mainframe to accommodate it.

Can you narrow your questions?

Answer by sunil bhardwaj

There are two scenarios here:

  1. Multithreading on a single-core CPU:

    1.1 When to use: Multithreading helps when the tasks that need parallelism are I/O-bound. Threads give up execution while they wait for I/O, and the OS assigns the time slice to other waiting threads. Sequential execution does not have this behavior, so multiple threads will boost performance.

    1.2 When not to use: When the tasks are not I/O-bound and are merely computations, you might not want to use multithreading, since thread creation and context switching will negate any gain. Multiple threads will have little or no benefit.

  2. Multithreading on a multi-core CPU: A multi-core CPU can run as many threads simultaneously as it has cores. This will surely boost performance. But running more threads than there are available cores again introduces the thread context-switching problem. The number of threads will surely have an impact.

Be aware: there is also a limit to how many threads you can add to a system. More context switches will negate the overall gain, and the application slows down.
