Windows: what's the advantage of a message queue over shared data in thread communication?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/7117300/

Date: 2020-09-15 17:53:56 · Source: igfitidea

What's the advantage of a message queue over shared data in thread communication?

Tags: c++, windows, multithreading, message-queue

Asked by Jason

I read an article about multithreaded program design, http://drdobbs.com/architecture-and-design/215900465. It says it's a best practice to "replace shared data with asynchronous messages. As much as possible, prefer to keep each thread's data isolated (unshared), and let threads instead communicate via asynchronous messages that pass copies of data".


What confuses me is that I don't see the difference between using shared data and using message queues. I am now working on a non-GUI project on Windows, so let's use Windows message queues, and take the traditional producer-consumer problem as an example.


Using shared data, there would be a shared container, and a lock guarding the container, between the producer thread and the consumer thread. When the producer outputs a product, it first waits for the lock, then writes something to the container, then releases the lock.

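The lock-guarded container described here can be sketched in portable C++, with `std::mutex`/`std::condition_variable` standing in for the Win32 primitives; all names (`SharedBuffer`, `demo_sum`) are illustrative, not from any real API:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Minimal sketch of the "shared container + lock" approach.
template <typename T>
class SharedBuffer {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);  // producer waits for the lock
            items_.push(std::move(value));             // writes to the container
        }                                              // lock released here
        not_empty_.notify_one();
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });  // consumer blocks
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::queue<T> items_;
    std::mutex mutex_;
    std::condition_variable not_empty_;
};

// One producer thread, one consumer summing what it receives.
int demo_sum() {
    SharedBuffer<int> buffer;
    std::thread producer([&] {
        for (int i = 1; i <= 10; ++i) buffer.push(i);
    });
    int sum = 0;
    for (int i = 0; i < 10; ++i) sum += buffer.pop();
    producer.join();
    return sum;  // 1 + 2 + ... + 10
}
```

Note that both sides touch the same lock: the producer can stall behind a slow consumer holding the mutex, which is exactly the contention the article argues against.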

Using a message queue, the producer could simply call PostThreadMessage without blocking, and that is the asynchronous message's advantage. But I think there must be some lock guarding the message queue between the two threads, otherwise the data would definitely get corrupted; the PostThreadMessage call just hides the details. I don't know whether my guess is right, but if it is, the advantage seems to disappear, since both methods do the same thing and the only difference is that the system hides the details when message queues are used.


PS: maybe the message queue uses a non-blocking container, but I could use a concurrent container in the former approach too. I want to know how the message queue is implemented, and whether there is any performance difference between the two approaches.


Update: I still don't get the concept of asynchronous messages if the message queue operations still block somewhere else. Correct me if my guess is wrong: when we use shared containers and locks, we block in our own thread; but when using message queues, my thread returns immediately and leaves the blocking work to some system thread.


Answered by Eric Z

Message passing is useful for exchanging smaller amounts of data, because no conflicts need be avoided. It's also much easier to implement than shared memory for inter-computer communication. And, as you've already noticed, message passing has the advantage that application developers don't need to worry about protection details the way they do with shared memory.


Shared memory allows maximum speed and convenience of communication, since it can be done at memory speed within a computer. Shared memory is usually faster than message passing, as message passing is typically implemented using system calls and thus requires the more time-consuming work of kernel intervention. In contrast, in shared-memory systems, system calls are required only to establish the shared-memory regions. Once established, all accesses are treated as normal memory accesses, without extra assistance from the kernel.


Edit: One case where you might want to implement your own queue is when there are lots of messages to be produced and consumed, e.g. a logging system. With the implementation of PostThreadMessage, the queue capacity is fixed. Messages will most likely get lost if that capacity is exceeded.


Answered by jcoder

Imagine you have 1 thread producing data and 4 threads processing that data (presumably to make use of a multi-core machine). If you have one big global pool of data, you are likely to have to lock it whenever any of the threads needs access, potentially blocking the 3 other threads. As you add more processing threads, you increase both the chance of a lock having to wait and the number of things that might have to wait. Eventually, adding more threads achieves nothing, because all you do is spend more time blocking.


If instead you have one thread sending messages into message queues, one for each consumer thread, then the consumers can't block each other. You still have to lock the queue between the producer and each consumer thread, but since each thread has a separate queue, it has a separate lock, and no thread can block all the others while waiting for data.

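A minimal sketch of this one-queue-per-consumer layout, in portable C++ rather than Win32 calls (the queue type and function names are made up for illustration):

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// One small lock-guarded queue; each worker owns exactly one of these.
struct IntQueue {
    std::queue<int> items;
    std::mutex m;
    std::condition_variable cv;
    void push(int v) {
        { std::lock_guard<std::mutex> l(m); items.push(v); }
        cv.notify_one();
    }
    int pop() {
        std::unique_lock<std::mutex> l(m);
        cv.wait(l, [this] { return !items.empty(); });
        int v = items.front(); items.pop(); return v;
    }
};

// Producer round-robins messages across per-worker queues, so each push
// touches only one queue's lock and workers never contend with each other.
int fan_out_sum(int n_workers, int n_messages) {
    std::vector<IntQueue> queues(n_workers);
    std::atomic<int> total{0};

    // Precompute how many messages each worker will receive.
    std::vector<int> counts(n_workers, 0);
    for (int i = 0; i < n_messages; ++i) counts[i % n_workers]++;

    std::vector<std::thread> workers;
    for (int w = 0; w < n_workers; ++w)
        workers.emplace_back([&, w] {
            for (int k = 0; k < counts[w]; ++k) total += queues[w].pop();
        });

    for (int i = 0; i < n_messages; ++i)
        queues[i % n_workers].push(1);  // only this one queue is locked

    for (auto& t : workers) t.join();
    return total.load();
}
```

Each lock is shared by exactly two threads (the producer and one worker), so contention stays constant as worker count grows.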

If you suddenly get a 32-core machine, you can add 20 more processing threads (and queues) and expect performance to scale fairly linearly, unlike the first case, where the new threads would just run into each other all the time.


Answered by David Weber

I have used a shared memory model where the pointers to the shared memory are managed in a message queue with careful locking. In a sense, this is a hybrid between a message queue and shared memory. It is very useful when large quantities of data must be passed between threads while retaining the safety of the message queue.


The entire queue can be packaged in a single C++ class with appropriate locking and the like. The key is that the queue owns the shared storage and takes care of the locking. Producers acquire a lock for input to the queue, receive a pointer to the next available storage chunk (usually an object of some sort), populate it, and release it. The consumer blocks until the next shared object has been released by the producer. It can then acquire a lock on the storage, process the data, and release it back to the pool. A suitably designed queue can perform multiple-producer/multiple-consumer operations with great efficiency. Think of Java's thread-safe java.util.concurrent.BlockingQueue semantics, but for pointers to storage.

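A rough sketch of that hybrid, with `std::unique_ptr` standing in for the pool-managed storage so only a pointer ever crosses the queue, never the payload itself (names are illustrative, not from the answerer's actual code):

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <numeric>
#include <queue>
#include <thread>
#include <vector>

// A "large" chunk of data; only its address travels through the queue.
struct Chunk {
    std::vector<double> samples;
};

class ChunkQueue {
public:
    void push(std::unique_ptr<Chunk> c) {
        { std::lock_guard<std::mutex> l(m_); q_.push(std::move(c)); }
        cv_.notify_one();
    }
    // Consumer blocks until the producer releases the next chunk.
    std::unique_ptr<Chunk> pop() {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return !q_.empty(); });
        auto c = std::move(q_.front());
        q_.pop();
        return c;
    }
private:
    std::queue<std::unique_ptr<Chunk>> q_;
    std::mutex m_;
    std::condition_variable cv_;
};

double hybrid_demo() {
    ChunkQueue queue;
    std::thread producer([&] {
        auto chunk = std::make_unique<Chunk>();
        chunk->samples.assign(1000, 0.5);  // populate the large storage once
        queue.push(std::move(chunk));      // hand over ownership: a pointer moves
    });
    auto chunk = queue.pop();              // no copy of the 1000 doubles
    producer.join();
    return std::accumulate(chunk->samples.begin(), chunk->samples.end(), 0.0);
}
```

Ownership transfer through `unique_ptr` gives the message-queue safety property (only one thread can touch the chunk at a time) without paying for a copy; a real pool would recycle chunks instead of allocating each one.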

Answered by Tamás Szelei

Of course there is "shared data" when you pass messages; after all, the message itself is some sort of data. However, the important distinction is that when you pass a message, the consumer receives a copy.


the PostThreadMessage call just hide the details


Yes, it does, but being a WINAPI call, you can be reasonably sure that it does it right.


I still don't get the concept of async message if the message queue operations are still blocked somewhere else.


The advantage is more safety. You have a locking mechanism that is systematically enforced when you pass a message. You don't even need to think about it; you can't forget to lock. Given that multi-threading bugs are some of the nastiest ones (think of race conditions), this is very important. Message passing is a higher level of abstraction built on locks.


The disadvantage is that passing large amounts of data would probably be slow. In that case, you need to use shared memory.


For passing state (i.e. worker thread reporting progress to the GUI) the messages are the way to go.


Answered by Frerich Raabe

It's quite simple (I'm amazed others wrote such length responses!):


Using a message queue system instead of 'raw' shared data means that you have to get the synchronization (locking/unlocking of resources) right only once, in a central place.


With a message-based system, you can think in higher terms of "messages" without having to worry about synchronization issues anymore. For what it's worth, it's perfectly possible that a message queue is implemented using shared data internally.


Answered by Torp

I think this is the key piece of info there: "As much as possible, prefer to keep each thread's data isolated (unshared), and let threads instead communicate via asynchronous messages that pass copies of data". I.e. use producer-consumer :)
You can do your own message passing or use something provided by the OS. That's an implementation detail (it needs to be done right, of course). The key is to avoid shared data, as in having the same region of memory modified by multiple threads. That can cause hard-to-find bugs, and even if the code is perfect, it will eat performance because of all the locking.


Answered by Helin Wang

I had exactly the same question. After reading the answers, I feel:


  1. In the most typical use cases, queue = async, shared memory (locks) = sync. You can indeed build an async version on shared memory, but that's more code, akin to reinventing the message-passing wheel.

  2. Less code = fewer bugs and more time to focus on other stuff.


The pros and cons are already covered by the previous answers, so I won't repeat them.
