Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/50051768/

Date: 2020-08-10 23:24:25  Source: igfitidea

How to write Kafka consumers: single-threaded vs multi-threaded

Tags: java, multithreading, deployment, apache-kafka, spring-kafka

Asked by user3842182

I have written a single Kafka consumer (using Spring Kafka) that reads from a single topic and is part of a consumer group. Once a message is consumed, it performs all downstream operations and moves on to the next message offset. I have packaged this as a WAR file, and my deployment pipeline pushes it out to a single instance. Using the same pipeline, I could deploy this artifact to multiple instances in my deployment pool.

However, there is something I do not understand about running multiple consumers as part of my infrastructure:

  • I can define multiple instances in my deployment pool and run this WAR on all of them. They would all listen to the same topic, belong to the same consumer group, and divide the partitions among themselves. The downstream logic would work as is. This works perfectly well for my use case, but I am not sure whether it is the optimal approach.

  • Reading online, I came across resources here and here, where people define a single consumer thread that internally creates multiple worker threads. There are also examples that define multiple consumer threads, each of which runs the downstream logic. Mapping these approaches to deployment environments, we could achieve the same result as my theoretical solution above, but with fewer machines.
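
To make "divide the partitions among themselves" concrete, here is a minimal plain-Java sketch. It is only a simplified round-robin illustration, not Kafka's actual assignor; the class name, method name, and instance names are hypothetical, and it assumes a non-empty consumer list. It shows that with more consumers than partitions, the extra consumers receive nothing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionAssignmentSketch {

    // Spreads partitions 0..partitionCount-1 over the consumers in round-robin order.
    // Simplified illustration only; real Kafka assignors differ in detail.
    // Assumes the consumer list is not empty.
    public static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two instances, four partitions: each instance owns two partitions.
        System.out.println(assign(List.of("instance-1", "instance-2"), 4));
        // prints {instance-1=[0, 2], instance-2=[1, 3]}

        // Three instances, two partitions: the third instance sits idle.
        System.out.println(assign(List.of("instance-1", "instance-2", "instance-3"), 2));
        // prints {instance-1=[0], instance-2=[1], instance-3=[]}
    }
}
```

The same reasoning applies whether the consumers are separate machines or threads inside one instance: the partition count caps the useful number of consumers in a group.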

Personally, I think my solution is simple and scalable but might not be optimal, while the second approach might be optimal. I would like to hear your experiences, suggestions, or any other metrics or constraints I should consider. Also, with my theoretical solution, I think I could use bare-bones, simple machines as Kafka consumers.

I know I haven't posted any code; please let me know if I should move this question to another forum. If you need specific code examples, I can provide them too, but I didn't think they were important in the context of my question.

Accepted answer by Gary Russell

Your existing solution is best. Handing off to another thread will cause problems with offset management. Spring Kafka allows you to run multiple threads in each instance, as long as you have enough partitions.
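
One way to see the offset-management problem: once records are handed to worker threads, they can finish out of order, so the consumer can no longer simply commit the latest offset it has seen. The plain-Java sketch below is an illustration, not part of Spring Kafka; the class and method names are hypothetical. It tracks the highest offset that is safe to commit, following Kafka's convention that the committed value is the offset of the next record to read.

```java
import java.util.TreeSet;

public class ContiguousOffsetTracker {

    // Offsets that finished processing but are still waiting on an earlier offset.
    private final TreeSet<Long> finishedAhead = new TreeSet<>();
    private long nextOffsetToCommit;

    public ContiguousOffsetTracker(long firstOffset) {
        this.nextOffsetToCommit = firstOffset;
    }

    // Called by a worker thread when it has finished processing one record.
    public synchronized void markDone(long offset) {
        if (offset < nextOffsetToCommit) {
            return; // already covered by an earlier advance
        }
        finishedAhead.add(offset);
        // Advance only across a gap-free run of finished offsets.
        while (finishedAhead.remove(nextOffsetToCommit)) {
            nextOffsetToCommit++;
        }
    }

    // Offset that can be committed without skipping an unfinished record.
    public synchronized long safeCommitOffset() {
        return nextOffsetToCommit;
    }

    public static void main(String[] args) {
        ContiguousOffsetTracker tracker = new ContiguousOffsetTracker(0);
        tracker.markDone(2);
        tracker.markDone(1);
        System.out.println(tracker.safeCommitOffset()); // prints 0: offset 0 is still in flight
        tracker.markDone(0);
        System.out.println(tracker.safeCommitOffset()); // prints 3: offsets 0, 1 and 2 are done
    }
}
```

With one consumer thread per partition, as in the existing solution, none of this bookkeeping is needed because records complete in the order they were read.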

Answer by Michal Borowiecki

If your current approach works, just stick to it. It's the simple and elegant way to go.

You would only go with approach 2 if, for some reason, you cannot increase the number of partitions but need a higher level of parallelism. But then you have ordering and race conditions to worry about. If you ever need to go that route, I'd recommend the akka-stream-kafka library, which provides facilities to handle offset commits correctly, to do what you need in parallel, and then to merge the results back into a single stream that preserves the original ordering. Otherwise, these things are error-prone to do yourself.
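
The "process in parallel, then merge back in the original order" idea can be sketched with plain `java.util.concurrent`. This is not akka-stream-kafka and does not talk to Kafka at all; the class name, method name, and the `handler` parameter are hypothetical stand-ins for the downstream logic. The sketch assumes a batch of items from one partition is processed by a worker pool and the results are collected in submission order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class OrderedParallelBatch {

    // Runs handler on every item using a pool of workers, then returns the
    // results in the same order as the input batch.
    public static <T, R> List<R> processInOrder(List<T> batch, Function<T, R> handler, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (T item : batch) {
                Callable<R> task = () -> handler.apply(item);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            // Reading the futures in submission order restores the original
            // ordering, whichever worker happened to finish first.
            for (Future<R> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> out = processInOrder(List.of("a", "b", "c"), s -> s.toUpperCase(), 3);
        System.out.println(out); // prints [A, B, C]
    }
}
```

In a real consumer, offsets for the batch would be committed only after this method returns, so a failure in any worker leaves the whole batch uncommitted and eligible for redelivery.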
