Java: Storm-Kafka multiple spouts, how to share the load?

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/18267834/

Time: 2020-08-11 23:53:02  Source: igfitidea

Storm-Kafka multiple spouts, how to share the load?

Tags: java, load-balancing, apache-storm, apache-kafka

Asked by Amol M Kulkarni

I am trying to share work among multiple spouts. I receive one tuple/message at a time from an external source, and I want to run multiple instances of a spout; the main intention is to share the load and improve performance.

I can do this with a single spout, but I want to spread the load across multiple spouts. I cannot work out the logic for distributing it, since the offset of the messages is not known until a particular spout finishes consuming its part (i.e. based on the configured buffer size).

Can anyone shed some light on how to work out the logic/algorithm?

Thanks in advance for your time.



Update in response to answers:


I am now using multiple partitions on Kafka (i.e. 5).

Here is the code used:


builder.setSpout("spout", new KafkaSpout(cfg), 5);

I tested it by flooding each partition with 800 MB of data, and it took ~22 sec to finish reading.

Then I ran the same code with parallelism_hint = 1,
i.e. builder.setSpout("spout", new KafkaSpout(cfg), 1);

This time it took slightly longer, ~23 sec! Why?

According to the Storm docs, the setSpout() declaration is as follows:

public SpoutDeclarer setSpout(java.lang.String id,
                              IRichSpout spout,
                              java.lang.Number parallelism_hint)

where
parallelism_hint - is the number of tasks that should be assigned to execute this spout. Each task will run on a thread in a process somewhere around the cluster.

Accepted answer by mithunsatheesh

I came across a discussion on storm-user that covers something similar.

See the thread "Relationship between Spout parallelism and number of kafka partitions".



Two things to note when using kafka-spout with Storm:

  1. The maximum parallelism you can have on a KafkaSpout is the number of partitions.
  2. We can split the load across multiple Kafka topics and have a separate spout instance for each, i.e. each spout handles a separate topic.

So suppose the number of Kafka partitions per host is configured as 1 and there are 2 hosts. Even if we set the spout parallelism to 10, the maximum value that is respected will only be 2, which is the number of partitions.
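The limit described above can be illustrated with a small plain-Java sketch. It assumes partitions are dealt out to spout tasks round-robin by task index; that policy, and the class and method names, are assumptions made for illustration and are not storm-kafka API.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hypothetical model, not part of Storm or storm-kafka. */
final class PartitionAssignmentSketch {

    /** Partitions the task at taskIndex would read, assuming round-robin assignment. */
    static List<Integer> partitionsForTask(int taskIndex, int totalTasks, int totalPartitions) {
        if (totalTasks <= 0) {
            throw new IllegalArgumentException("totalTasks must be positive");
        }
        List<Integer> mine = new ArrayList<>();
        // Task t takes partitions t, t + totalTasks, t + 2 * totalTasks, ...
        for (int p = taskIndex; p < totalPartitions; p += totalTasks) {
            mine.add(p);
        }
        return mine;
    }

    /** Number of tasks that end up with at least one partition. */
    static int effectiveParallelism(int totalTasks, int totalPartitions) {
        return Math.min(totalTasks, totalPartitions);
    }

    public static void main(String[] args) {
        int totalTasks = 10;      // spout parallelism from the example in the text
        int totalPartitions = 2;  // 2 hosts x 1 partition per host
        for (int t = 0; t < totalTasks; t++) {
            System.out.println("task " + t + " -> " + partitionsForTask(t, totalTasks, totalPartitions));
        }
        System.out.println("effective parallelism = "
                + effectiveParallelism(totalTasks, totalPartitions));
    }
}
```

Under this model, tasks 2 through 9 receive an empty list, so they would run but have nothing to read.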



How to specify the number of partitions in the Kafka spout?

List<HostPort> hosts = new ArrayList<HostPort>();
hosts.add(new HostPort("localhost",9092));
SpoutConfig objConfig=new SpoutConfig(new KafkaConfig.StaticHosts(hosts, 4), "spoutCaliber", "/kafkastorm", "discovery");

As you can see, brokers can be added using hosts.add, and the number of partitions is specified as 4 in the new KafkaConfig.StaticHosts(hosts, 4) call.
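To make the relationship between hosts and partitions concrete, here is a minimal sketch that enumerates the partitions implied by a static host list. It assumes the second argument is a per-host partition count, as the earlier "partitions per host" example suggests; Broker and enumeratePartitions are hypothetical names and do not mirror the real HostPort or StaticHosts classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hypothetical model of a static broker list. */
final class StaticHostsSketch {

    /** Hypothetical stand-in for a broker address. */
    static final class Broker {
        final String host;
        final int port;

        Broker(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Lists every (broker, partition) pair as "host:port/partition". */
    static List<String> enumeratePartitions(List<Broker> brokers, int partitionsPerHost) {
        List<String> all = new ArrayList<>();
        for (Broker b : brokers) {
            for (int p = 0; p < partitionsPerHost; p++) {
                all.add(b.host + ":" + b.port + "/" + p);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        List<Broker> brokers = new ArrayList<>();
        brokers.add(new Broker("localhost", 9092));
        // One broker with 4 partitions gives 4 readable partitions in total.
        for (String partition : enumeratePartitions(brokers, 4)) {
            System.out.println(partition);
        }
    }
}
```

Under this assumption the total is hosts multiplied by partitions per host, and that total is the ceiling for useful spout parallelism.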



How to specify the parallelism hint in the Kafka spout?

builder.setSpout("spout", spout,4);

You specify it when adding your spout to the topology with the setSpout method. Here 4 is the parallelism hint.



More links that might help:

Understanding the parallelism of a Storm topology

What is the "task" in Twitter Storm parallelism



Disclaimer: I am new to both Storm and Java, so please edit or add to this answer wherever required.