Java: How many producers to create in Kafka?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/21376715/

Time: 2020-08-13 08:22:33  Source: igfitidea

How many producers to create in Kafka?

Tags: java, apache-kafka

Asked by forhas

In a high-volume, real-time Java web app I'm sending messages to Apache Kafka. Currently I'm sending to a single topic, but in the future I might need to send messages to multiple topics.

In this case I'm not sure whether to create a producer per topic or to use a single producer for all my topics.

Here is my code:

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.javaapi.producer.ProducerData;
import kafka.message.Message;
import kafka.producer.ProducerConfig;

// <zk-ip1>..<zk-ip3> are placeholders for the ZooKeeper hosts.
Properties props = new Properties();
props.put("zk.connect", "<zk-ip1>:2181,<zk-ip2>:2181,<zk-ip3>:2181");
props.put("zk.connectiontimeout.ms", "1000000");
props.put("producer.type", "async");

Producer<String, Message> producer = new Producer<String, Message>(new ProducerConfig(props));

ProducerData<String, Message> producerData1 = new ProducerData<String, Message>("someTopic1", messageTosend);
ProducerData<String, Message> producerData2 = new ProducerData<String, Message>("someTopic2", messageTosend);

// The same producer instance sends to both topics.
producer.send(producerData1);
producer.send(producerData2);

As you can see, once the producer has been created I can use it to send data to different topics. What is the best practice here? If my app sends to multiple topics (each topic gets different data), can or should I use a single producer, or should I create multiple producers? When, generally speaking, should I use more than a single producer?

Accepted answer by secretmike

In general, a single producer for all topics will be more network efficient.

If the Kafka client sees more than one topic+partition on the same Kafka node, it can send the messages for those topic+partitions in a single request. Kafka optimizes for message batches, so this is efficient.

In addition, your web servers only need to maintain at most one TCP connection to each Kafka node, instead of one connection per producer, per node.
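
To make the batching argument concrete, here is a minimal sketch in plain Java, with no Kafka dependency. It is not the Kafka client's actual implementation: the `Pending` type, the `leaderNode` field, and `groupByNode` are hypothetical names invented for illustration. The sketch only shows the idea that records for different topics whose partitions live on the same node can travel in one request to that node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class NodeBatching {
    /** Hypothetical pending record: a payload for one topic+partition hosted on a given node. */
    static final class Pending {
        final String topic;
        final int partition;
        final int leaderNode; // assumed to be known from cluster metadata
        final String payload;

        Pending(String topic, int partition, int leaderNode, String payload) {
            this.topic = topic;
            this.partition = partition;
            this.leaderNode = leaderNode;
            this.payload = payload;
        }
    }

    /** Groups pending records by node: one outgoing request per node, however many topics it covers. */
    static Map<Integer, List<Pending>> groupByNode(List<Pending> pending) {
        Map<Integer, List<Pending>> requests = new TreeMap<>();
        for (Pending p : pending) {
            requests.computeIfAbsent(p.leaderNode, k -> new ArrayList<>()).add(p);
        }
        return requests;
    }

    public static void main(String[] args) {
        List<Pending> pending = Arrays.asList(
                new Pending("someTopic1", 0, 1, "a"),
                new Pending("someTopic2", 0, 1, "b"),  // different topic, same node as above
                new Pending("someTopic1", 1, 2, "c"));
        for (Map.Entry<Integer, List<Pending>> e : groupByNode(pending).entrySet()) {
            System.out.println("node " + e.getKey() + ": " + e.getValue().size() + " record(s) in one request");
        }
    }
}
```

With one producer per topic, the two records bound for node 1 would instead go out separately, each over its own connection.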

For more info on Kafka's design: https://kafka.apache.org/documentation.html#design

As you mention in comments, lock contention may become a limiting factor, YMMV.

Answer by laughing_man

We have verified in practice that a single producer per topic is optimal. However, multiple producers are useful if you hit the long fat network problem, where multiple connections are needed to fully utilize the network.

By itself, batching and pipelining over a single TCP connection (as Kafka uses) will not scale to large batches when the destination host is far away, unless you tune TCP for large window sizes. This is the case where you might experiment with more producers.
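
The window-size limit can be estimated with simple arithmetic: a connection can have at most one window of data in flight per round trip, so its throughput is bounded by window size divided by round-trip time. The sketch below is a simplification that ignores congestion control and packet loss. The class name, method names, and sample numbers (64 KiB window, 100 ms round trip, 1 Gbit/s link) are assumptions chosen for illustration, not measurements.

```java
final class WindowLimit {
    /** Upper bound on one connection's throughput: one window of data per round trip. */
    static double maxBytesPerSecond(long windowBytes, long rttMillis) {
        return windowBytes * 1000.0 / rttMillis;
    }

    /** Connections needed to fill a link of the given capacity, under the bound above. */
    static int connectionsToFill(double linkBytesPerSecond, long windowBytes, long rttMillis) {
        return (int) Math.ceil(linkBytesPerSecond / maxBytesPerSecond(windowBytes, rttMillis));
    }

    public static void main(String[] args) {
        long window = 65_536;            // assumed 64 KiB window (no window scaling)
        long rttMillis = 100;            // assumed 100 ms round trip to a distant host
        double link = 125_000_000.0;     // assumed 1 Gbit/s link, in bytes per second

        System.out.println("per-connection bound (bytes/s): " + maxBytesPerSecond(window, rttMillis));
        System.out.println("connections to fill the link: " + connectionsToFill(link, window, rttMillis));
    }
}
```

Under these assumed numbers a single connection tops out far below link capacity, which is why either larger windows or more connections (for example, more producers) are needed on such a path.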

Answer by Liju John

In 0.8.2.0 and above, if you use the same Kafka producer for multiple topics, the default Partitioner logic for round-robin assignment will fail.

Answer by Christian Vielma

From Kafka: The Definitive Guide, in the Kafka Producers Chapter, the author says:

You will probably want to start with one producer and one thread. If you need better throughput, you can add more threads that use the same producer. Once this ceases to increase throughput, you can add more producers to the application to achieve even higher throughput.

So there might actually be benefits in having multiple producers.
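
The first scaling step in that quote, several threads sharing one producer, can be sketched as follows. This assumes the producer client is thread-safe (the newer `KafkaProducer` client is documented as such). To keep the sketch runnable without a broker, `Sender` and `CountingSender` are hypothetical stand-ins, not Kafka APIs; in a real app the shared object would be the producer itself.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class SharedProducerSketch {
    /** Hypothetical stand-in for a thread-safe producer client; not a Kafka API. */
    interface Sender {
        void send(String topic, String value);
    }

    /** Counts sends instead of doing network I/O, so the sketch runs without a broker. */
    static final class CountingSender implements Sender {
        final AtomicLong sent = new AtomicLong();

        @Override
        public void send(String topic, String value) {
            sent.incrementAndGet();
        }
    }

    /** Starts `threads` workers that all use the same sender, then waits for them to finish. */
    static void runWorkers(Sender shared, int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final String topic = "someTopic" + (t % 2 + 1); // workers alternate between two topics
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.send(topic, "msg-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
    }

    public static void main(String[] args) {
        CountingSender sender = new CountingSender(); // one shared instance for all threads
        runWorkers(sender, 4, 1000);
        System.out.println("records sent through the shared sender: " + sender.sent.get());
    }
}
```

Only if adding threads stops helping would the next step from the quote apply: creating a second producer instance and splitting the workers between them.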
