Original question: http://stackoverflow.com/questions/27816043/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Kafka Topic vs Partition topic
Asked by Anil
I would like to know the difference between a simple topic and a partitioned topic. As I understand it, a topic is partitioned to balance the load, each message has an offset, and the consumer acknowledges messages to ensure that previous messages have been consumed. If the number of partitions and the number of consumers do not match, does the rebalance done by Kafka handle this efficiently?
If multiple topics are created instead of partitions, does that affect operational efficiency?
Answered by user2720864
From the Kafka documentation:
The partitions in the log serve several purposes. First, they allow the log to scale beyond a size that will fit on a single server. Each individual partition must fit on the servers that host it, but a topic may have many partitions so it can handle an arbitrary amount of data
Having multiple partitions for a given topic allows Kafka to distribute it across the Kafka cluster. As a result, requests for data in different partitions can be divided among multiple servers in the cluster. Each partition can also be replicated across multiple servers to minimize data loss. Again, from the documentation:
The partitions of the log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of the partitions. Each partition is replicated across a configurable number of servers for fault tolerance.
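To make the quoted description concrete, here is a minimal sketch of how partitions and their replicas might be spread over servers. It is a simplified round-robin layout written for illustration, not Kafka's actual assignment algorithm; the function name `assign_replicas` and the broker names are hypothetical.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Spread partitions over brokers round-robin (illustrative only).

    The first broker in each list plays the role of the partition leader;
    the remaining brokers hold the extra copies.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        # Start at a different broker for each partition so that both
        # leaders and replicas are spread evenly over the cluster.
        assignment[partition] = [
            brokers[(partition + offset) % len(brokers)]
            for offset in range(replication_factor)
        ]
    return assignment


# Hypothetical cluster: three brokers, one topic with six partitions,
# each partition stored on two brokers.
layout = assign_replicas(
    num_partitions=6,
    brokers=["broker-0", "broker-1", "broker-2"],
    replication_factor=2,
)
for partition, replicas in layout.items():
    print(partition, replicas)
```

With a single partition, the whole topic would live on one leader broker (plus its replicas), so the request load could not be divided among servers in this way.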
So a topic with a single partition won't let you take advantage of this flexibility. Also note that in a real-life environment you can have different topics to hold different categories of messages (though it is also possible to have a single topic with multiple partitions, where each partition holds a specific category of messages, by setting the message key while producing).
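The idea of routing a category of messages to a fixed partition through the message key can be sketched as follows. This is an illustration under stated assumptions: it uses CRC32 as a stand-in hash (Kafka's default partitioner uses a different hash function), and `partition_for` is a hypothetical helper, not part of any Kafka client API.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a message key to a partition index (illustrative only).

    CRC32 is a stand-in for the hash used by a real partitioner. What
    matters is that the mapping is deterministic: the same key always
    lands in the same partition.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical message categories used as keys on a topic with 4 partitions.
for key in ["orders", "payments", "orders", "shipping"]:
    print(key, "->", partition_for(key, num_partitions=4))
```

Because the mapping depends on the partition count, changing the number of partitions later would send some keys to different partitions than before.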
I don't think creating multiple topics instead of partitions will have much impact on overall performance. But imagine you want to keep track of all the tweets made by users on your site. You can have one topic named "User_tweet" with multiple partitions, so that while producing messages Kafka can distribute the data across the partitions, and on the consumer end you only need one consumer group pulling data from that single topic. Keeping "User_tweet_1", "User_tweet_2", and "User_tweet_3" instead will only make both producing and consuming the messages more complex.
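The consumer side of the "User_tweet" example, and the question of what happens when the number of consumers does not match the number of partitions, can be sketched like this. It is a simplified round-robin assignment for illustration, not one of Kafka's actual assignor implementations; `assign_partitions` and the consumer names are hypothetical. Within one consumer group, each partition is read by at most one consumer, so consumers beyond the partition count receive nothing.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group (round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        consumer = consumers[index % len(consumers)]
        assignment[consumer].append(partition)
    return assignment


# One topic with four partitions, as in the "User_tweet" example.
partitions = ["User_tweet-0", "User_tweet-1", "User_tweet-2", "User_tweet-3"]

# Fewer consumers than partitions: each consumer reads several partitions.
few = assign_partitions(partitions, ["consumer-1", "consumer-2"])
print(few)

# More consumers than partitions: the extra consumer stays idle.
many = assign_partitions(
    partitions,
    ["consumer-1", "consumer-2", "consumer-3", "consumer-4", "consumer-5"],
)
print(many)
```

When a consumer joins or leaves the group, a rebalance amounts to recomputing an assignment like this one over the consumers that remain.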