Java: How does Kafka store offsets for each topic?

Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/45686885/


How does Kafka store offsets for each topic?

Tags: java, apache-kafka, kafka-consumer-api

Asked by Raunaq Kochar

While polling Kafka, I have subscribed to multiple topics using the subscribe() function. Now I want to set the offset from which to read each topic, without resubscribing after every seek() and poll() on a topic. Will calling seek() iteratively for each topic, before polling for data, achieve this? How exactly are the offsets stored in Kafka?

I have one partition per topic and just one consumer to read from all topics.

Answered by GuangshengZuo

How does Kafka store offsets for each topic?

Kafka has moved offset storage from ZooKeeper to the Kafka brokers. The reason is given below:

Zookeeper is not a good way to service a high-write load such as offset updates because zookeeper routes each write through every node and hence has no ability to partition or otherwise scale writes. We have always known this, but chose this implementation as a kind of "marriage of convenience" since we already depended on zk.

Kafka stores offset commits in a topic. When a consumer commits an offset, Kafka publishes a commit-offset message to a "commit-log" topic (the internal __consumer_offsets topic in current releases) and keeps an in-memory structure that maps group/topic/partition to the latest offset for fast retrieval. More design information can be found on this page about offset management.
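
To make the idea concrete, here is a minimal sketch in plain Java of the two pieces described above: an append-only log of commits and an in-memory map from group/topic/partition to the latest offset. It is only a toy model and not Kafka's actual broker code; the class and method names (OffsetStoreSketch, commit, fetch, rebuildCache) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.OptionalLong;

/** Toy model (not Kafka code): append-only commit log plus an in-memory cache. */
class OffsetStoreSketch {

    /** Hypothetical cache key: one committed offset per (group, topic, partition). */
    static final class Key {
        final String group;
        final String topic;
        final int partition;

        Key(String group, String topic, int partition) {
            this.group = group;
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return partition == k.partition && group.equals(k.group) && topic.equals(k.topic);
        }

        @Override
        public int hashCode() {
            return Objects.hash(group, topic, partition);
        }
    }

    /** One record in the stand-in "commit-log" topic. */
    static final class Commit {
        final Key key;
        final long offset;

        Commit(Key key, long offset) {
            this.key = key;
            this.offset = offset;
        }
    }

    private final List<Commit> commitLog = new ArrayList<>();  // stands in for the topic
    private final Map<Key, Long> latest = new HashMap<>();     // fast lookup structure

    /** Append the commit to the log, then update the cache with the newest offset. */
    public void commit(String group, String topic, int partition, long offset) {
        Key key = new Key(group, topic, partition);
        commitLog.add(new Commit(key, offset));
        latest.put(key, offset);
    }

    /** Return the last committed offset, or empty if nothing was committed yet. */
    public OptionalLong fetch(String group, String topic, int partition) {
        Long offset = latest.get(new Key(group, topic, partition));
        return offset == null ? OptionalLong.empty() : OptionalLong.of(offset);
    }

    /** Drop the cache and replay the log; later records overwrite earlier ones. */
    public void rebuildCache() {
        latest.clear();
        for (Commit c : commitLog) {
            latest.put(c.key, c.offset);
        }
    }

    public int logSize() {
        return commitLog.size();
    }

    public static void main(String[] args) {
        OffsetStoreSketch store = new OffsetStoreSketch();
        store.commit("your-consumer-group", "topic-a", 0, 5L);
        store.commit("your-consumer-group", "topic-b", 0, 12L);
        store.commit("your-consumer-group", "topic-a", 0, 9L); // newer commit for topic-a
        System.out.println("topic-a: " + store.fetch("your-consumer-group", "topic-a", 0));
        System.out.println("topic-b: " + store.fetch("your-consumer-group", "topic-b", 0));
    }
}
```

Note that the key includes the topic and partition, so each topic a group reads keeps its own committed offset. That is why one consumer subscribed to several topics can be positioned independently on each of them.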

Now, I want to set the offset from which I want to read from each topic, without resubscribing after every seek() and poll() from a topic.

The Kafka admin tools have a new feature for resetting offsets:

kafka-consumer-groups.sh --bootstrap-server 127.0.0.1:9092 --group your-consumer-group \
      --reset-offsets --to-offset 1 --all-topics --execute

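There are more options you can use, for example --to-earliest, --to-latest, --shift-by, or --to-datetime instead of --to-offset, and --dry-run instead of --execute to preview the result without applying it.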
