Java: How to filter messages before passing them on to consumers?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license, link back to the original, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/30915302/
How to filter messages before passing them on to consumers?
Asked by user1079877
I'm building a lead and event management system with Kafka. The problem is that we receive a lot of fake leads (advertisements), and we also have many consumers in our system. Is there any way to filter out the advertisements before they reach the consumers? My idea is to write everything into a first topic, read it with a filtering consumer, and then write the filtered data back into a second topic. But I'm not sure whether that's efficient. Any ideas?
Accepted answer by JongHyok Lee
You can use Kafka Streams (http://kafka.apache.org/documentation.html#streamsapi) with Kafka 0.10.+. I think it fits your use case exactly.
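For illustration, here is a minimal Kafka Streams sketch of such a filter. The topic names raw-leads and clean-leads, the broker address, and the isAdvertisement() check are placeholders you would replace with your own; it assumes String-serialized messages and uses the StreamsBuilder API (the 0.10.x releases mentioned above shipped the older KStreamBuilder instead):

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class LeadFilterApp {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "lead-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> rawLeads = builder.stream("raw-leads");

        // Keep only records that are not classified as advertisements and
        // republish them to the topic the real consumers subscribe to.
        rawLeads.filter((key, value) -> !isAdvertisement(value))
                .to("clean-leads");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }

    // Placeholder ad-detection heuristic; plug in your real rules or model here.
    private static boolean isAdvertisement(String lead) {
        return lead != null && lead.toLowerCase().contains("advertisement");
    }
}
```

Your existing consumers would then subscribe to clean-leads instead of the raw topic.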
Answer by Jeff Gong
Yes -- in fact I'm mostly convinced this is the way you're supposed to handle the problem in your context. Kafka is only concerned with the efficient transmission of data; it can't clean your data for you. So consume everything with an intermediary consumer that runs its own checks, and push whatever passes the filter to a different topic/partition (depending on your needs) so that the other consumers only see the good data.
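If you stay on the plain consumer/producer API, such an intermediary filter can be sketched roughly as below. It assumes String messages, the same hypothetical raw-leads/clean-leads topics, a broker on localhost:9092, and Kafka clients 2.0+ for poll(Duration); the looksLikeAd() check is again just a stand-in:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FilteringBridge {

    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "lead-filter-bridge");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

            consumer.subscribe(Collections.singletonList("raw-leads"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Forward only the records that pass the ad filter.
                    if (!looksLikeAd(record.value())) {
                        producer.send(new ProducerRecord<>("clean-leads", record.key(), record.value()));
                    }
                }
            }
        }
    }

    // Placeholder heuristic standing in for real ad detection.
    private static boolean looksLikeAd(String value) {
        return value != null && value.toLowerCase().contains("advertisement");
    }
}
```

With default auto-commit this gives at-least-once behaviour: a crash between sending and the next offset commit can replay a few records, which is usually acceptable for this kind of filtering.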
Answer by Nikita Shamgunov
You can use Spark Streaming: https://spark.apache.org/docs/latest/streaming-kafka-integration.html.
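For completeness, a rough Java sketch of the same filter using Spark Streaming's Kafka 0-10 direct stream; the raw-leads topic, the local master/broker addresses, and the keyword check are again placeholders, and a real job would write the surviving records back to a clean topic instead of just printing them:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class SparkLeadFilter {

    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("lead-filter").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("group.id", "spark-lead-filter");

        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(Arrays.asList("raw-leads"), kafkaParams));

        // Drop records that look like advertisements; a real job would forward the
        // remaining records to a clean topic (e.g. with a producer inside foreachRDD).
        stream.map(record -> record.value())
              .filter(value -> value != null && !value.toLowerCase().contains("advertisement"))
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```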
Answer by mancini0
Take a look at Confluent's KSQL (it's free and open source, https://www.confluent.io/product/ksql/). It uses Kafka Streams under the hood: you define your KSQL queries and tables on the server side, and their results are written to Kafka topics, so you can simply consume those topics instead of writing code for an intermediary filtering consumer. You only need to write the KSQL table "DDL" or queries.
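As a hypothetical illustration, a single persistent query along the lines of CREATE STREAM clean_leads AS SELECT * FROM raw_leads WHERE is_ad = false; (the stream names and the is_ad column are made up here) would continuously write the filtered records to a new Kafka topic that your downstream consumers read instead.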