Sync PostgreSQL data with Elasticsearch

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must likewise follow the CC BY-SA license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/35813923/

Date: 2020-10-21 02:11:44 · Source: igfitidea

Sync postgreSql data with ElasticSearch

Tags: postgresql, elasticsearch, logstash

Asked by Khanetor

Ultimately I want to have a scalable search solution for the data in PostgreSQL. My findings point me towards using Logstash to ship write events from Postgres to Elasticsearch; however, I have not found a usable solution. The solutions I have found involve using the jdbc input to query all data from Postgres on an interval, and the delete events are not captured.


I think this is a common use case so I hope you guys could share with me your experience, or give me some pointers to proceed.


Accepted answer by Val

If you also need to be notified on DELETEs and delete the respective record in Elasticsearch, it is true that the Logstash jdbc input will not help. You'd have to use a solution that works off the database's transaction log (MySQL's binlog, or PostgreSQL's write-ahead log via logical decoding), as suggested here.


However, if you still want to use the Logstash jdbc input, what you could do is simply soft-delete records in PostgreSQL, i.e. create a new BOOLEAN column in order to mark your records as deleted. The same flag would then exist in Elasticsearch, and you can exclude those records from your searches with a simple term query on the deleted field.

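As a sketch, the search-side exclusion could look like the following bool query (the match clause on a title field is purely illustrative; only the filter on the boolean deleted flag is the point here):

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "elasticsearch" } }
      ],
      "filter": [
        { "term": { "deleted": false } }
      ]
    }
  }
}
```

Putting the term clause in the filter context also means it is cacheable and does not affect scoring.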

Whenever you need to perform some cleanup, you can delete all records flagged as deleted in both PostgreSQL and Elasticsearch.

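On the PostgreSQL side, that cleanup could be as simple as (table and column names are illustrative):

```sql
-- Purge rows that were previously soft-deleted
DELETE FROM book WHERE deleted;
```

On the Elasticsearch side, the matching documents can be purged with the Delete By Query API, using the same term query on the deleted field.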

Answered by Yegor Zaremba

Please take a look at Debezium. It's a change data capture (CDC) platform, which allows you to stream your data changes.


I created a simple GitHub repository which shows how it works.

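As an illustrative sketch (connector name, connection details, and table list are placeholders, and the exact property names depend on your Debezium version), a Debezium PostgreSQL connector registered with Kafka Connect might be configured like this:

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "secret",
    "database.dbname": "mydb",
    "topic.prefix": "mydb",
    "table.include.list": "public.book"
  }
}
```

A consumer can then read the resulting change events (including deletes) from Kafka and index them into Elasticsearch.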


Answered by taina

You can also take a look at PGSync.


It's similar to Debezium but a lot easier to get up and running.


PGSync is a change data capture (CDC) tool for moving data from Postgres to Elasticsearch. It allows you to keep Postgres as your source of truth and expose structured, denormalized documents in Elasticsearch.


You simply define a JSON schema describing the structure of the data in Elasticsearch.


Here is an example schema: (you can also have nested objects)


{
    "nodes": [
        {
            "table": "book",
            "columns": [
                "isbn",
                "title",
                "description"
            ]
        }
    ]
}

PGSync generates queries for your document on the fly. There is no need to write queries as you would with Logstash. It also supports and tracks deletion operations.


It combines a polling model and an event-driven model: the initial sync polls the database for changes made since the last time the daemon was run, and thereafter it relies on event notifications (based on triggers and handled by pg_notify) to capture changes to the database as they occur.

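PGSync installs and manages its own triggers, but the underlying Postgres notification mechanism it relies on can be illustrated roughly like this (function, channel, and table names are hypothetical; EXECUTE FUNCTION requires PostgreSQL 11+):

```sql
-- Notify listeners on the 'table_changes' channel whenever a row changes
CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify(
    'table_changes',
    json_build_object('table', TG_TABLE_NAME, 'op', TG_OP)::text
  );
  RETURN NULL;  -- the return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER book_notify
AFTER INSERT OR UPDATE OR DELETE ON book
FOR EACH ROW EXECUTE FUNCTION notify_change();
```

A daemon holding a LISTEN table_changes session then receives each payload and can apply the corresponding change downstream.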

It has very little development overhead.


  • Create a schema as described above
  • Point pgsync at your Postgres database and Elasticsearch cluster
  • Start up the daemon.

You can easily create a document that includes multiple relations as nested objects. PGSync tracks any changes for you.


Have a look at the GitHub repo for more details.


You can install the package from PyPI with pip.
