PostgreSQL 错误:由于与恢复冲突而取消语句
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/14592436/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
PostgreSQL ERROR: canceling statement due to conflict with recovery
提问by AnApprentice
I'm getting the following error when running a query on a PostgreSQL db in standby mode. The query that causes the error works fine for 1 month but when you query for more than 1 month an error results.
在待机模式下在 PostgreSQL 数据库上运行查询时,我收到以下错误。导致错误的查询可以正常工作 1 个月,但是当您查询超过 1 个月时会导致错误。
ERROR: canceling statement due to conflict with recovery
Detail: User query might have needed to see row versions that must be removed
Any suggestions on how to resolve? Thanks
有关如何解决的任何建议?谢谢
采纳答案by Tometzky
Running queries on hot-standby server is somewhat tricky — it can fail, because during querying some needed rows might be updated or deleted on primary. As a primary does not know that a query is started on secondary it thinks it can clean up (vacuum) old versions of its rows. Then secondary has to replay this cleanup, and has to forcibly cancel all queries which can use these rows.
在热备服务器上运行查询有点棘手——它可能会失败,因为在查询期间一些需要的行可能会在主服务器上更新或删除。由于主节点不知道查询是在辅助节点上启动的,因此它认为它可以清理(真空)其行的旧版本。然后secondary 必须重播这个清理,并且必须强制取消所有可以使用这些行的查询。
Longer queries will be canceled more often.
更长的查询将被更频繁地取消。
You can work around this by starting a repeatable read transaction on primary which does a dummy query and then sits idle while a real query is run on secondary. Its presence will prevent vacuuming of old row versions on primary.
您可以通过在主服务器上启动可重复读取事务来解决此问题,该事务执行虚拟查询,然后在辅助服务器上运行实际查询时处于空闲状态。它的存在将防止对主要版本的旧行版本进行清理。
More on this subject and other workarounds are explained in Hot Standby — Handling Query Conflictssection in documentation.
有关此主题和其他解决方法的更多信息,请参阅文档中的Hot Standby — 处理查询冲突部分。
回答by Max Malysh
No need to touch hot_standby_feedback
. As others have mentioned, setting it to on
can bloat master. Imagine opening transaction on a slave and not closing it.
无需触摸hot_standby_feedback
。正如其他人所提到的,将其设置为on
会使 master 膨胀。想象一下在从属设备上打开事务而不关闭它。
Instead, set max_standby_archive_delay
and max_standby_streaming_delay
to some sane value:
相反,将max_standby_archive_delay
and设置max_standby_streaming_delay
为一些合理的值:
# /etc/postgresql/10/main/postgresql.conf on a slave
max_standby_archive_delay = 900s
max_standby_streaming_delay = 900s
This way queries on slaves with a duration less than 900 seconds won't be cancelled. If your workload requires longer queries, just set these options to a higher value.
这样就不会取消对持续时间少于 900 秒的从站的查询。如果您的工作负载需要更长的查询,只需将这些选项设置为更高的值。
回答by eradman
There's no need to start idle transactions on the master. In postgresql-9.1 the most direct way to solve this problem is by setting
无需在 master 上启动空闲事务。在 postgresql-9.1 中解决这个问题最直接的方法是通过设置
hot_standby_feedback = on
This will make the master aware of long-running queries. From the docs:
这将使 master 知道长时间运行的查询。从文档:
The first option is to set the parameter hot_standby_feedback, which prevents VACUUM from removing recently-dead rows and so cleanup conflicts do not occur.
第一个选项是设置参数 hot_standby_feedback,它可以防止 VACUUM 删除最近死的行,因此不会发生清理冲突。
Why isn't this the default? This parameter was added after the initial implementation and it's the only way that a standby can affect a master.
为什么这不是默认值?此参数是在初始实现之后添加的,它是备用数据库影响主数据库的唯一方式。
回答by Gilles Quenot
As stated hereabout hot_standby_feedback = on
:
如前所述这里约hot_standby_feedback = on
:
Well, the disadvantage of it is that the standby can bloat the master, which might be surprising to some people, too
嗯,它的缺点是备用可以使主服务器膨胀,这对某些人来说可能也很奇怪
And here:
而在这里:
With what setting of max_standby_streaming_delay? I would rather default that to -1 than default hot_standby_feedback on. That way what you do on the standby only affects the standby
max_standby_streaming_delay 的设置是什么?我宁愿将其默认为 -1 而不是默认的 hot_standby_feedback。这样你在备用机上所做的只会影响备用机
So I added
所以我加了
max_standby_streaming_delay = -1
And no more pg_dump
error for us, nor master bloat :)
pg_dump
对我们来说没有更多的错误,也没有主膨胀:)
For AWS RDS instance, check http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.html
对于 AWS RDS 实例,请查看http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.html
回答by David Jaspers
The table data on the hot standby slave server is modified while a long running query is running. A solution (PostgreSQL 9.1+) to make sure the table data is not modified is to suspend the replication and resume after the query:
热备从服务器上的表数据在长时间运行的查询中被修改。确保表数据不被修改的解决方案(PostgreSQL 9.1+)是暂停复制并在查询后恢复:
select pg_xlog_replay_pause(); -- suspend
select * from foo; -- your query
select pg_xlog_replay_resume(); --resume
回答by Tushar.k
It might be too late for the answer but we face the same kind of issue on the production. Earlier we have only one RDS and as the number of users increases on the app side, we decided to add Read Replica for it. Read replica works properly on the staging but once we moved to the production we start getting the same error.
答案可能为时已晚,但我们在生产中面临同样的问题。之前我们只有一个RDS,随着应用端用户数量的增加,我们决定为其添加只读副本。只读副本在暂存时正常工作,但是一旦我们转移到生产中,我们就会开始遇到相同的错误。
So we solve this by enabling hot_standby_feedbackproperty in the Postgres properties. We referred the following link
所以我们通过在 Postgres 属性中启用hot_standby_feedback属性来解决这个问题。我们参考了以下链接
https://aws.amazon.com/blogs/database/best-practices-for-amazon-rds-postgresql-replication/
https://aws.amazon.com/blogs/database/best-practices-for-amazon-rds-postgresql-replication/
I hope it will help.
我希望它会有所帮助。
回答by Artif3x
I'm going to add some updated info and references to @max-malysh's excellent answer above.
我将添加一些更新的信息和对@max-malysh 上面的优秀答案的引用。
In short, if you do something on the master, it needs to be replicated on the slave. Postgres uses WAL records for this, which are sent after every logged action on the master to the slave. The slave then executes the action and the two are again in sync. In one of several scenarios, you can be in conflict on the slave with what's coming in from the master in a WAL action. In most of them, there's a transaction happening on the slave which conflicts with what the WAL action wants to change. In that case, you have two options:
总之,如果你在master上做了什么,就需要在slave上进行复制。Postgres 为此使用 WAL 记录,这些记录在 master 上的每个记录的操作之后发送到 slave。然后从站执行操作,两者再次同步。在几种情况之一中,您可能会在从站上与 WAL 操作中从主站进来的内容发生冲突。在大多数情况下,从属设备上发生的事务与 WAL 操作想要更改的内容相冲突。在这种情况下,您有两个选择:
- Delay the application of the WAL action for a bit, allowing the slave to finish its conflicting transaction, then apply the action.
- Cancel the conflicting query on the slave.
- 将 WAL 操作的应用延迟一点,允许从设备完成其冲突事务,然后应用该操作。
- 取消对从站的冲突查询。
We're concerned with #1, and two values:
我们关心 #1 和两个值:
max_standby_archive_delay
- this is the delay used after a long disconnection between the master and slave, when the data is being read from a WAL archive, which is not current data.max_standby_streaming_delay
- delay used for cancelling queries when WAL entries are received via streaming replication.
max_standby_archive_delay
- 这是主从之间长时间断开连接后使用的延迟,当从 WAL 存档读取数据时,该数据不是当前数据。max_standby_streaming_delay
- 当通过流复制接收到 WAL 条目时,用于取消查询的延迟。
Generally, if your server is meant for high availability replication, you want to keep these numbers short. The default setting of 30000
(milliseconds if no units given) is sufficient for this. If, however, you want to set up something like an archive, reporting- or read-replica that might have very long-running queries, then you'll want to set this to something higher to avoid cancelled queries. The recommended 900s
setting above seems like a good starting point. I disagree with the official docs on setting an infinite value -1
as being a good idea--that could mask some buggy code and cause lots of issues.
通常,如果您的服务器用于高可用性复制,您希望保持这些数字简短。默认设置30000
(如果没有给出单位,则为毫秒)就足够了。但是,如果您想设置诸如存档、报告或只读副本之类的可能具有很长时间运行的查询的内容,那么您需要将其设置为更高的值以避免取消查询。900s
上面推荐的设置似乎是一个很好的起点。我不同意将无限值设置-1
为一个好主意的官方文档——这可能会掩盖一些有缺陷的代码并导致很多问题。
The one caveat about long-running queries and setting these values higher is that other queries running on the slave in parallel with the long-running one which is causing the WAL action to be delayed will see old data until the long query has completed. Developers will need to understand this and serialize queries which shouldn't run simultaneously.
关于长时间运行的查询和将这些值设置得更高的一个警告是,与导致 WAL 操作延迟的长时间运行的查询并行运行在从属设备上的其他查询将看到旧数据,直到长查询完成。开发人员需要了解这一点并序列化不应同时运行的查询。
For the full explanation of how max_standby_archive_delay
and max_standby_streaming_delay
work and why, go here.
有关如何充分的解释max_standby_archive_delay
和max_standby_streaming_delay
工作,所以,去这里。
回答by bob
Likewise, here's a 2nd caveat to @Artif3x elaboration of @max-malysh's excellent answer, both above.
同样,这里是@Artif3x 对@max-malysh 出色答案的详细阐述的第二个警告,两者都在上面。
With any delayed application of transactions from the master the follower(s) will have an older, stale view of the data. Therefore while providing time for the query on the follower to finish by setting max_standby_archive_delay and max_standby_streaming_delay makes sense, keep both of these caveats in mind:
随着来自主服务器的任何延迟应用事务,跟随者将拥有更旧的、陈旧的数据视图。因此,虽然通过设置 max_standby_archive_delay 和 max_standby_streaming_delay 为跟随者完成查询提供时间是有道理的,但请记住这两个注意事项:
- the value of the follower as a standby / backup diminishes
- any other queries running on the follower may return stale data.
- 跟随者作为备用/备份的价值减少
- 在跟随者上运行的任何其他查询都可能返回陈旧数据。
If the value of the follower for backup ends up being too much in conflict with hosting queries, one solution would be multiple followers, each optimized for one or the other.
如果备份的关注者的价值最终与托管查询发生太多冲突,一种解决方案将是多个关注者,每个关注者针对一个或另一个进行优化。
Also, note that several queries in a row can cause the application of wal entries to keep being delayed. So when choosing the new values, it's not just the time for a single query, but a moving window that starts whenever a conflicting query starts, and ends when the wal entry is finally applied.
另请注意,连续多次查询可能会导致 wal 条目的应用程序不断延迟。因此,在选择新值时,不仅仅是单个查询的时间,而是一个移动窗口,该窗口在冲突查询开始时开始,并在最终应用 wal 条目时结束。