Original question: http://stackoverflow.com/questions/920956/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How do I gracefully kill a stale postgres server process?
Asked by dar
Occasionally in our lab, our postgres 8.3 database will get orphaned from the pid file, and we get this message when trying to shut down the database:
Error: pid file is invalid, please manually kill the stale server process postgres
When this happens, we immediately do a pg_dump so we can restore the database later. But if we just kill -9 the orphaned postgres process and then start the server again, the database comes up with only the data from the last successful shutdown. If you connect to it with psql before killing it, however, all the data is available, which is why the pg_dump works.
Is there a way to gracefully shut down the orphaned postgres process so we don't have to go through the pg_dump and restore? Or is there a way to have the database recover after killing the orphaned process?
Answered by Milen A. Radev
According to the documentation, you could send either SIGTERM or SIGQUIT. SIGTERM is preferred. Either way, never use SIGKILL (as you know from personal experience).
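As a rough illustration of the signal choice described above, the sketch below maps PostgreSQL's documented shutdown modes to the signals the postmaster understands: SIGTERM for a smart shutdown, SIGINT for a fast one, SIGQUIT for an immediate one. The function name `shutdown_signal` and the example PID are assumptions made for illustration; they are not part of the original answer.

```shell
#!/bin/bash
# Hypothetical helper: print the signal name for a PostgreSQL shutdown mode.
shutdown_signal() {
    case "$1" in
        smart)     echo TERM ;;  # wait for clients to disconnect, then stop
        fast)      echo INT ;;   # roll back open transactions, then stop
        immediate) echo QUIT ;;  # abort without a clean shutdown; recovery runs on next start
        *)         echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Example (12345 is a placeholder for the postmaster's PID):
#   kill -s "$(shutdown_signal smart)" 12345
```

SIGKILL is deliberately absent from the mapping, since the postmaster cannot catch it and so cannot shut down cleanly.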
Edit: on the other hand, what you are experiencing is not normal and could indicate a misconfiguration or a bug. Please ask for assistance on the pgsql-admin mailing list.
Answered by Magnus Hagander
Never use kill -9.
And I would strongly advise you to try to figure out exactly how this happens. Where exactly does the error message come from? It's not a PostgreSQL error message. Are you by any chance mixing different ways to start/stop the server (initscripts sometimes and pg_ctl sometimes, for example)? That could cause things to go out of sync.
But to answer the direct question - use a regular kill (no -9) on the process to shut it down. Make sure you kill all the postgres processes if there is more than one running.
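A minimal sketch of the "regular kill" advice: send SIGTERM, then wait for the process to exit instead of escalating to kill -9. The helper name `term_and_wait` and the 60-second default timeout are assumptions for illustration only.

```shell
#!/bin/bash
# Hypothetical helper: send SIGTERM to a PID and wait up to TIMEOUT seconds
# for it to exit. It never escalates to SIGKILL.
term_and_wait() {
    local pid=$1 timeout=${2:-60} waited=0
    # Nothing to do if the process is not running (or is not signalable by this user).
    if ! kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    kill -TERM "$pid" || return 1
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "PID $pid still running after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Example, assuming a single cluster so that the oldest process named
# "postgres" is the postmaster:
#   term_and_wait "$(pgrep -o -x postgres)" 120
```

If the call times out, some client sessions are probably still connected; a smart shutdown waits for them to disconnect before the server stops.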
The database always performs automatic recovery when it is started after an unclean shutdown. This should happen after kill -9 as well: any data that was committed should still be there. This almost sounds like you have two different data directories mounted on top of each other or something like that. This has been a known issue with NFS, at least in the past.
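One way to start checking whether the pid file and the running server have drifted apart is to see whether the pid file in a given data directory names a live process. This is only a sketch: the helper name `check_pidfile` is hypothetical, and it assumes the first line of postmaster.pid holds the postmaster's PID.

```shell
#!/bin/bash
# Hypothetical helper: report whether DATADIR/postmaster.pid names a live process.
# Assumes the first line of postmaster.pid is the postmaster's PID.
# Run it as the postgres user or root, otherwise kill -0 may fail on permissions.
check_pidfile() {
    local datadir=$1 pidfile pid
    pidfile="$datadir/postmaster.pid"
    if [ ! -r "$pidfile" ]; then
        echo "missing"
        return 1
    fi
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    case "$pid" in
        ''|*[!0-9]*) echo "stale"; return 1 ;;  # not a number at all
    esac
    if kill -0 "$pid" 2>/dev/null; then
        echo "live $pid"
    else
        echo "stale"
        return 1
    fi
}

# Example (the path is a placeholder for your data directory):
#   check_pidfile /path/to/pgdata
```

Comparing that directory with the output of `SHOW data_directory;` in a psql session on the running server may help reveal whether two data directories are involved.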
Answered by Dan Benamy
I use a script like the following, run by cron every minute.
#!/bin/bash
DB="YOUR_DB"
# Here's a snippet to watch how long each connection to the db has been open:
# watch -n 1 'ps -o pid,cmd,etime -C postgres | grep $DB'
# This program kills any postgres workers/connections to the specified database
# which have been running for 2 or 3 minutes. It actually kills workers which
# have an elapsed time including "02:" or "03:". That'll be anything running
# for at least 2 minutes and less than 4. It'll also cover anything that
# managed to stay around until an hour and 2 or 3 minutes, etc.
#
# Run this once a minute via cron and it should catch any connection open
# between 2 and 3 minutes. You can temporarily disable it if you need to run
# a long connection once in a while.
#
# The check for "03:" is in case there's a little lag starting the cron job and
# the timing is really bad and it never sees a worker in the 1 minute window
# when it's got "02:".
old=$(ps -o pid,cmd,etime -C postgres | grep "$DB" | grep -E '0[23]:')
if [ -n "$old" ]; then
    echo "Killing:"
    echo "$old"
    # Extract the PID (first column) and send each process a plain SIGTERM.
    echo "$old" | awk '{print $1}' | xargs -I {} kill {}
fi
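The script above matches the elapsed time as text, so "02:" can also match the hours field of a long-running connection. As a possible alternative, not the author's method, the sketch below converts the `[[dd-]hh:]mm:ss` format printed by `ps -o etime` into seconds so that ages can be compared numerically. The helper name `etime_to_seconds` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: convert a ps etime value ([[dd-]hh:]mm:ss) to seconds.
etime_to_seconds() {
    printf '%s\n' "$1" | awk -F'[-:]' '
        NF == 2 { print $1 * 60 + $2; next }                            # mm:ss
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; next }                # hh:mm:ss
        NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4; next }   # dd-hh:mm:ss
        { exit 1 }                                                      # unrecognized format
    '
}

# Example: print PIDs of postgres processes older than 120 seconds.
# Unlike the script above, this does not filter by database name.
#   ps -o pid=,etime= -C postgres | while read -r pid etime; do
#       [ "$(etime_to_seconds "$etime")" -ge 120 ] && echo "$pid"
#   done
```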