Postgres gets out of memory errors despite having plenty of free memory
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me) on StackOverflow.
Original source: http://stackoverflow.com/questions/29485644/
Asked by Alex Ghiculescu
I have a server running Postgres 9.1.15. The server has 2GB of RAM and no swap. Intermittently Postgres will start getting "out of memory" errors on some SELECTs, and will continue doing so until I restart Postgres or some of the clients that are connected to it. What's weird is that when this happens, free still reports over 500MB of free memory.
select version():
PostgreSQL 9.1.15 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
uname -a:
Linux db 3.2.0-23-virtual #36-Ubuntu SMP Tue Apr 10 22:29:03 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
postgresql.conf (everything else is commented out/default):
max_connections = 100
shared_buffers = 500MB
work_mem = 2000kB
maintenance_work_mem = 128MB
wal_buffers = 16MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
random_page_cost = 2.0
effective_cache_size = 1000MB
default_statistics_target = 100
log_temp_files = 0
I got these values from pgtune (I chose "mixed type of applications") and have been fiddling with them based on what I've read, without making much real progress. At the moment there's 68 connections, which is a typical number (I'm not using pgbouncer or any other connection poolers yet).
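As a rough sanity check of those settings (a back-of-envelope sketch only: it assumes every backend uses its full work_mem at once and the default of 3 autovacuum workers, while a single complex query can also allocate several work_mem-sized chunks):
# Worst-case commit estimate from the postgresql.conf values above.
shared_buffers_mb=500
max_connections=100
work_mem_kb=2000
maintenance_work_mem_mb=128
autovacuum_workers=3                       # 9.1 default, an assumption here
backends_mb=$(( max_connections * work_mem_kb / 1024 ))
autovac_mb=$(( autovacuum_workers * maintenance_work_mem_mb ))
echo "shared_buffers:          ${shared_buffers_mb} MB"
echo "backends (worst case):   ${backends_mb} MB"
echo "autovacuum (worst case): ${autovac_mb} MB"
echo "rough peak:              $(( shared_buffers_mb + backends_mb + autovac_mb )) MB"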
/etc/sysctl.conf:
kernel.shmmax=1050451968
kernel.shmall=256458
vm.overcommit_ratio=100
vm.overcommit_memory=2
I first changed overcommit_memory to 2 about a fortnight ago after the OOM killer killed the Postgres server. Prior to that the server had been running fine for a long time. The errors I get now are less catastrophic but much more annoying because they are much more frequent.
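With vm.overcommit_memory=2 the kernel uses strict commit accounting, so allocations can be refused while free still shows hundreds of MB free. A quick check at the moment the errors appear (standard Linux interfaces, nothing specific to this setup) would be:
# CommitLimit is what the kernel will hand out in total; Committed_AS is
# what has already been promised. With overcommit_memory=2, an allocation
# fails once it would push Committed_AS past CommitLimit, regardless of
# what free reports.
grep -E 'CommitLimit|Committed_AS' /proc/meminfo
sysctl vm.overcommit_memory vm.overcommit_ratio
free -m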
I haven't had much luck pinpointing the first event that causes postgres to run "out of memory" - it seems to be different each time. The most recent time it crashed, the first three lines logged were:
2015-04-07 05:32:39 UTC ERROR: out of memory
2015-04-07 05:32:39 UTC DETAIL: Failed on request of size 125.
2015-04-07 05:32:39 UTC CONTEXT: automatic analyze of table "xxx.public.delayed_jobs"
TopMemoryContext: 68688 total in 10 blocks; 4560 free (4 chunks); 64128 used
[... snipped heaps of lines which I can provide if they are useful ...]
---
2015-04-07 05:33:58 UTC ERROR: out of memory
2015-04-07 05:33:58 UTC DETAIL: Failed on request of size 16.
2015-04-07 05:33:58 UTC STATEMENT: SELECT oid, typname, typelem, typdelim, typinput FROM pg_type
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
TopMemoryContext: 396368 total in 50 blocks; 10160 free (28 chunks); 386208 used
[... snipped heaps of lines which I can provide if they are useful ...]
---
2015-04-07 05:33:59 UTC ERROR: out of memory
2015-04-07 05:33:59 UTC DETAIL: Failed on request of size 1840.
2015-04-07 05:33:59 UTC STATEMENT: SELECT... [nested select with 4 joins, 19 ands, and 2 order bys]
TopMemoryContext: 388176 total in 49 blocks; 17264 free (55 chunks); 370912 used
The crash before that, a few hours earlier, just had three instances of that last query as the first three lines of the crash. That query gets run very often, so I'm not sure if the issues are because of this query, or if it just comes up in the error log because it's a reasonably complex SELECT getting run all the time. That said, here's an EXPLAIN ANALYZE of it: http://explain.depesz.com/s/r00
This is what ulimit -a for the postgres user looks like:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15956
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 15956
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I'll try and get the exact numbers from free next time there's a crash; in the meantime this is a braindump of all the info I have.
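One way to make sure those numbers are captured (a minimal sketch; the log path and 30-second interval are arbitrary choices, not something from this setup):
# Sample memory stats in a loop so the values around the next crash
# are on disk. Stop it with Ctrl-C or kill once the crash is caught.
while true; do
    {
        date
        free -m
        grep -E 'MemFree|Cached|CommitLimit|Committed_AS' /proc/meminfo
        echo '---'
    } >> /var/log/meminfo-sample.log
    sleep 30
done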
Any ideas on where to go from here?
Accepted answer by Christian
Can you check if there's any swap memory available when the error comes up?
I had completely removed the swap memory on my Linux desktop (just for testing other things...) and I got exactly the same error! I'm pretty sure that this is what is going on with you too.
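A quick way to check (standard commands, nothing specific to the poster's machine):
# Any active swap devices or files? /proc/swaps shows only the header
# line if there is no swap; free's "Swap:" row shows the same totals.
cat /proc/swaps
free -m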
Answered by Aaron C. de Bruyn
I just ran into this same issue with a ~2.5 GB plain-text SQL file I was trying to restore. I scaled my Digital Ocean server up to 64 GB RAM, created a 10 GB swap file, and tried again. I got an out-of-memory error with 50 GB free, and no swap in use.
I scaled back my server to the small 1 GB instance I was using (requiring a reboot) and figured I'd give it another shot for no other reason than I was frustrated. I started the import and realized I forgot to create my temporary swap file again.
I created it in the middle of the import. psql made it a lot further before crashing. It made it through 5 additional tables.
I think there must be a bug allocating memory in psql.
Answered by mnencia
It is a bit suspicious that you report the same free memory size as your shared_buffers size. Are you sure you are looking at the right values?
Output of the free command at the time of the crash would be useful, as well as the contents of /proc/meminfo.
Beware that setting overcommit_memory to 2 is not so effective if you leave overcommit_ratio at 100. It basically limits memory allocation to the size of swap (0 in this case) plus 100% of physical RAM, which doesn't take into account any space for shared memory and disk caches.
You should probably set overcommit_ratio to 50.
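A sketch of how that change could be applied and verified (with no swap, CommitLimit is roughly RAM * overcommit_ratio / 100; adjust the value to your own memory layout before relying on it):
# Apply at runtime and watch the kernel's commit limit move.
sysctl -w vm.overcommit_ratio=50
grep -E 'CommitLimit|Committed_AS' /proc/meminfo
# To persist across reboots, set vm.overcommit_ratio=50 in /etc/sysctl.conf.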