Understanding the Linux oom-killer's logs

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/9199731/

Understanding the Linux oom-killer's logs

Tags: linux, memory, out-of-memory

Asked by Corey Velan

My app was killed by the oom-killer. It is Ubuntu 11.10 running on a live USB with no swap and the PC has 1 Gig of RAM. The only app running (other than all the built in Ubuntu stuff) is my program flasherav. Note that /tmp is memory mapped and at the time of the crash had about 200MB of files in it (so was taking up ~200MB of RAM).

I'm trying to understand how to analyze the oom-killer log so that I can see exactly where all the memory is being used, i.e. what the different chunks are that add up to ~1 GB and caused the oom-killer to kick in. Once I understand that, I can work on reducing the offender's usage so the app will run on a machine with 1 GB of RAM. My specific questions are:

To try to analyze the situation, I summed up the "total_vm" column and I only get 609342KB (which, when added to the 200MB in /tmp, is still only ~809MB). Maybe I'm wrong about what the "total_vm" column is: does it include allocated but unused memory plus shared memory? If yes, then shouldn't it far overstate the memory actually used (and therefore I shouldn't be out of memory), right? Are there other chunks of memory in use that aren't accounted for in the list below?

[11686.040460] flasherav invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
[11686.040467] flasherav cpuset=/ mems_allowed=0
[11686.040472] Pid: 2859, comm: flasherav Not tainted 3.0.0-12-generic #20-Ubuntu
[11686.040476] Call Trace:
[11686.040488]  [<c10e1c15>] dump_header.isra.7+0x85/0xc0
[11686.040493]  [<c10e1e6c>] oom_kill_process+0x5c/0x80
[11686.040498]  [<c10e225f>] out_of_memory+0xbf/0x1d0
[11686.040503]  [<c10e6123>] __alloc_pages_nodemask+0x6c3/0x6e0
[11686.040509]  [<c10e78d3>] ? __do_page_cache_readahead+0xe3/0x170
[11686.040514]  [<c10e0fc8>] filemap_fault+0x218/0x390
[11686.040519]  [<c1001c24>] ? __switch_to+0x94/0x1a0
[11686.040525]  [<c10fb5ee>] __do_fault+0x3e/0x4b0
[11686.040530]  [<c1069971>] ? enqueue_hrtimer+0x21/0x80
[11686.040535]  [<c10fec2c>] handle_pte_fault+0xec/0x220
[11686.040540]  [<c10fee68>] handle_mm_fault+0x108/0x210
[11686.040546]  [<c152fa00>] ? vmalloc_fault+0xee/0xee
[11686.040551]  [<c152fb5b>] do_page_fault+0x15b/0x4a0
[11686.040555]  [<c1069a90>] ? update_rmtp+0x80/0x80
[11686.040560]  [<c106a7b6>] ? hrtimer_start_range_ns+0x26/0x30
[11686.040565]  [<c106aeaf>] ? sys_nanosleep+0x4f/0x60
[11686.040569]  [<c152fa00>] ? vmalloc_fault+0xee/0xee
[11686.040574]  [<c152cfcf>] error_code+0x67/0x6c
[11686.040580]  [<c1520000>] ? reserve_backup_gdb.isra.11+0x26d/0x2c0
[11686.040583] Mem-Info:
[11686.040585] DMA per-cpu:
[11686.040588] CPU    0: hi:    0, btch:   1 usd:   0
[11686.040592] CPU    1: hi:    0, btch:   1 usd:   0
[11686.040594] Normal per-cpu:
[11686.040597] CPU    0: hi:  186, btch:  31 usd:   5
[11686.040600] CPU    1: hi:  186, btch:  31 usd:  30
[11686.040603] HighMem per-cpu:
[11686.040605] CPU    0: hi:   42, btch:   7 usd:   7
[11686.040608] CPU    1: hi:   42, btch:   7 usd:  22
[11686.040613] active_anon:113150 inactive_anon:113378 isolated_anon:0
[11686.040615]  active_file:86 inactive_file:1964 isolated_file:0
[11686.040616]  unevictable:0 dirty:0 writeback:0 unstable:0
[11686.040618]  free:13274 slab_reclaimable:2239 slab_unreclaimable:2594
[11686.040619]  mapped:1387 shmem:4380 pagetables:1375 bounce:0
[11686.040627] DMA free:4776kB min:784kB low:980kB high:1176kB active_anon:5116kB inactive_anon:5472kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:80kB slab_unreclaimable:168kB kernel_stack:96kB pagetables:64kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:6 all_unreclaimable? yes
[11686.040634] lowmem_reserve[]: 0 865 1000 1000
[11686.040644] Normal free:48212kB min:44012kB low:55012kB high:66016kB active_anon:383196kB inactive_anon:383704kB active_file:344kB inactive_file:7884kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:5548kB shmem:17520kB slab_reclaimable:8876kB slab_unreclaimable:10208kB kernel_stack:1960kB pagetables:3976kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:930 all_unreclaimable? yes
[11686.040652] lowmem_reserve[]: 0 0 1078 1078
[11686.040662] HighMem free:108kB min:132kB low:1844kB high:3560kB active_anon:64288kB inactive_anon:64336kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:138072kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:1460kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:61 all_unreclaimable? yes
[11686.040669] lowmem_reserve[]: 0 0 0 0
[11686.040675] DMA: 20*4kB 24*8kB 34*16kB 26*32kB 19*64kB 13*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4784kB
[11686.040690] Normal: 819*4kB 607*8kB 357*16kB 176*32kB 99*64kB 49*128kB 23*256kB 4*512kB 0*1024kB 0*2048kB 2*4096kB = 48212kB
[11686.040704] HighMem: 16*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 80kB
[11686.040718] 14680 total pagecache pages
[11686.040721] 8202 pages in swap cache
[11686.040724] Swap cache stats: add 2191074, delete 2182872, find 1247325/1327415
[11686.040727] Free swap  = 0kB
[11686.040729] Total swap = 524284kB
[11686.043240] 262100 pages RAM
[11686.043244] 34790 pages HighMem
[11686.043246] 5610 pages reserved
[11686.043248] 2335 pages shared
[11686.043250] 240875 pages non-shared
[11686.043253] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[11686.043266] [ 1084]     0  1084      662        1   0       0             0 upstart-udev-br
[11686.043271] [ 1094]     0  1094      743       79   0     -17         -1000 udevd
[11686.043276] [ 1104]   101  1104     7232       42   0       0             0 rsyslogd
[11686.043281] [ 1149]   103  1149     1066      188   1       0             0 dbus-daemon
[11686.043286] [ 1165]     0  1165     1716       66   0       0             0 modem-manager
[11686.043291] [ 1220]   106  1220      861       42   0       0             0 avahi-daemon
[11686.043296] [ 1221]   106  1221      829        0   1       0             0 avahi-daemon
[11686.043301] [ 1255]     0  1255     6880      117   0       0             0 NetworkManager
[11686.043306] [ 1308]     0  1308     5988      144   0       0             0 polkitd
[11686.043311] [ 1334]     0  1334      723       85   0     -17         -1000 udevd
[11686.043316] [ 1335]     0  1335      730      108   0     -17         -1000 udevd
[11686.043320] [ 1375]     0  1375      663       37   0       0             0 upstart-socket-
[11686.043325] [ 1464]     0  1464     1333      120   1       0             0 login
[11686.043330] [ 1467]     0  1467     1333      135   1       0             0 login
[11686.043335] [ 1486]     0  1486     1333      135   1       0             0 login
[11686.043339] [ 1487]     0  1487     1333      136   1       0             0 login
[11686.043344] [ 1493]     0  1493     1333      134   1       0             0 login
[11686.043349] [ 1528]     0  1528      496       45   0       0             0 acpid
[11686.043354] [ 1529]     0  1529      607       46   1       0             0 cron
[11686.043359] [ 1549]     0  1549    10660      100   0       0             0 lightdm
[11686.043363] [ 1550]     0  1550      570       28   0       0             0 atd
[11686.043368] [ 1584]     0  1584      855       35   0       0             0 irqbalance
[11686.043373] [ 1703]     0  1703    17939     9653   0       0             0 Xorg
[11686.043378] [ 1874]     0  1874     7013      174   0       0             0 console-kit-dae
[11686.043382] [ 1958]     0  1958     1124       52   1       0             0 bluetoothd
[11686.043388] [ 2048]   999  2048     2435      641   1       0             0 bash
[11686.043392] [ 2049]   999  2049     2435      595   0       0             0 bash
[11686.043397] [ 2050]   999  2050     2435      587   1       0             0 bash
[11686.043402] [ 2051]   999  2051     2435      634   1       0             0 bash
[11686.043406] [ 2054]   999  2054     2435      569   0       0             0 bash
[11686.043411] [ 2155]     0  2155     1333      128   0       0             0 login
[11686.043416] [ 2222]     0  2222      684       67   1       0             0 dhclient
[11686.043420] [ 2240]   999  2240     2435      415   0       0             0 bash
[11686.043425] [ 2244]     0  2244     3631       58   0       0             0 accounts-daemon
[11686.043430] [ 2258]   999  2258    11683      277   0       0             0 gnome-session
[11686.043435] [ 2407]   999  2407      964       24   0       0             0 ssh-agent
[11686.043440] [ 2410]   999  2410      937       53   0       0             0 dbus-launch
[11686.043444] [ 2411]   999  2411     1319      300   1       0             0 dbus-daemon
[11686.043449] [ 2413]   999  2413     2287       88   0       0             0 gvfsd
[11686.043454] [ 2418]   999  2418     7867      123   1       0             0 gvfs-fuse-daemo
[11686.043459] [ 2427]   999  2427    32720      804   0       0             0 gnome-settings-
[11686.043463] [ 2437]   999  2437    10750      124   0       0             0 gnome-keyring-d
[11686.043468] [ 2442]   999  2442     2321      244   1       0             0 gconfd-2
[11686.043473] [ 2447]     0  2447     6490      156   0       0             0 upowerd
[11686.043478] [ 2467]   999  2467     7590       87   0       0             0 dconf-service
[11686.043482] [ 2529]   999  2529    11807      211   0       0             0 gsd-printer
[11686.043487] [ 2531]   999  2531    12162      587   0       0             0 metacity
[11686.043492] [ 2535]   999  2535    19175      960   0       0             0 unity-2d-panel
[11686.043496] [ 2536]   999  2536    19408     1012   0       0             0 unity-2d-launch
[11686.043502] [ 2539]   999  2539    16154     1120   1       0             0 nautilus
[11686.043506] [ 2540]   999  2540    17888      534   0       0             0 nm-applet
[11686.043511] [ 2541]   999  2541     7005      253   0       0             0 polkit-gnome-au
[11686.043516] [ 2544]   999  2544     8930      430   0       0             0 bamfdaemon
[11686.043521] [ 2545]   999  2545    11217      442   1       0             0 bluetooth-apple
[11686.043525] [ 2547]   999  2547      510       16   0       0             0 sh
[11686.043530] [ 2548]   999  2548    11205      301   1       0             0 gnome-fallback-
[11686.043535] [ 2565]   999  2565     6614      179   1       0             0 gvfs-gdu-volume
[11686.043539] [ 2567]     0  2567     5812      164   1       0             0 udisks-daemon
[11686.043544] [ 2571]     0  2571     1580       69   0       0             0 udisks-daemon
[11686.043549] [ 2579]   999  2579    16354     1035   0       0             0 unity-panel-ser
[11686.043554] [ 2602]     0  2602     1188       47   0       0             0 sudo
[11686.043559] [ 2603]     0  2603   374634   181503   0       0             0 flasherav
[11686.043564] [ 2607]   999  2607    12673      189   0       0             0 indicator-appli
[11686.043569] [ 2609]   999  2609    19313      311   1       0             0 indicator-datet
[11686.043573] [ 2611]   999  2611    15738      225   0       0             0 indicator-messa
[11686.043578] [ 2615]   999  2615    17433      237   1       0             0 indicator-sessi
[11686.043583] [ 2627]   999  2627     2393      132   0       0             0 gvfsd-trash
[11686.043588] [ 2640]   999  2640     1933       85   0       0             0 geoclue-master
[11686.043592] [ 2650]     0  2650     2498     1136   1       0             0 mount.ntfs
[11686.043598] [ 2657]   999  2657     6624      128   1       0             0 telepathy-indic
[11686.043602] [ 2659]   999  2659     2246      112   0       0             0 mission-control
[11686.043607] [ 2662]   999  2662     5431      346   1       0             0 gdu-notificatio
[11686.043612] [ 2664]     0  2664     3716     2392   0       0             0 mount.ntfs
[11686.043617] [ 2679]   999  2679    12453      197   1       0             0 zeitgeist-datah
[11686.043621] [ 2685]   999  2685     5196     1581   1       0             0 zeitgeist-daemo
[11686.043626] [ 2934]   999  2934    16305      710   0       0             0 gnome-terminal
[11686.043631] [ 2938]   999  2938      553        0   0       0             0 gnome-pty-helpe
[11686.043636] [ 2939]   999  2939     1814      406   0       0             0 bash
[11686.043641] Out of memory: Kill process 2603 (flasherav) score 761 or sacrifice child
[11686.043647] Killed process 2603 (flasherav) total-vm:1498536kB, anon-rss:721784kB, file-rss:4228kB
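To cross-check the sums from the question against the table above, the per-process lines can be parsed and totalled directly. A minimal sketch in Python, assuming the kernel log has been saved to a file called oom.log (the file name is an assumption; the column layout is taken from the header line printed by the kernel above):

import re

# Process-table lines look like:
# [11686.043266] [ 1084]     0  1084      662        1   0       0             0 upstart-udev-br
# i.e. timestamp, [pid], uid, tgid, total_vm, rss, cpu, oom_adj, oom_score_adj, name
row = re.compile(r"\[\s*[\d.]+\]\s+\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")

total_vm_pages = 0
rss_pages = 0
with open("oom.log") as log:
    for line in log:
        m = row.match(line)
        if m:
            total_vm_pages += int(m.group(4))   # total_vm column
            rss_pages += int(m.group(5))        # rss column

# total_vm and rss are counted in 4 kB pages, not in kB
print("sum(total_vm): %d pages = %d kB" % (total_vm_pages, total_vm_pages * 4))
print("sum(rss):      %d pages = %d kB" % (rss_pages, rss_pages * 4))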

Answered by Jite

Memory management in Linux is a bit tricky to understand, and I can't say I fully understand it yet, but I'll try to share a little bit of my experience and knowledge.

Short answer to your question: yes, there is other stuff included besides what's in the list.

What's being shown in your list is applications running in userspace. The kernel uses memory for itself and its modules; on top of that it also keeps a lower limit of free memory that you can't go below. When you've reached that level it will try to free up resources, and when it can't do that anymore, you end up with an OOM problem.

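On a running system you can get a rough idea of that kernel-side usage and of the free-memory floor from /proc/meminfo and the vm.min_free_kbytes tunable; a small sketch (the exact set of fields varies a little between kernel versions):

# Fields that cover memory not attributed to individual processes
# in the oom-killer's per-process table.
kernel_fields = ("MemTotal", "MemFree", "Buffers", "Cached",
                 "Slab", "KernelStack", "PageTables")

with open("/proc/meminfo") as meminfo:
    for line in meminfo:
        if line.split(":")[0] in kernel_fields:
            print(line.rstrip())

# The reclaim watermarks are derived from this value; when free memory
# approaches it the kernel starts reclaiming, and the OOM killer is the
# last resort when reclaim fails.
with open("/proc/sys/vm/min_free_kbytes") as f:
    print("min_free_kbytes:", f.read().strip())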

From the last line of your list you can read that the kernel reports a total-vm usage of 1498536kB (~1.5 GB) for flasherav, where total-vm includes both your physical RAM and swap space. You stated you don't have any swap, but the kernel seems to think otherwise, since your swap space is reported to be full (Total swap = 524284kB, Free swap = 0kB) and it reports a total vmem size of ~1.5 GB.

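A quick sanity check of those figures, with the 1 GB of RAM taken from the question and the other numbers from the log above:

ram_kb      = 1 * 1024 * 1024   # ~1 GB of physical RAM (from the question)
swap_kb     = 524284            # "Total swap = 524284kB" in the log
total_vm_kb = 1498536           # "total-vm:1498536kB" reported for flasherav

print("RAM + swap    = %d kB" % (ram_kb + swap_kb))   # ~1.5 GB
print("flasherav VSZ = %d kB" % total_vm_kb)          # also close to 1.5 GB
# flasherav's virtual size alone is close to the whole of RAM plus the
# (completely used) swap space.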

Another thing that can complicate things further is memory fragmentation. You can hit the OOM killer when the kernel tries to allocate, let's say, 4096kB of contiguous memory but there is no free chunk of that size available.

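Whether fragmentation is a factor can be seen from /proc/buddyinfo, which lists how many free blocks of each order (4 kB, 8 kB, 16 kB, ...) every memory zone has; the "DMA:", "Normal:" and "HighMem:" lines in the oom report above are the same breakdown. A small reader sketch:

# Each /proc/buddyinfo line looks like:
#   Node 0, zone   Normal   819   607   357 ...
# where the numbers are counts of free blocks of 2**order pages (order 0..10).
with open("/proc/buddyinfo") as buddy:
    for line in buddy:
        fields = line.split()
        zone = fields[3]                       # "DMA", "Normal", "HighMem", ...
        counts = [int(n) for n in fields[4:]]
        sizes = ["%dkB" % (4 * 2 ** order) for order in range(len(counts))]
        print(zone, ", ".join("%s:%d" % pair for pair in zip(sizes, counts)))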

Now that alone probably won't help you solve the actual problem. I don't know if it's normal for your program to require that amount of memory, but I would recommend trying a static code analyzer like cppcheck to check for memory leaks or file descriptor leaks. You could also try to run it through Valgrind to get a bit more information about memory usage.

Answered by Shahab

This webpage has an explanation and a solution.

The solution is:

To fix this problem the behavior of the kernel has to be changed, so that it will no longer overcommit memory for application requests. Finally I have included the values mentioned below in the /etc/sysctl.conf file, so they get applied automatically on start-up:

vm.overcommit_memory = 2

vm.overcommit_ratio = 80

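With vm.overcommit_memory = 2 the kernel enforces a hard commit limit of swap plus overcommit_ratio percent of RAM (reported as CommitLimit in /proc/meminfo). For the machine in the question that works out roughly as follows; a sketch, not an exact reproduction of the kernel's accounting:

ram_kb  = 1 * 1024 * 1024       # ~1 GB of RAM (from the question)
swap_kb = 524284                # from the log
overcommit_ratio = 80           # vm.overcommit_ratio

commit_limit_kb = swap_kb + ram_kb * overcommit_ratio // 100
print("CommitLimit ~= %d kB" % commit_limit_kb)   # ~1.3 GB

# With this setting, allocations that would push the committed total past
# the limit fail with ENOMEM up front instead of being granted and then
# killed later by the oom-killer.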

Answered by WKPlus

The sum of total_vm is 847170 and the sum of rss is 214726; these two values are counted in 4kB pages, which means that when the oom-killer was running, you had used 214726*4kB = 858904kB of physical memory and swap space.

Since your physical memory is 1GB and about 200MB of it was used for the memory-mapped /tmp, it's reasonable that the oom-killer was invoked when 858904kB was in use.

rss for process 2603 is 181503, which means 181503*4kB = 726012kB of rss, which is equal to the sum of anon-rss and file-rss:

[11686.043647] Killed process 2603 (flasherav) total-vm:1498536kB, anon-rss:721784kB, file-rss:4228kB

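The same check in numbers, with every value taken from the log above:

rss_pages   = 181503    # rss column for pid 2603 in the process table
anon_rss_kb = 721784    # "anon-rss:721784kB"
file_rss_kb = 4228      # "file-rss:4228kB"

print(rss_pages * 4)               # 726012 kB
print(anon_rss_kb + file_rss_kb)   # 726012 kB, the two figures agree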