Linux VFS: file-max limit 1231582 reached
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must likewise follow CC BY-SA, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/4986388/
VFS: file-max limit 1231582 reached
Asked by Rick Koshi
I'm running a Linux 2.6.36 kernel, and I'm seeing some random errors. Things like
ls: error while loading shared libraries: libpthread.so.0: cannot open shared object file: Error 23
Yes, my system can't consistently run an 'ls' command. :(
I note several errors in my dmesg output:
# dmesg | tail
[2808967.543203] EXT4-fs (sda3): re-mounted. Opts: (null)
[2837776.220605] xv[14450] general protection ip:7f20c20c6ac6 sp:7fff3641b368 error:0 in libpng14.so.14.4.0[7f20c20a9000+29000]
[4931344.685302] EXT4-fs (md16): re-mounted. Opts: (null)
[4982666.631444] VFS: file-max limit 1231582 reached
[4982666.764240] VFS: file-max limit 1231582 reached
[4982767.360574] VFS: file-max limit 1231582 reached
[4982901.904628] VFS: file-max limit 1231582 reached
[4982964.930556] VFS: file-max limit 1231582 reached
[4982966.352170] VFS: file-max limit 1231582 reached
[4982966.649195] top[31095]: segfault at 14 ip 00007fd6ace42700 sp 00007fff20746530 error 6 in libproc-3.2.8.so[7fd6ace3b000+e000]
Obviously, the file-max errors look suspicious, being clustered together and recent.
# cat /proc/sys/fs/file-max
1231582
# cat /proc/sys/fs/file-nr
1231712 0 1231582
That also looks a bit odd to me, but the thing is, there's no way I have 1.2 million files open on this system. I'm the only one using it, and it's not visible to anyone outside the local network.
# lsof | wc
16046 148253 1882901
# ps -ef | wc
574 6104 44260
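The mismatch between those numbers is the real clue: file-nr counts handles allocated inside the kernel, while lsof only sees what user-space processes hold. A minimal sketch to watch the gap, assuming a stock /proc layout (lsof prints roughly one line per descriptor, so its count is only approximate):

#!/bin/sh
# Compare the kernel's allocated-handle count with what lsof can
# attribute to processes; a large, growing "unaccounted" number points
# at handles held inside the kernel (as turned out to be the case here).
kernel=$(awk '{print $1}' /proc/sys/fs/file-nr)
limit=$(cat /proc/sys/fs/file-max)
visible=$(lsof 2>/dev/null | wc -l)
echo "kernel-allocated: $kernel of $limit"
echo "lsof-visible:     $visible"
echo "unaccounted:      $((kernel - visible))"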
I saw some documentation saying:
file-max & file-nr:
The kernel allocates file handles dynamically, but as yet it doesn't free them again.
The value in file-max denotes the maximum number of file handles that the Linux kernel will allocate. When you get lots of error messages about running out of file handles, you might want to increase this limit.
Historically, the three values in file-nr denoted the number of allocated file handles, the number of allocated but unused file handles, and the maximum number of file handles. Linux 2.6 always reports 0 as the number of free file handles -- this is not an error, it just means that the number of allocated file handles exactly matches the number of used file handles.
Attempts to allocate more file descriptors than file-max are reported with printk, look for "VFS: file-max limit reached".
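For completeness: the limit the documentation refers to can be raised with sysctl. A sketch using an arbitrary doubled value; the first command takes effect immediately but is lost on reboot, the second persists it across reboots. In this particular case, raising the ceiling would only have postponed the leak described in the accepted answer below.

# sysctl -w fs.file-max=2463164
# echo 'fs.file-max = 2463164' >> /etc/sysctl.conf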
My first reading of this is that the kernel basically has a built-in file descriptor leak, but I find that very hard to believe. It would imply that any system in active use needs to be rebooted every so often to free up the file descriptors. As I said, I can't believe this would be true, since it's normal to me to have Linux systems stay up for months (even years) at a time. On the other hand, I also can't believe that my nearly-idle system is holding over a million files open.
Does anyone have any ideas, either for fixes or further diagnosis? I could, of course, just reboot the system, but I don't want this to be a recurring problem every few weeks. As a stopgap measure, I've quit Firefox, which was accounting for almost 2000 lines of lsof output (!) even though I only had one window open, and now I can run 'ls' again, but I doubt that will fix the problem for long. (edit: Oops, spoke too soon. By the time I finished typing out this question, the symptom was/is back)
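A quicker way to find heavy descriptor consumers like that, rather than eyeballing raw lsof output, is to rank by command name. A rough sketch (lsof emits one line per descriptor and per thread, so the counts are inflated):

# lsof 2>/dev/null | awk '{print $1}' | sort | uniq -c | sort -rn | head

Or count actual descriptors per process straight from /proc:

for pid in /proc/[0-9]*; do
  printf '%6s %s\n' "$(ls "$pid/fd" 2>/dev/null | wc -l)" "$pid"
done | sort -rn | head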
Thanks in advance for any help.
Accepted answer by Rick Koshi
I hate to leave a question open, so a summary for anyone who finds this.
I ended up reposting the question on serverfault instead (this article)
They weren't able to come up with anything, actually, but I did some more investigation and ultimately found that it's a genuine bug with NFSv4, specifically the server-side locking code. I had an NFS client which was running a monitoring script every 5 seconds, using rrdtool to log some data to an NFS-mounted file. Every time it ran, it locked the file for writing, and the server allocated (but erroneously never released) an open file descriptor. That script (plus another that ran less frequently) resulted in about 900 open files consumed per hour, and two months later, it hit the limit.
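The numbers line up: one update every 5 seconds is 720 locks per hour from that script alone, close to the observed 900 per hour once the second script is counted, and at 900 per hour the 1,231,582-handle limit falls in roughly 57 days, i.e. about two months. The client-side pattern was essentially the loop below (a hedged reconstruction; the mount point, .rrd path, and data source are invented):

#!/bin/sh
# Hypothetical reconstruction of the monitoring loop: each rrdtool
# update takes a write lock on the NFS-mounted .rrd file, and the buggy
# NFSv4 server allocated (but never released) one open file handle per
# lock.
while true; do
    rrdtool update /mnt/nfs/stats/load.rrd "N:$(cut -d' ' -f1 /proc/loadavg)"
    sleep 5
done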
Several solutions are possible:
1) Use NFSv3 instead (see the mount sketch below).
2) Stop running the monitoring script.
3) Store the monitoring results locally instead of on NFS.
4) Wait for the patch to NFSv4 that fixes this (Bruce Fields actually sent me a patch to try, but I haven't had time).
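For option 1, the client only needs to request version 3 explicitly. A sketch with made-up server and mount-point names (the NFS version cannot be changed on a live remount, so unmount first):

# umount /mnt/nfs
# mount -t nfs -o vers=3 server:/export /mnt/nfs

Or pin it in /etc/fstab:

server:/export  /mnt/nfs  nfs  vers=3  0 0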
I'm sure you can think of other possible solutions.
Thanks for trying.