Too many open files error but lsof shows a legal number of open files

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must attribute it to the original authors (not the translator): http://stackoverflow.com/questions/9011772/

Tags: java, linux

Asked by hughw

My Java program is failing with

Caused by: java.io.IOException: Too many open files
        at java.io.UnixFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:883)...

Here are key lines from /etc/security/limits.conf. They set the max files for a user at 500k:

root                     soft    nofile          500000
root                     hard    nofile          500000
*                        soft    nofile          500000
*                        hard    nofile          500000
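
As a sanity check, the limits a login shell actually receives can be read back with ulimit; a minimal sketch, assuming the process runs as username:

su - username -c 'ulimit -Hn'   # hard limit on open files
su - username -c 'ulimit -Sn'   # soft limit on open files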

I ran lsof to count the number of open files -- both globally and for the JVM process. I examined the counters in /proc/sys/fs. All seems OK. My process only has 4301 files open and the limit is 500k:

:~# lsof | wc -l
5526
:~# lsof -uusername | wc -l
4301
:~# cat /proc/sys/fs/file-max
744363
:~# cat /proc/sys/fs/file-nr
4736    0       744363
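
For reference, the three file-nr fields are the number of allocated file handles, the number of allocated-but-unused handles, and the system-wide maximum. The limit that actually binds is per process, and can be read directly from /proc; a minimal check, with <pid> standing in for the JVM's process id:

grep 'Max open files' /proc/<pid>/limits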

This is an Ubuntu 11.04 server. I have even rebooted, so I am positive these parameters are being used.

I don't know if it's relevant, but the process is started by an upstart script that launches it with setuidgid, like this:

exec setuidgid username java $JAVA_OPTS -jar myprogram.jar

What am I missing?

Accepted answer by hughw

It turns out the problem was that my program was running as an upstart init script, and that the exec stanza does not invoke a shell. ulimit and the settings in limits.conf apply only to user processes in a shell.

I verified this by changing the exec stanza to

exec sudo -u username java $JAVA_OPTS -jar program.jar

which runs java in username's default shell. That allowed the program to open as many files as it needed.

I have seen it mentioned that you can also call ulimit -n prior to invoking the command; for an upstart script I think you would use a script stanza instead, as sketched below.

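For example, a minimal sketch of such a job, assuming a job file named /etc/init/myprogram.conf and the same 500k limit as limits.conf above:

# /etc/init/myprogram.conf -- a hypothetical upstart job
script
    # raise the fd limit in the shell, then drop privileges and start the JVM
    ulimit -n 500000
    exec setuidgid username java $JAVA_OPTS -jar myprogram.jar
end script

Upstart also supports a native limit stanza (limit nofile 500000 500000), which raises the limit without involving a shell at all.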

I found a better diagnostic than lsof to be ls /proc/{pid}/fd | wc -l, which gives a precise count of open file descriptors. By monitoring that count I could see that the failures occurred right at 4096 open fds. I don't know where that 4096 comes from; it's not anywhere in /etc; I guess it's compiled into the kernel.

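A rough sketch of that kind of monitoring, assuming the JVM can be found by matching myprogram.jar:

# sample the open-fd count for the JVM every 5 seconds
PID=$(pgrep -f myprogram.jar)
while true; do
    echo "$(date '+%T') open fds: $(ls /proc/$PID/fd | wc -l)"
    sleep 5
done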

Answered by phs

I have this snippet of bash at the top of a server creation script:

# Jack up the max number of open file descriptors at the kernel
echo "fs.file-max = 1000000" >> /etc/sysctl.conf
invoke-rc.d procps start

# Increase max open file descriptors for this process
ulimit -n 1000000

# And for future ones as well
cat >> /etc/profile <<LIMITS
ulimit -n 1000000
LIMITS
cat >> /etc/security/limits.conf <<LIMITS
root - nofile 1000000
LIMITS
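
After the snippet runs, the new limits can be sanity-checked; a minimal sketch (note the limits.conf entry above only covers root):

# kernel-wide ceiling, applied from sysctl.conf above
cat /proc/sys/fs/file-max

# per-process soft limit in a fresh root login shell
su - root -c 'ulimit -n'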