bash: Is there a good way to detect a stale NFS mount

Question

Asked by Chen Levy

I have a procedure I want to initiate only if several tests complete successfully.

One test I need is that all of my NFS mounts are alive and well.

Can I do better than the brute force approach:

mount | sed -n "s/^.* on \(.*\) type nfs .*$/\1/p" |
while read -r mount_point ; do
  timeout 10 ls "$mount_point" >& /dev/null || echo "stale $mount_point" ;
done

Here timeout is a utility that runs the command in the background and kills it after a given time if no SIGCHLD was caught before the time limit, returning success or failure in the obvious way.

In English: parse the output of mount, then check every NFS mount point, with each check bounded by a timeout. Optionally (not in the code above), break on the first stale mount.
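
The optional early exit could be sketched roughly as follows. This is only a sketch, not part of the original script: the function name first_stale_mount is made up for illustration, it assumes GNU coreutils timeout and stat, and it reads mount points from standard input so the list can come from any source.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read mount points (one per line) from stdin and
# stop at the first one that does not answer within the time limit.
# Prints "stale <mount point>" and returns 1 in that case;
# returns 0 if every mount point responds.
# Usage: ... | first_stale_mount [SECONDS]
first_stale_mount() {
    local limit=${1:-10} mount_point
    while IFS= read -r mount_point; do
        # stat is bounded by timeout; a failure or a timeout counts as stale.
        # stdin is redirected so the probe cannot consume the mount list.
        if ! timeout "$limit" stat -- "$mount_point" > /dev/null 2>&1 < /dev/null; then
            echo "stale $mount_point"
            return 1
        fi
    done
    return 0
}
```

It could be fed by the same pipeline as above, for example: mount | sed -n "s/^.* on \(.*\) type nfs .*$/\1/p" | first_stale_mount 10 || exit 1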

Answer 1

Accepted answer by Teddy

You could write a C program and check for ESTALE.

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <iso646.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int main(){
    struct stat st;
    int ret;
    ret = stat("/mnt/some_stale", &st);
    if(ret == -1 and errno == ESTALE){
        printf("/mnt/some_stale is stale\n");
        return EXIT_SUCCESS;
    } else {
        return EXIT_FAILURE;
    }
}

Answer 2

Answered by astrostl

A colleague of mine ran into your script. This doesn't avoid a "brute force" approach, but if I may in Bash:

while read _ _ mount _; do 
  read -t1 < <(stat -t "$mount") || echo "$mount timeout"; 
done < <(mount -t nfs)

mount can list NFS mounts directly. read -t (a shell builtin) can time out a command. stat -t (terse output) still hangs the way ls does. ls yields unnecessary output, risks false positives on huge or slow directory listings, and requires permission to access the directory, which would also trigger a false positive if it doesn't have it.

while read _ _ mount _; do 
  read -t1 < <(stat -t "$mount") || lsof -b 2>/dev/null|grep "$mount"; 
done < <(mount -t nfs)

We're using it with lsof -b (non-blocking, so it won't hang either) to determine the source of the hangs.

Thanks for the pointer!

test -d (a shell builtin) would also work instead of stat (a standard external command), but read -t returns success only if it doesn't time out and reads a line of input. Since test -d writes nothing to stdout, a (( $? > 128 )) exit-status check on the read would be necessary, which isn't worth the legibility hit, IMO.
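
For what it's worth, the exit-status variant might look something like this. It is a sketch under assumptions: probe_hangs is a hypothetical name, and it relies on bash 4 or later, where read returns a status greater than 128 when it times out.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a probe command with its stdout attached to a
# pipe and report whether it was still running when the time limit expired.
# Usage: probe_hangs SECONDS COMMAND [ARGS...]
# Returns 0 if the probe hung (read timed out), 1 if it finished in time.
probe_hangs() {
    local limit=$1 status
    shift
    read -r -t "$limit" < <("$@" 2> /dev/null)
    status=$?
    # 0: a line was read; 1: EOF, the probe exited without output;
    # greater than 128: read timed out while the probe was still running.
    (( status > 128 ))
}
```

Inside the loop above it would be used as: probe_hangs 1 test -d "$mount" && echo "$mount timeout"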

Answer 3

Answered by Oz123

Took me some time, but here is what I found which works in Python:

import signal, os, subprocess
class Alarm(Exception):
    pass

def alarm_handler(signum, frame):
    raise Alarm

pathToNFSMount = '/mnt/server1/' # or you can implement some function 
                                 # to find all the mounts...

signal.signal(signal.SIGALRM, alarm_handler)
signal.alarm(3)  # 3 seconds
try:
    # Popen (not call) returns a process object that supports communicate()
    proc = subprocess.Popen('stat '+pathToNFSMount, shell=True, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    stdoutdata, stderrdata = proc.communicate()
    signal.alarm(0)  # reset the alarm
except Alarm:
    print("Oops, taking too long!")

Remarks:

Credit to the answer here.
You could also use an alternative scheme: os.fork() and os.stat().

Check whether the forked child has finished; if it has timed out, you can kill it. You will need to work with time.time() and so on.

Answer 4

Answered by Costa

In addition to the previous answers, which hang under some circumstances, this snippet checks all suitable mounts, kills with signal KILL, and is tested with CIFS too:

grep -v tracefs /proc/mounts | cut -d' ' -f2 | \
  while read -r m; do \
    timeout --signal=KILL 1 ls -d "$m" > /dev/null || echo "$m"; \
  done

Answer 5

Answered by Chris Adams

I wrote https://github.com/acdha/mountstatus, which uses an approach similar to what UndeadKernel mentioned and which I've found to be the most robust: it's a daemon that periodically scans all mounted filesystems by forking a child process that attempts to list the top-level directory, and sends it SIGKILL if it fails to respond within a certain timeout, with both successes and failures recorded to syslog. That avoids issues with certain client implementations (e.g. older Linux) that never trigger timeouts for certain classes of error, and with NFS servers that are partially responsive but won't, for example, respond to actual calls like listdir.
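
To illustrate the idea only (this is not mountstatus's actual code), one scan pass could be sketched in bash as below. The names check_mount, log_result and scan_mounts and the syslog tag mountcheck are hypothetical; the sketch assumes GNU coreutils timeout and a logger command, and it does not decode mount points that /proc/mounts stores with escapes such as \040.

```shell
#!/usr/bin/env bash

# Hypothetical helper: list the top level of one mount point in a child
# process that is sent SIGKILL if it does not finish within the limit.
# Usage: check_mount MOUNT_POINT [SECONDS]
check_mount() {
    local mount_point=$1 limit=${2:-5}
    timeout --signal=KILL "$limit" ls -- "$mount_point" > /dev/null 2>&1 < /dev/null
}

# Hypothetical helper: record a result in syslog via logger(1),
# falling back to stderr when logger is not installed.
log_result() {
    if command -v logger > /dev/null 2>&1; then
        logger -t mountcheck "$*" || true
    else
        echo "mountcheck: $*" >&2
    fi
}

# Hypothetical helper: check every mount point in a mounts table
# (default /proc/mounts, where field 2 is the mount point) and log
# each success and failure. Returns 0 only if every check succeeded.
# Usage: scan_mounts [MOUNTS_FILE] [SECONDS]
scan_mounts() {
    local table=${1:-/proc/mounts} limit=${2:-5} mount_point failures=0
    while read -r _ mount_point _; do
        if check_mount "$mount_point" "$limit"; then
            log_result "ok $mount_point"
        else
            log_result "FAILED $mount_point"
            failures=$((failures + 1))
        fi
    done < "$table"
    (( failures == 0 ))
}
```

A daemon-style loop would then be something like: while true; do scan_mounts; sleep 60; done. Note that ls also fails on directories the caller cannot read, so an unprivileged run may log failures that are not hangs.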

I don't publish them, but the included Makefile uses fpm to build rpm and deb packages with an Upstart script.

Answer 6

Answered by UndeadKernel

Writing a C program that checks for ESTALE is a good option if you don't mind waiting for the command to finish when the file system is stale. If you want a "timeout" option, the best way I've found to implement it (in a C program) is to fork a child process that tries to open a file. You then check whether the child process has managed to open a file on the filesystem within an allotted amount of time.

Here is a small proof-of-concept C program to do this:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <signal.h>  // for kill()


void readFile();
void waitForChild(int pid);


int main(int argc, char *argv[])
{
  int pid;

  pid = fork();

  if(pid == 0) {
    // Child process.
    readFile();
  }
  else if(pid > 0) {
    // Parent process.
    waitForChild(pid);
  }
  else {
    // Error
    perror("Fork");
    exit(1);
  }

  return 0;
}

void waitForChild(int child_pid)
{
  int timeout = 2; // 2 seconds timeout.
  int status;
  int pid;

  while(timeout != 0) {
    pid = waitpid(child_pid, &status, WNOHANG);
    if(pid == 0) {
      // Still waiting for a child.
      sleep(1);
      timeout--;
    }
    else if(pid == -1) {
      // Error
      perror("waitpid()");
      exit(1);
    }
    else {
      // The child exited.
      if(WIFEXITED(status)) {
        // Child was able to call exit().
        if(WEXITSTATUS(status) == 0) {
          printf("File read successfully!\n");
          return;
        }
      }
      printf("File NOT read successfully.\n");
      return;
    }
  }

  // The child did not finish and the timeout was hit.
  kill(child_pid, 9);
  printf("Timeout reading the file!\n");
}

void readFile()
{
  int fd;

  fd = open("/path/to/a/file", O_RDWR);
  if(fd == -1) {
    // Error
    perror("open()");
    exit(1);
  }
  else {
    close(fd);
    exit(0);
  }
}

Answer 7

Answered by Birgit Ducarroz

Another way, using a shell script. Works well for me:

#!/bin/bash
# Purpose:
# Detect Stale File handle and remove it
# Script created: July 29, 2015 by Birgit Ducarroz
# Last modification: --
#

# Detect stale file handles and write the affected mount points into a file
# (field 2 of df's "Stale file handle" error line is the mount point)
df 2>&1 | grep 'Stale file handle' | awk '{print $2}' > NFS_stales.txt
# Remove the : ‘ ’ and ' characters from the output
sed -r -i -e 's/://' -e 's/‘//' -e 's/’//' -e "s/'//" NFS_stales.txt

# Not used: replace space by a new line
# stales=`cat NFS_stales.txt && sed -r -i ':a;N;$!ba;s/ /\n /g' NFS_stales.txt`

# read NFS_stales.txt output file line by line then unmount stale by stale.
#    IFS='' (or IFS=) prevents leading/trailing whitespace from being trimmed.
#    -r prevents backslash escapes from being interpreted.
#    || [[ -n $line ]] prevents the last line from being ignored if it doesn't end with a \n (since read returns a non-zero exit code when it encounters EOF).

while IFS='' read -r line || [[ -n "$line" ]]; do
    echo "Unmounting due to NFS Stale file handle: $line"
    umount -fl "$line"
done < "NFS_stales.txt"
#EOF
