Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/9614184/

Date: 2020-08-06 05:02:46. Source: igfitidea

How to trace per-file IO operations in Linux?

Tags: linux, file-io, filesystems, trace, strace

Asked by Noah Watkins

I need to track read system calls for specific files, and I'm currently doing this by parsing the output of strace. Since read operates on file descriptors, I have to keep track of the current mapping between fd and path. Additionally, seek has to be monitored to keep the current position up-to-date in the trace.

Is there a better way to get per-application, per-file-path IO traces in Linux?

Accepted answer by Coren

First, you probably don't need to keep track yourself, because the mapping between fd and path is available in /proc/PID/fd/.

Second, you could use the LD_PRELOAD trick and overload the open, seek and read system calls in C. There are some articles here and there about how to overload malloc/free.

I guess it won't be too different to apply the same kind of trick for those system calls. It needs to be implemented in C, but it should take far less code and be more precise than parsing strace output.
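
A rough sketch of what such a shim might look like for read alone, under a few assumptions: it logs to stderr, it looks the path up through /proc/self/fd, and it forwards to the kernel with syscall(2) instead of the more common dlsym(RTLD_NEXT, "read") lookup, which keeps the example free of extra link dependencies. The log format is invented for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Replacement for libc's read(): when this file is built as a shared object
 * and listed in LD_PRELOAD, the dynamic linker resolves read() to it first. */
ssize_t read(int fd, void *buf, size_t count)
{
    char link[64], path[4096];
    ssize_t n, len;
    off_t pos;
    int saved_errno;

    pos = lseek(fd, 0, SEEK_CUR);            /* offset before the read; -1 on pipes and sockets */
    n = syscall(SYS_read, fd, buf, count);   /* perform the actual read */
    saved_errno = errno;                     /* the logging below must not clobber errno */

    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    len = readlink(link, path, sizeof path - 1);
    path[len > 0 ? len : 0] = '\0';          /* empty string if the lookup failed */

    dprintf(STDERR_FILENO, "read fd=%d path=%s offset=%lld count=%zu ret=%zd\n",
            fd, path, (long long)pos, count, n);

    errno = saved_errno;
    return n;
}
```

Assuming the file is saved as readtrace.c (a made-up name), it could be built with gcc -shared -fPIC -o readtrace.so readtrace.c and used as LD_PRELOAD=./readtrace.so some_program. Only calls that go through the dynamic read symbol are seen this way; reads made internally by libc (for example on behalf of fread), statically linked programs, and related calls such as pread or readv would need their own handling.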

Answer by user1055604

You could wait for the files to be opened so you can learn the fd and attach strace after the process launch like this:

strace -p pid -e trace=file -e read=fd

Answer by Johnlcf

I think overloading open, seek and read is a good solution. But just FYI, if you want to parse and analyze the strace output programmatically, I did something similar before and put my code on GitHub: https://github.com/johnlcf/Stana/wiki

(I did that because I have to analyze the strace output of programs run by others, and it is not easy to ask them to use LD_PRELOAD.)

Answer by dmeister

SystemTap, a kind of DTrace reimplementation for Linux, could be of help here.

As with strace, you only have the fd, but with the scripting ability it is easy to maintain the filename for an fd (except with fun stuff like dup). The example script iotime illustrates it.

#! /usr/bin/env stap

/*
 * Copyright (C) 2006-2007 Red Hat Inc.
 * 
 * This copyrighted material is made available to anyone wishing to use,
 * modify, copy, or redistribute it subject to the terms and conditions
 * of the GNU General Public License v.2.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 *
 * Print out the amount of time spent in the read and write systemcall
 * when each file opened by the process is closed. Note that the systemtap 
 * script needs to be running before the open operations occur for
 * the script to record data.
 *
 * This script could be used to find out which files are slow to load
 * on a machine. e.g.
 *
 * stap iotime.stp -c 'firefox'
 *
 * Output format is:
 * timestamp pid (executable) info_type path ...
 *
 * 200283135 2573 (cupsd) access /etc/printcap read: 0 write: 7063
 * 200283143 2573 (cupsd) iotime /etc/printcap time: 69
 *
 */

global start
global time_io

function timestamp:long() { return gettimeofday_us() - start }

function proc:string() { return sprintf("%d (%s)", pid(), execname()) }

probe begin { start = gettimeofday_us() }

global filehandles, fileread, filewrite

probe syscall.open.return {
  filename = user_string($filename)
  if ($return != -1) {
    filehandles[pid(), $return] = filename
  } else {
    printf("%d %s access %s fail\n", timestamp(), proc(), filename)
  }
}

probe syscall.read.return {
  p = pid()
  fd = $fd
  bytes = $return
  time = gettimeofday_us() - @entry(gettimeofday_us())
  if (bytes > 0)
    fileread[p, fd] += bytes
  time_io[p, fd] <<< time
}

probe syscall.write.return {
  p = pid()
  fd = $fd
  bytes = $return
  time = gettimeofday_us() - @entry(gettimeofday_us())
  if (bytes > 0)
    filewrite[p, fd] += bytes
  time_io[p, fd] <<< time
}

probe syscall.close {
  if ([pid(), $fd] in filehandles) {
    printf("%d %s access %s read: %d write: %d\n",
           timestamp(), proc(), filehandles[pid(), $fd],
           fileread[pid(), $fd], filewrite[pid(), $fd])
    if (@count(time_io[pid(), $fd]))
      printf("%d %s iotime %s time: %d\n",  timestamp(), proc(),
             filehandles[pid(), $fd], @sum(time_io[pid(), $fd]))
  }
  delete fileread[pid(), $fd]
  delete filewrite[pid(), $fd]
  delete filehandles[pid(), $fd]
  delete time_io[pid(),$fd]
}

It only works up to a certain number of files because the hash map is size limited.

Answer by Shnatsel

Probably the least ugly way to do this is to use fanotify. Fanotify is a Linux kernel facility that allows watching filesystem events cheaply. I'm not sure if it allows filtering by PID, but it does pass the PID to your program so you can check if it's the one you're interested in.

Here's a nice code sample: http://bazaar.launchpad.net/~pitti/fatrace/trunk/view/head:/fatrace.c

However, it seems to be under-documented at the moment. All the docs I could find are http://www.spinics.net/lists/linux-man/msg02302.html and http://lkml.indiana.edu/hypermail/linux/kernel/0811.1/01668.html

Answer by Shnatsel

Parsing command-line utils like strace is cumbersome; you could use the ptrace() syscall instead. See man ptrace for details.
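
A minimal sketch of what a ptrace-based tracer could look like, assuming x86-64 Linux (the register names orig_rax, rdi, rdx and rax are specific to that architecture). It starts a program under ptrace, stops at every system call, and logs each read with the path taken from /proc/PID/fd. The function name trace_reads is invented for the example; signals delivered to the traced program are simply dropped to keep the code short.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv under ptrace and log every read(2) it makes (x86-64 only).
 * Returns the number of read calls seen, or -1 if fork fails.
 * trace_reads is a hypothetical name chosen for this sketch. */
long trace_reads(char *const argv[])
{
    struct user_regs_struct regs;
    char link[64], path[4096];
    int status, in_syscall = 0;
    long reads = 0;
    ssize_t n;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execvp(argv[0], argv);
        _exit(127);
    }
    waitpid(child, &status, 0);              /* child stops with SIGTRAP after exec */
    /* Make syscall stops distinguishable from other SIGTRAP stops. */
    ptrace(PTRACE_SETOPTIONS, child, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

    for (;;) {
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) < 0)
            break;
        if (waitpid(child, &status, 0) < 0)
            break;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        if (!WIFSTOPPED(status) || WSTOPSIG(status) != (SIGTRAP | 0x80))
            continue;                        /* not a syscall stop; the signal is dropped */
        in_syscall = !in_syscall;
        if (in_syscall)
            continue;                        /* syscall entry: wait for the matching exit */
        if (ptrace(PTRACE_GETREGS, child, NULL, &regs) < 0)
            continue;
        if (regs.orig_rax != SYS_read)
            continue;
        snprintf(link, sizeof link, "/proc/%d/fd/%d", (int)child, (int)regs.rdi);
        n = readlink(link, path, sizeof path - 1);
        path[n > 0 ? n : 0] = '\0';
        fprintf(stderr, "read fd=%d path=%s count=%llu ret=%lld\n",
                (int)regs.rdi, path,
                (unsigned long long)regs.rdx, (long long)regs.rax);
        reads++;
    }
    return reads;
}
```

Calling it with an argument vector such as {"cat", "/etc/hostname", NULL} would trace that command. Following lseek the same way, by also matching SYS_lseek, would be the natural next step for keeping offsets up to date.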
