Linux: How should strace be used?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/174942/
Asked by e-satis
A colleague once told me that when everything else has failed to debug a problem on Linux, the last resort is strace.
I tried to learn the science behind this strange tool, but I am not a system admin guru and I didn't really get results.
So,
- What is it exactly and what does it do?
- How and in which cases should it be used?
- How should the output be understood and processed?
In brief, in simple words, how does this stuff work?
Accepted answer by John Mulder
Strace Overview
strace can be seen as a lightweight debugger. It allows a programmer or user to quickly find out how a program is interacting with the OS. It does this by monitoring system calls and signals.
Uses
Good for when you don't have the source code or can't be bothered to really go through it.
Also useful for your own code if you don't feel like opening up GDB and are just interested in understanding external interaction.
A good little introduction
I ran into this intro to strace use just the other day: strace hello world
Answer by Asaf Bartov
strace lists all system calls made by the process it's applied to. If you don't know what system calls are, you won't be able to get much mileage from it.
Nevertheless, if your problem involves files, paths, or environment values, running strace on the problematic program, redirecting the output to a file, and then grepping that file for your path, file, or environment string may help you see what your program is actually attempting to do, as distinct from what you expected it to do.
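The workflow described above can be sketched as follows. This is a minimal illustration, assuming a Linux machine with strace installed; `cat /etc/passwd` merely stands in for the problematic program, and the variable name is made up for the example.

```shell
# Temporary file that will hold the trace (name is illustrative).
trace_file=$(mktemp)

# -o writes the trace to a file instead of stderr.
# "cat /etc/passwd" is only a stand-in for the program being investigated.
strace -o "$trace_file" cat /etc/passwd > /dev/null

# Show every traced call that mentions the path we care about.
grep -F '/etc/passwd' "$trace_file"
```

The matching lines show the call name, its arguments, and the return value, which is usually enough to tell whether the program found the file it was looking for.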
Answer by bltxd
In simple words, strace traces all system calls issued by a program along with their return codes. Think of things such as file and socket operations, and a lot of more obscure ones.
It is most useful if you have some working knowledge of C, since most of the system calls shown correspond closely to the standard C library functions of the same name.
Let's say your program is /usr/local/bin/cough. Simply use:
strace /usr/local/bin/cough <any required argument for cough here>
or
strace -o <out_file> /usr/local/bin/cough <any required argument for cough here>
to write the trace into 'out_file'.
All strace output goes to stderr (beware: the sheer volume of it often calls for redirection to a file). In the simplest cases, your program will abort with an error and you'll be able to see what its last interactions with the OS were in the strace output.
More information should be available with:
man strace
Answer by Luka Marinko
Strace is a tool that tells you how your application interacts with your operating system.
It does this by telling you which system calls your application uses and with what parameters it calls them.
So, for instance, you see which files your program tries to open, and whether the call succeeds.
You can debug all sorts of problems with this tool. For instance, if an application says that it cannot find a library that you know you have installed, strace will tell you where the application is looking for that file.
And that is just the tip of the iceberg.
Answer by terson
Strace stands out as a tool for investigating production systems where you can't afford to run programs under a debugger. In particular, we have used strace in the following two situations:
- Program foo seems to be deadlocked and has become unresponsive. This could be a target for gdb; however, we haven't always had the source code, or sometimes we were dealing with scripting languages that weren't straightforward to run under a debugger. In this case, you run strace on the already running program and you get the list of system calls being made. This is particularly useful if you are investigating a client/server application or an application that interacts with a database.
- Investigating why a program is slow. In particular, we had just moved to a new distributed file system and the throughput of the system was very slow. You can run strace with the '-T' option, which tells you how much time was spent in each system call. This helped to determine why the file system was causing things to slow down.
For an example of analysis using strace, see my answer to this question.
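To make the '-T' option mentioned above more concrete, here is a small sketch, assuming strace is installed; `ls /` is only a placeholder for the slow program and the variable name is illustrative. For an already running process, the same flag can be combined with -p.

```shell
trace_file=$(mktemp)

# -T appends the time spent inside each system call, e.g. <0.000042>, to every line.
strace -T -o "$trace_file" ls / > /dev/null

# Move the duration to the front of each line, then list the five slowest calls.
sed -n 's/^\(.*\) <\([0-9.]*\)>$/\2 \1/p' "$trace_file" | sort -rn | head -n 5
```

Sorting by duration is just one way to read the data; on a slow file system the calls at the top of the list would typically be the ones touching that file system.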
Answer by mohit
strace is a good tool for learning how your program makes various system calls (requests to the kernel). It also reports the ones that have failed, along with the error value associated with each failure. Not all failures are bugs. For example, code that searches for a file may get an ENOENT (No such file or directory) error, but that may be an acceptable scenario in the logic of the code.
One good use case for strace is debugging race conditions during temporary file creation. For example, a program that creates files by appending the process ID (PID) to some predetermined string may face problems in multi-threaded scenarios. [A PID+TID (process ID + thread ID) or, better, a library function such as mkstemp will fix this.]
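As a rough illustration of why mkstemp-style creation avoids the race, the sketch below traces the mktemp command-line utility, which is assumed here to create its file the same way mkstemp does. It also assumes a strace recent enough to understand the %file class (older versions spell it `-e trace=file`); the variable names are illustrative.

```shell
trace_file=$(mktemp)

# Trace only file-related system calls while mktemp creates a temporary file.
tmp_path=$(strace -e trace=%file -o "$trace_file" mktemp)

# The creating open/openat call should carry O_CREAT|O_EXCL,
# so it fails instead of silently reusing a file that already exists.
grep -F 'O_EXCL' "$trace_file"

rm -f "$tmp_path"
```

If a program builds its temporary file names by hand, the same trace would show whether O_EXCL is present on the creating call, which is a quick way to spot the risky pattern.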
It is also good for debugging crashes. You may find this (my) article on strace and debugging crashes useful.
Answer by Marcin
Strace can be used as a debugging tool, or as a primitive profiler.
As a debugger, it lets you see how given system calls were called and executed, and what they returned. This is very important, as it allows you to see not only that a program failed, but WHY it failed. Usually it's just a result of lousy coding that doesn't catch all the possible outcomes of a program. Other times it's just hardcoded paths to files. Without strace you have to guess what went wrong, where, and how. With strace you get a breakdown of each syscall; usually just looking at the return value tells you a lot.
Profiling is another use. You can use it to time the execution of each syscall individually, or as an aggregate. While this might not be enough to fix your problems, it will at least greatly narrow down the list of potential suspects. If you see a lot of open/close pairs on a single file, you are probably opening and closing the file unnecessarily on every iteration of a loop, instead of opening and closing it outside the loop.
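The open-inside-a-loop pattern can be reproduced with a tiny shell experiment. This is only a sketch, assuming strace with %file support; the loop bodies and variable names are invented for illustration.

```shell
data_file=$(mktemp)
trace_in_loop=$(mktemp)
trace_outside_loop=$(mktemp)

# Redirection inside the loop: the shell reopens the file on every iteration.
strace -e trace=%file -o "$trace_in_loop" \
    sh -c 'for i in 1 2 3; do echo x >> "$1"; done' sh "$data_file"

# Redirection outside the loop: the file is opened once for the whole loop.
strace -e trace=%file -o "$trace_outside_loop" \
    sh -c 'for i in 1 2 3; do echo x; done >> "$1"' sh "$data_file"

# Count the open/openat calls that mention the data file in each trace.
opens_in_loop=$(grep -c "^open.*$data_file" "$trace_in_loop")
opens_outside_loop=$(grep -c "^open.*$data_file" "$trace_outside_loop")
echo "inside loop: $opens_in_loop, outside loop: $opens_outside_loop"
```

With a common shell such as dash or bash, the first count should equal the number of iterations and the second should be one, which is the kind of difference that stands out in a real trace.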
ltrace is strace's close cousin, and also very useful. You must learn to tell where your bottleneck is. If the total execution time is 8 seconds and you spend only 0.05 seconds in system calls, then stracing the program is not going to do you much good: the problem is in your code, which usually means a logic problem, or the program really does need that long to run.
The biggest problem with strace/ltrace is reading their output. If you don't know how the calls are made, or at least the names of the syscalls/functions, it's going to be difficult to decipher the meaning. Knowing what the functions return can also be very beneficial, especially for the different error codes. While the output is a pain to decipher, it sometimes really does yield a pearl of knowledge. I once had a situation where I had run out of inodes but not out of free space, so none of the usual utilities gave me any warning; I just couldn't create a new file. Reading the error code in strace's output pointed me in the right direction.
Answer by Leslie Zhu
strace -tfp PID attaches to the process with the given PID and monitors its system calls (-t prefixes each line with a timestamp, -f follows child processes, -p selects the PID), so we can debug or monitor the status of our process or program.
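A self-contained way to try this out is sketched below. The background loop is a hypothetical stand-in for a real long-running process, and the two-second limit via timeout is an addition for the demo. Note that on systems where kernel.yama.ptrace_scope restricts attaching, a non-root user may get "Operation not permitted" and the trace file stays empty.

```shell
trace_file=$(mktemp)

# Hypothetical long-running target: a shell loop that keeps spawning short sleeps.
sh -c 'while :; do sleep 0.2; done' &
target_pid=$!

# Attach for two seconds: -t timestamps each line, -f follows children, -p selects the PID.
# timeout then sends SIGTERM and strace detaches, leaving the target running.
timeout 2 strace -t -f -p "$target_pid" -o "$trace_file" || true

# Clean up the demo target and look at the start of the trace.
kill "$target_pid"
head -n 10 "$trace_file"
```

In real use you would leave out timeout and stop strace with Ctrl-C once you have seen enough.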
Answer by Jeff Sheffield
I use strace all the time to debug permission issues. The technique goes like this:
$ strace -e trace=open,stat,read,write gnome-calculator
where gnome-calculator is the command that you want to run.
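A small, reproducible variant of this technique is sketched below. It swaps in a throwaway unreadable file and `cat` for the real application, and uses the %file class instead of listing call names, because on many current systems open() shows up as openat and stat as newfstatat or statx, so a filter on `open,stat` can miss them. All names are illustrative.

```shell
secret_file=$(mktemp)
trace_file=$(mktemp)
chmod 000 "$secret_file"   # no permissions for anyone

# cat is expected to fail for a non-root user, hence the "|| true".
strace -e trace=%file -o "$trace_file" cat "$secret_file" || true

# Failed calls show the errno name after "= -1"; look for permission errors.
grep -E 'EACCES|EPERM' "$trace_file" || echo "no permission errors traced (running as root?)"

rm -f "$secret_file"
```

The line that grep prints names the exact path and call that was refused, which is usually the missing piece when an application only reports a generic failure.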
Answer by prosti
I liked some of the answers that say strace checks how you interact with your operating system.
This is exactly what we can see: the system calls. If you compare strace and ltrace, the difference becomes more obvious.
$> strace -c ls
Desktop Documents Downloads examples.desktop Music Pictures Public Templates Videos
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         7           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0        11           close
  0.00    0.000000           0        10           fstat
  0.00    0.000000           0        17           mmap
  0.00    0.000000           0        12           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           ioctl
  0.00    0.000000           0         8         8 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           getdents
  0.00    0.000000           0         2         2 statfs
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         9           openat
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                    93        10 total
On the other hand, there is ltrace, which traces library function calls.
$> ltrace -c ls
Desktop Documents Downloads examples.desktop Music Pictures Public Templates Videos
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 15.52    0.004946         329        15 memcpy
 13.34    0.004249          94        45 __ctype_get_mb_cur_max
 12.87    0.004099        2049         2 fclose
 12.12    0.003861          83        46 strlen
 10.96    0.003491         109        32 __errno_location
 10.37    0.003303         117        28 readdir
  8.41    0.002679         133        20 strcoll
  5.62    0.001791         111        16 __overflow
  3.24    0.001032         114         9 fwrite_unlocked
  1.26    0.000400         100         4 __freading
  1.17    0.000372          41         9 getenv
  0.70    0.000222         111         2 fflush
  0.67    0.000214         107         2 __fpending
  0.64    0.000203         101         2 fileno
  0.62    0.000196         196         1 closedir
  0.43    0.000138         138         1 setlocale
  0.36    0.000114         114         1 _setjmp
  0.31    0.000098          98         1 realloc
  0.25    0.000080          80         1 bindtextdomain
  0.21    0.000068          68         1 opendir
  0.19    0.000062          62         1 strrchr
  0.18    0.000056          56         1 isatty
  0.16    0.000051          51         1 ioctl
  0.15    0.000047          47         1 getopt_long
  0.14    0.000045          45         1 textdomain
  0.13    0.000042          42         1 __cxa_atexit
------ ----------- ----------- --------- --------------------
100.00    0.031859                   244 total
Although I checked the manuals several times, I haven't found the origin of the name strace, but it most likely stands for system-call trace, since that is the obvious reading.
There are three larger points to make about strace.
Note 1: Both of these tools, strace and ltrace, use the ptrace system call. So the ptrace system call is effectively how strace works.
The ptrace() system call provides a means by which one process (the "tracer") may observe and control the execution of another process (the "tracee"), and examine and change the tracee's memory and registers. It is primarily used to implement breakpoint debugging and system call tracing.
Note 2: There are different parameters you can use with strace, since strace can be very verbose. I like to experiment with -c, which gives a summary of the calls. Based on the -c output you can select one system call, for example -e trace=open, and you will see only that call. This can be interesting if you are examining which files are opened during the command you are tracing. And of course you can use grep for the same purpose, but note that you need to redirect stderr first, like this: 2>&1 | grep etc, to see which config files are referenced when the command is issued.
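The redirection mentioned in Note 2 could look like this in practice. It is a sketch that assumes strace with %file support and uses `ls` as a placeholder command; the variable name is illustrative.

```shell
# strace writes to stderr, so "2>&1" is needed before a pipe can see the trace;
# the following ">/dev/null" drops the traced program's own stdout.
etc_lines=$(strace -e trace=%file ls 2>&1 > /dev/null | grep '/etc/' || true)

# Print whatever file-related calls touched paths under /etc.
echo "$etc_lines"
```

The order of the two redirections matters: stderr is pointed at the pipe first, and only then is stdout sent to /dev/null, so grep receives the trace alone.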
Note 3: I find this a very important note. You are not limited to a specific architecture. strace will blow your mind, since it can trace binaries of different architectures.