Using ptrace to track all execve() calls across children

Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/5317261/

Time: 2020-08-05 03:16:08  Source: igfitidea

linux system-calls ptrace

Asked by Clint O

I am trying to write a tool on Linux (CentOS) to track all spawned processes and what they run. In essence, I'm interested in following every fork/clone and emitting the full command line of every execve(). strace already does (some of) this, but it also truncates the calls and the arguments. I also wanted to better understand how ptrace() works.

So, the first roadblock was figuring out how to use ptrace() to follow a fork/clone without requiring the tracing program to fork a copy of itself. I dug in and found out how strace does this. Since fork is implemented with clone on Linux, I noticed that strace pokes some extra flag bits into the clone syscall to enable child tracing without any extra headache.

So, in essence, the code is just one big loop:

while (1) {
    int pid = wait3(&status, ...);   /* wait for any traced child */

    /* process what happened */

    ptrace(PTRACE_SYSCALL, pid,...);
}

This works fine for relatively simple processes like /bin/sh. However, some processes cause the wait() to hang indefinitely. The only thing I've been able to determine is that the process I'm tracing performs a sys_rt_sigsuspend() while waiting on its child (so, the tracer's grandchild), and then things wedge.
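For reference, the "process what happened" step in a loop like this usually starts by classifying the status returned by wait. The following is only a sketch, not the code from the question: the enum WaitKind and the function classify_wait_status are hypothetical names, and the only real APIs used are the standard wait-status macros. Telling a SIGTRAP stop apart from a stop caused by another signal (such as SIGCHLD) is what matters for the hang described above.

```cpp
#include <sys/wait.h>
#include <signal.h>

// Hypothetical classification of a status value returned by wait3()/waitpid().
enum class WaitKind { Exited, Killed, TrapStop, SignalStop, Other };

// Decode a wait status using only the standard macros.
// TrapStop   = stopped with SIGTRAP (a syscall stop when using PTRACE_SYSCALL)
// SignalStop = stopped because some other signal is about to be delivered
WaitKind classify_wait_status(int status)
{
    if (WIFEXITED(status))
        return WaitKind::Exited;
    if (WIFSIGNALED(status))
        return WaitKind::Killed;
    if (WIFSTOPPED(status))
        return WSTOPSIG(status) == SIGTRAP ? WaitKind::TrapStop
                                           : WaitKind::SignalStop;
    return WaitKind::Other;
}
```

A tracer would typically run its syscall handling only for TrapStop and treat SignalStop separately.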

I was curious whether there's a sane way to debug what might be happening. Something is clearly preventing the process tree from making forward progress.

Here's the source code of the program in question:


#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>

/* For the clone flags
 */
#include <sched.h>

/* #include <errno.h> */

#include <sys/ptrace.h>
#include <sys/user.h>

/* Defines our syscalls like 
 */
#include <sys/syscall.h>

#include <sys/reg.h>
#include <stdio.h>

#include <signal.h>

#include <ctype.h>

#include <map>

using namespace std;

char bufstr[4096];

#ifdef __x86_64__
#define REG_ACC  RAX
#define REG_ARG1 RDI
#define REG_ARG2 RSI
#else
#define REG_ACC  EAX
#define REG_ARG1 EBX
#define REG_ARG2 ECX
#endif

/* Trace control structure per PID that we're tracking
 */
class tcb {
    int      pid_;
    int entering_;

    public:

    tcb(int pid, int entering = 1) : pid_(pid), entering_(entering) {};
    tcb()                          : pid_(-1)                       {};
    // tcb(const tcb& p)              : pid_(pid.pid()), entering_(entering.entering()) {};
    int&       pid() { return      pid_; }
    int&  entering() { return entering_; }
};

/* Fetch a string from process (pid) at location (ptr).  Buf is the place
 * to store the data with size limit (size).  Return the number of bytes
 * copied.
 */
int get_string(int pid, long ptr, char *buf, int size)
{
    long data;
    char *p = (char *) &data;
    int j = 0;

    while ((data = ptrace(PTRACE_PEEKTEXT, pid, (void *) ptr, 0)) && j < size) {
        int i;

        for (i = 0; i < sizeof(data) && j < size; i++, j++) {
            if (!(buf[j] = p[i]))
                goto done;
        }
        ptr += sizeof(data);
    }

    done:

    buf[j] = '\0';

    return j;
}

int main(int argc, char *argv[])
{
    int status = 0;
    long scno = 0;
    // int entering = 1;
    struct user_regs_struct regs;
    map<int, tcb> pidTable;
    struct sigaction sa;

    /* Setup */
    int pid = fork();

    if (!pid && argc) {
        if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0) {
            perror("ptrace(PTRACE_ME,... ");
            exit(1);
        }
        execvp(argv[1], &argv[1]);
    } else {
        sa.sa_flags = 0;
        sa.sa_handler = SIG_DFL;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGCHLD, &sa, NULL);

        waitpid(pid, &status, 0);

        pidTable[pid] = tcb(pid);

        fprintf(stderr, "pid is %d\n", pidTable[pid].pid());

        while (!pidTable.empty()) {
            if (pid > 0) {
                //fprintf(stderr, "%d: Restarting %d\n", getpid(), pid);
                if (ptrace(PTRACE_SYSCALL, pid, 0, 0) < 0) {
                    perror("ptrace(PTRACE_SYSCALL,...");
                    exit(1);
                }
            }

            // waitpid(pid, &status, 0);
            // pid = waitpid(-1, &status, 0);
            pid = wait3(&status, __WALL, 0);

            // fprintf(stderr, "Pid from wait is %d\n", pid);
            if (pid < 0) {
                perror("waitpid");
                break;
            } else {
                /* fprintf(stderr, "%d: Status is: ", pid); */

                /*
                if (WIFEXITED(status)) {
                    fprintf(stderr, "exited");
                } else if (WIFSIGNALED(status)) {
                    fprintf(stderr, "exited");
                } else if (WIFSTOPPED(status), "stopped") {
                    fprintf(stderr, "stopped");
                } else if (WIFCONTINUED(status)) {
                    fprintf(stderr, "continued");
                }
                fprintf(stderr, "\n");
                */

                if (WIFEXITED(status) || WIFSIGNALED(status)) {
                    /* Probably empty the table here */
                    pidTable.erase(pid);

                    fprintf(stderr, "Detect process term/kill %d\n", pid);

                    /* if (ptrace(PTRACE_DETACH, pid, 0, 0) < 0) {
                        perror("ptrace");
                    } */

                    pid = -1;

                    continue;
                }
            }

            ptrace(PTRACE_GETREGS, pid, 0, &regs);

#ifdef __x86_64__
            scno = regs.orig_rax;
#else
            scno = regs.orig_eax;
#endif /* __x86_64__ */

            if (scno == SYS_execve) {
                fprintf(stderr, "%d: Exec branch\n", pid);
                if (pidTable[pid].entering()) {
                    long ldata, ptr, ptr1;

                    ptrace(PTRACE_GETREGS, pid, 0, &regs);

#ifdef __x86_64__
                    ptr = regs.rdi;
#else
                    ptr = regs.ebx;
#endif /* __x86_64__ */

                    fprintf(stderr, "%d: exec(", pid);

                    if (ptr) {
                        get_string(pid, ptr, bufstr, sizeof(bufstr));

                        fprintf(stderr, "%s", bufstr);
                    }

#ifdef __x86_64__
                    ptr1 = regs.rsi;
#else
                    ptr1 = regs.ecx;
#endif /* __x86_64__ */

                    for (; ptr1; ptr1 += sizeof(unsigned long)) {
                        ptr = ptr1;
                        /* Indirect through ptr since we have char *argv[] */
                        ptr = ptrace(PTRACE_PEEKTEXT, pid, (void *) ptr, 0);

                        if (!ptr)
                            break;

                        get_string(pid, ptr, bufstr, sizeof(bufstr));
                        fprintf(stderr, ", %s", bufstr);
                    }
                    fprintf(stderr, ")\n");

                    pidTable[pid].entering() = 0;
                } else {
                    long acc = ptrace(PTRACE_PEEKUSER, pid, sizeof(unsigned long) * REG_ACC, 0);
                    pidTable[pid].entering() = 1;
                    fprintf(stderr, "%d: Leaving exec: eax is %ld\n", pid, acc);
                }
            } else if (scno == SYS_fork || scno == SYS_clone) {
                fprintf(stderr, "%d: fork/clone branch\n", pid);
                if (pidTable[pid].entering()) {
                    long flags = ptrace(PTRACE_PEEKUSER, pid, (sizeof(unsigned long) * REG_ARG1), 0);

                    fprintf(stderr, "%d: Entering fork/clone\n", pid);
                    pidTable[pid].entering() = 0;

                    if (ptrace(PTRACE_POKEUSER, pid, (sizeof(unsigned long) * REG_ARG1),
                               flags | CLONE_PTRACE &
                               ~(flags & CLONE_VFORK ? CLONE_VFORK | CLONE_VM : 0)) < 0) {
                        perror("ptrace");
                    }

                    if (ptrace(PTRACE_POKEUSER, pid, (sizeof(unsigned long) * REG_ARG2), 0) < 0) {
                        perror("ptrace");
                    }
                } else {
                    // int child;

                    ptrace(PTRACE_GETREGS, pid, 0, &regs);

#ifdef __x86_64__
                    fprintf(stderr, "%d: Leaving fork/clone: rax = %ld\n", pid, regs.rax);
#else
                    fprintf(stderr, "%d: Leaving fork/clone: eax = %ld\n", pid, regs.eax);
#endif

                    pidTable[pid].entering() = 1;

#ifdef __x86_64__
                    if (regs.rax <= 0) {
#else
                    if (regs.eax <= 0) {
#endif
                        continue;
                    }

#ifdef __x86_64__
                    int newpid = regs.rax;
#else
                    int newpid = regs.eax;
#endif
                    pidTable[newpid] = tcb(newpid, 0);
                    //pidTable[newpid] = tcb(newpid, 1);
                    //pidTable[newpid] = pidTable[pid];
                    fprintf(stderr, "%d: forked child is %d\n", pid, newpid);
                }
            } else if (scno == SYS_exit) {
                fprintf(stderr, "%d: exit syscall detected\n", pid);
            } else if (scno < 0) {
                fprintf(stderr, "Negative syscall number for %d\n", pid);
                exit(1);
            } else {
                fprintf(stderr, "%d: Scno is %ld\n", pid, scno);
            }
        }
    }

    return 0;
}

Answered by fche

By the way, strace -f -s99999 -e trace=clone,execve appears to give good-quality results. To see a trace of strace's own actions, you might try systemtap, for example:

# stap -e 'probe syscall.ptrace {if (execname()=="strace") log(argstr)}' -c 'strace COMMAND'


(Current systemtap doesn't pretty-print the ptrace arguments quite right.)

Or you can strace strace:


strace -e trace=ptrace strace -f -s99999 -e trace=clone,execve COMMAND


Answered by osgx

The PTRACE_SETOPTIONS request of ptrace accepts flags for this: PTRACE_O_TRACEFORK, PTRACE_O_TRACEEXEC, and PTRACE_O_TRACEEXIT. More details are in the ptrace man page.
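A minimal sketch of how these options might be used, assuming the tracee is already stopped under ptrace (for example after PTRACE_TRACEME followed by exec). The helper names enable_child_tracing and stop_event are hypothetical. PTRACE_O_TRACEVFORK and PTRACE_O_TRACECLONE are not mentioned in this answer; they are added here on the assumption that vfork() and clone() children should be followed as well. The status decoding follows the layout described in the ptrace man page, where an event stop is reported as SIGTRAP with the event code in the bits above the stop signal.

```cpp
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <cassert>

// Hypothetical helper: ask the kernel to attach new children automatically
// and to report exec and exit as separate events.
// The tracee must already be in a ptrace stop.
// Returns 0 on success, -1 on failure (errno is set by ptrace).
int enable_child_tracing(pid_t tracee)
{
    assert(tracee > 0);
    long opts = PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACECLONE
              | PTRACE_O_TRACEEXEC | PTRACE_O_TRACEEXIT;
    return ptrace(PTRACE_SETOPTIONS, tracee, nullptr,
                  reinterpret_cast<void *>(opts)) < 0 ? -1 : 0;
}

// Hypothetical helper: return the PTRACE_EVENT_* code carried by a wait
// status, or 0 if the status is not a ptrace event stop.
int stop_event(int status)
{
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return 0;
    return (status >> 16) & 0xffff;
}
```

With these options set, a tracer can compare stop_event(status) against PTRACE_EVENT_EXEC or PTRACE_EVENT_FORK instead of rewriting the clone flags by hand.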

Answered by the_JQ

I've experienced the exact same issue, and found the solution by stracing strace.


After you get an event via waitpid(), you have to call

ptrace(PTRACE_GETSIGINFO, pid, NULL, &sig_data)

and if sig_data.si_signo is SIGTRAP, you do whatever you currently do, but if not, you need to store the signal number and use it as the last argument to

ptrace(PTRACE_SYSCALL, pid, 0, sig)

That way the signal (in my case SIGCHLD) is properly forwarded to the tracee.
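The two calls above could be combined roughly as follows. This is a sketch under assumptions, not the answerer's actual code: the helper names signal_to_inject and resume_forwarding_signal are hypothetical, and it assumes the tracee is currently stopped and that any stop not reported as SIGTRAP is a real signal to pass through.

```cpp
#include <sys/ptrace.h>
#include <sys/types.h>
#include <signal.h>
#include <cassert>

// Hypothetical helper: map the si_signo reported by PTRACE_GETSIGINFO to the
// signal that should be injected on resume. SIGTRAP is treated as a stop
// produced by tracing itself, so nothing is injected for it.
int signal_to_inject(int si_signo)
{
    return si_signo == SIGTRAP ? 0 : si_signo;
}

// Hypothetical helper: resume a stopped tracee until its next syscall stop,
// forwarding a pending real signal (for example SIGCHLD) if there is one.
// Returns 0 on success, -1 if either ptrace call fails.
int resume_forwarding_signal(pid_t tracee)
{
    assert(tracee > 0);
    siginfo_t sig_data;
    if (ptrace(PTRACE_GETSIGINFO, tracee, nullptr, &sig_data) < 0)
        return -1;
    long sig = signal_to_inject(sig_data.si_signo);
    if (ptrace(PTRACE_SYSCALL, tracee, nullptr,
               reinterpret_cast<void *>(sig)) < 0)
        return -1;
    return 0;
}
```

In the question's loop, a call like resume_forwarding_signal(pid) would take the place of the unconditional ptrace(PTRACE_SYSCALL, pid, 0, 0), which always passes 0 and therefore swallows the signal.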