
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original address: http://stackoverflow.com/questions/4943857/

Time: 2020-08-05 02:46:45  Source: igfitidea

Linux kernel live debugging, how it's done and what tools are used?

linux, debugging, linux-kernel, kernel

Question by Shinnok

What are the most common (and, why not, the less common) methods and tools used to do live debugging on the Linux kernel? I know that Linus, for example, is against this kind of debugging for the Linux kernel, or at least was, and thus not much has been done in that sense over those years. But honestly, a lot of time has passed since 2000, and I am interested in whether that mentality has changed in the Linux project and what methods are currently used to do live debugging on the Linux kernel (either local or remote).


References to walkthroughs and tutorials on mentioned techniques and tools are welcome.


Accepted answer by Kevin

Another option is to use an ICE/JTAG controller and GDB. This 'hardware' solution is especially used with embedded systems, but Qemu, for instance, offers similar features:

  • Start qemu with a gdb 'remote' stub which listens on 'localhost:1234': qemu -s ...

  • Then, with GDB, open the kernel file vmlinux compiled with debug information (you can take a look at this mailing list thread, where they discuss the unoptimization of the kernel).

  • Connect GDB and Qemu: target remote localhost:1234

  • See your live kernel:

    (gdb) where
    #0  cpu_v7_do_idle () at arch/arm/mm/proc-v7.S:77
    #1  0xc0029728 in arch_idle () at arm/mach-realview/include/mach/system.h:36
    #2  default_idle () at arm/kernel/process.c:166
    #3  0xc00298a8 in cpu_idle () at arch/arm/kernel/process.c:199
    #4  0xc00089c0 in start_kernel () at init/main.c:713
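
The steps above can be sketched as two shell functions. This is a minimal sketch, not the answer's exact setup: it assumes an x86-64 guest (the trace above is ARM) and a kernel tree in `$KDIR` built with debug info. It also adds the `nokaslr` boot option so that the addresses in `vmlinux` match the running kernel. The `QEMU`, `GDB` and `KDIR` variables are illustrative overrides:

```shell
# Sketch: boot a kernel under QEMU with its gdb stub and attach gdb to it.
# Assumptions (not from the answer): x86-64 guest, kernel tree in $KDIR built
# with debug info; QEMU/GDB/KDIR can be overridden.
QEMU=${QEMU:-qemu-system-x86_64}
GDB=${GDB:-gdb}
KDIR=${KDIR:-.}

start_guest() {
    # -s: shorthand for "-gdb tcp::1234", the gdb remote stub
    # -S: keep the CPU halted until gdb tells it to continue
    # nokaslr: keep runtime addresses identical to those in vmlinux
    "$QEMU" -kernel "$KDIR/arch/x86/boot/bzImage" \
        -append "console=ttyS0 nokaslr" \
        -nographic -s -S
}

attach_gdb() {
    # Load symbols, connect to the stub, stop at start_kernel, then run.
    # A hardware breakpoint (hbreak) is commonly recommended this early in boot.
    "$GDB" "$KDIR/vmlinux" \
        -ex 'target remote localhost:1234' \
        -ex 'hbreak start_kernel' \
        -ex 'continue'
}
```

Run `start_guest` in one terminal and `attach_gdb` in another. Once the breakpoint is hit, `where` and `info threads` work as described in this answer.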
    
    

Unfortunately, user-space debugging is not possible with GDB so far (no task list information, no MMU reprogramming to see different process contexts, ...), but if you stay in kernel space, it's quite convenient.


  • info threads will give you the list and states of the different CPUs

EDIT:


You can get more details about the procedure in this PDF:


Debugging Linux systems using GDB and QEMU.


Answer by Shinnok

According to the wiki, kgdb was merged into the kernel in 2.6.26, which is within the last few years. kgdb is a remote debugger, so you activate it in your kernel, then you attach gdb to it somehow. I say somehow because there seem to be lots of options; see connecting gdb. Given that kgdb is now in the source tree, I'd say that going forward this is what you want to be using.
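
One of those options, kgdb over a serial console (kgdboc), might look like the following minimal sketch. It assumes a target kernel built with `CONFIG_KGDB=y` and `CONFIG_KGDB_SERIAL_CONSOLE=y`, a null-modem cable on `ttyS0` at both ends, root on the target, and a `vmlinux` with debug info on the host. The path variables exist only so the functions can be redirected:

```shell
# Sketch of kgdb over a serial console (kgdboc). Assumptions: CONFIG_KGDB=y,
# CONFIG_KGDB_SERIAL_CONSOLE=y, ttyS0 wired to the host, root on the target.
KGDBOC_PARAM=${KGDBOC_PARAM:-/sys/module/kgdboc/parameters/kgdboc}
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}

# Target side: register kgdboc at runtime (instead of booting with
# kgdboc=ttyS0,115200), then stop the kernel with SysRq-g.
target_enter_kgdb() {
    echo "${1:-ttyS0,115200}" > "$KGDBOC_PARAM"
    echo g > "$SYSRQ_TRIGGER"
}

# Host side: attach gdb to the stopped kernel over the serial line.
host_attach_kgdb() {
    "${GDB:-gdb}" vmlinux \
        -ex 'set serial baud 115200' \
        -ex "target remote ${1:-/dev/ttyS0}"
}
```

Calling `target_enter_kgdb` on the target freezes the kernel and makes it wait. `host_attach_kgdb` on the host then gives an ordinary gdb session (`bt`, `break`, `continue`).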


So it looks like Linus gave in. However, I would emphasize his argument: you should know what you're doing and know the system well. This is kernel land. If anything goes wrong, you don't get a segfault; you get anything from some obscure problem later on to the whole system coming down. Here be dragons. Proceed with care; you have been warned.


Answer by Brad

Another good tool for "live" debugging is kprobes / dynamic probes.


This lets you dynamically build tiny modules which run when certain addresses are executed, sort of like a breakpoint.


Their big advantages are:


  1. They do not impact the system: when a location is hit, the probe just executes its code; it doesn't halt the whole kernel.
  2. You don't need two different systems interconnected (target and debug) like with kgdb.

It is best for things like hitting a breakpoint and seeing what the data values are, or checking whether things have been changed/overwritten, etc. If you want to "step through code", it doesn't do that.
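
The answer describes kprobes packaged as small modules. The sketch below swaps in the kprobe-based event interface in tracefs, a different front end to the same kprobes mechanism that needs no module build. It assumes `CONFIG_KPROBE_EVENTS=y`, tracefs mounted at `/sys/kernel/tracing`, and root. The probe name and target function are examples chosen by the caller:

```shell
# Sketch using kprobe-based trace events in tracefs (no module to build).
# Assumptions: CONFIG_KPROBE_EVENTS=y, tracefs at /sys/kernel/tracing, root.
TRACEFS=${TRACEFS:-/sys/kernel/tracing}

probe_add() {
    # "p:NAME SYMBOL" defines a probe (append, so other probes survive);
    # the kernel then creates events/kprobes/NAME/ with an "enable" switch.
    echo "p:$1 $2" >> "$TRACEFS/kprobe_events"
    echo 1 > "$TRACEFS/events/kprobes/$1/enable"
}

probe_del() {
    # Disable the event first; "-:NAME" then removes its definition.
    echo 0 > "$TRACEFS/events/kprobes/$1/enable"
    echo "-:$1" >> "$TRACEFS/kprobe_events"
}
```

For example, `probe_add myopen do_sys_openat2` (a function present on 5.6+ kernels) followed by `cat /sys/kernel/tracing/trace_pipe` prints a line per hit while the kernel keeps running. `probe_del myopen` removes it again.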


Addition - 2018:


Another very powerful method is a program simply called "perf", which rolls up many tools (like dynamic probes) and replaces/deprecates others (like oprofile).


In particular, the perf probe command can be used to easily create/add dynamic probes to the system, after which perf record can sample the system and record info (and backtraces) whenever a probe is hit, for reporting via perf report (or perf script). If you have good debug symbols in the kernel, you can get great intel out of the system without even taking the kernel down. Do a man perf (in Google or on your system) for more info on this tool, or see this great page on it:


http://www.brendangregg.com/perf.html
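
The probe/record/report sequence described above might be wrapped as follows. This is a minimal sketch that assumes perf is installed, runs as root, and has kernel debug symbols available. `PERF` is an illustrative override, and the function name passed in is up to the caller:

```shell
# Sketch of the perf probe -> record -> report workflow.
# Assumptions: perf installed, root, kernel debug symbols available.
PERF=${PERF:-perf}

trace_function() {
    fn=$1
    secs=${2:-5}
    # Add a dynamic probe; perf names the resulting event probe:<function>.
    "$PERF" probe --add "$fn"
    # Record every hit system-wide (-a) with call graphs (-g).
    "$PERF" record -e "probe:$fn" -a -g -- sleep "$secs"
    # Summarize hits and backtraces from perf.data, then remove the probe.
    "$PERF" report --stdio
    "$PERF" probe --del "probe:$fn"
}
```

`trace_function do_sys_openat2 10` records ten seconds of hits. Swapping `perf report --stdio` for `perf script` lists every hit individually with its backtrace.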


Answer by mpe

Actually, the joke is that Linux has had an in-kernel debugger, xmon, since 2.2.12, but only for the powerpc architecture (actually it was ppc back then).


It's not a source level debugger, and it's almost entirely undocumented, but still.


http://lxr.linux.no/linux-old+v2.2.12/arch/ppc/xmon/xmon.c#L119


Answer by mpe

As someone who writes kernel code a lot I have to say I have never used kgdb, and only rarely use kprobes etc.


It is still often the best approach to throw in some strategic printks. In more recent kernels, trace_printk is a good way to do that without spamming dmesg.
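
`trace_printk()` writes into the ftrace ring buffer rather than the printk log, so its output is read through tracefs. This is a minimal sketch that assumes tracefs at `/sys/kernel/tracing`, root, and kernel code you have already instrumented with a call such as `trace_printk("state=%d\n", state);` (`state` being a hypothetical variable):

```shell
# Sketch for reading trace_printk() output from the ftrace ring buffer.
# Assumptions: tracefs at /sys/kernel/tracing, root.
TRACEFS=${TRACEFS:-/sys/kernel/tracing}

trace_clear() {
    # Truncating the "trace" file empties the ring buffer.
    : > "$TRACEFS/trace"
}

trace_show() {
    # Print the buffer without the '#' header lines.
    grep -v '^#' "$TRACEFS/trace"
}
```

Clear the buffer, reproduce the problem, then call `trace_show`. Alternatively, `cat /sys/kernel/tracing/trace_pipe` streams entries as they arrive.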


Answer by matt

kgdb and gdb are almost useless for debugging the kernel because the code is so optimised that it bears no relation to the original source, and many variables are optimised out. This makes stepping through the source impossible and examining variables impossible, so it is almost pointless.


Actually, it is worse than useless: it gives you false information, so detached is the code you are looking at from the actual running code.


And no, you can't turn off optimisations in the kernel; it doesn't compile.


I have to say, coming from a Windows kernel environment, the lack of a decent debugger is annoying, given that there is junk code out there to maintain.


Answer by Md Mahbubur Rahman

While debugging the Linux kernel, we can use several tools: for example, debuggers (KDB, KGDB), dumping on crash (LKCD), tracing toolkits (LTT, LTTV, LTTng), and custom kernel instrumentation (dprobes, kprobes). In the following section I have tried to summarize most of them; I hope this helps.


LKCD (Linux Kernel Crash Dump) allows the Linux system to write out the contents of its memory when a crash occurs. These dumps can then be analyzed to find the root cause of the crash. Resources regarding LKCD


Oops: when the kernel detects a problem, it prints an Oops message. Such a message is generated by printk statements in the fault handler (arch/*/kernel/traps.c), which write to a dedicated ring buffer in the kernel. An Oops contains information such as the CPU on which it occurred, the contents of the CPU registers, the number of Oopses, a description, a stack backtrace, and more. Resources regarding kernel Oops
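
An Oops backtrace names code locations as `function+offset/length`. With a `vmlinux` built with debug info, gdb's `list *(function+offset)` maps such an entry back to source lines. The kernel tree also provides `scripts/faddr2line` and `scripts/decode_stacktrace.sh` for this job. Below is a small helper that extracts the first such entry from an Oops line. Its regex is an assumption about the usual Oops formatting, not a full parser:

```shell
# Sketch: turn the first "function+0xoff/0xlen" token of an Oops line into a
# gdb "list *(function+0xoff)" command. Uses grep -o (GNU/BSD grep).
oops_to_gdb() {
    sym=$(printf '%s\n' "$1" |
        grep -oE '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+' | head -n 1)
    [ -n "$sym" ] || return 1
    # gdb needs only function+offset, so drop the "/length" suffix.
    printf 'list *(%s)\n' "${sym%/*}"
}
```

For example, `gdb -batch -ex "$(oops_to_gdb "$line")" vmlinux` prints the source around the faulting location, where `$line` holds a line copied from the Oops, such as its `RIP:` line.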


Dynamic Probes is one of the popular debugging tools for Linux, developed by IBM. It allows a "probe" to be placed at almost any location in the system, in both user and kernel space. The probe consists of some code (written in a specialized, stack-oriented language) that is executed when control hits the given point. Resources regarding Dynamic Probes


Linux Trace Toolkit is a kernel patch and a set of related utilities that allow the tracing of events in the kernel. The trace includes timing information and can create a reasonably complete picture of what happened over a given period of time. Resources of LTT, LTT Viewer and LTT Next Generation
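
LTT's successor, LTTng, is driven from a single `lttng` command. This is a minimal sketch of a kernel tracing session that assumes lttng-tools and the LTTng kernel modules are installed and that the script runs as root. The session name `demo` and the two scheduler events are arbitrary examples:

```shell
# Sketch of an LTTng kernel tracing session.
# Assumptions: lttng-tools and LTTng kernel modules installed, root.
LTTNG=${LTTNG:-lttng}

lttng_kernel_trace() {
    secs=${1:-5}
    "$LTTNG" create demo
    # Scheduler tracepoints give a timeline of which task ran when.
    "$LTTNG" enable-event --kernel sched_switch,sched_wakeup
    "$LTTNG" start
    sleep "$secs"
    "$LTTNG" stop
    # Print the recorded events as text, then tear the session down.
    "$LTTNG" view
    "$LTTNG" destroy
}
```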


MEMWATCH is an open source memory error detection tool. It works by defining MEMWATCH on the gcc command line and including a header file in your code. With it, we can track memory leaks and memory corruption. Resources regarding MEMWATCH


ftrace is a good tracing framework for the Linux kernel; it traces internal operations of the kernel. It was included in the Linux kernel in 2.6.27. With its various tracer plugins, ftrace can be targeted at different static tracepoints, such as scheduling events, interrupts, memory-mapped I/O, CPU power state transitions, and operations related to file systems and virtualization. Dynamic tracing of kernel function calls is also available, optionally restricted to a subset of functions by using globs, with the possibility to generate call graphs and report stack usage. You can find a good tutorial on ftrace at https://events.linuxfoundation.org/slides/2010/linuxcon_japan/linuxcon_jp2010_rostedt.pdf
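
This is a minimal sketch of the function-graph tracer restricted with a glob. It assumes tracefs at `/sys/kernel/tracing` (older kernels expose it at `/sys/kernel/debug/tracing`) and root. The glob is chosen by the caller:

```shell
# Sketch: function_graph tracing limited to functions matching a glob.
# Assumptions: tracefs at /sys/kernel/tracing, root.
TRACEFS=${TRACEFS:-/sys/kernel/tracing}

graph_trace() {
    # Restrict dynamic function tracing to the glob (quoted, so the shell
    # does not expand it), select the graph tracer, and start recording.
    echo "$1" > "$TRACEFS/set_ftrace_filter"
    echo function_graph > "$TRACEFS/current_tracer"
    echo 1 > "$TRACEFS/tracing_on"
}

graph_stop() {
    echo 0 > "$TRACEFS/tracing_on"
    echo nop > "$TRACEFS/current_tracer"
    # An empty write clears the function filter again.
    echo > "$TRACEFS/set_ftrace_filter"
}
```

Typical use: `graph_trace 'vfs_*'`, reproduce the problem, read `/sys/kernel/tracing/trace` for the call graph with per-call durations, then `graph_stop`.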


ltrace is a debugging utility in Linux, used to display the calls a user-space application makes to shared libraries. It can trace any dynamic library function call: it intercepts and records the dynamic library calls made by the executed process and the signals received by that process. It can also intercept and print the system calls executed by the program.
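
A few common invocations, wrapped as functions. This assumes `ltrace` is installed. The program name in the usage note is a placeholder, and `LTRACE` is an illustrative override:

```shell
# Sketch of common ltrace invocations. Assumption: ltrace is installed.
LTRACE=${LTRACE:-ltrace}

# Library calls plus system calls (-S), following children (-f), into a file.
ltrace_all() { "$LTRACE" -f -S -o ltrace.log "$@"; }

# Only calls to malloc and free (filter expression for -e).
ltrace_alloc() { "$LTRACE" -e malloc+free "$@"; }

# Summary table of call counts and time per library function.
ltrace_summary() { "$LTRACE" -c "$@"; }
```

For example, `ltrace_alloc ./myprog` shows only the allocation calls, and `ltrace_summary ./myprog` prints per-function counts and times when the program exits.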


KDB is the in-kernel debugger of the Linux kernel. It has a simplistic, shell-style interface. We can use it to inspect memory, registers, process lists and dmesg, to set breakpoints that stop at a certain location, and to perform some basic kernel run control (although KDB is not a source level debugger). Several handy resources regarding KDB
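
This is a minimal sketch of dropping into KDB on a running system. It assumes a kernel built with `CONFIG_KGDB_KDB=y` and `CONFIG_KDB_KEYBOARD=y`, root, and a keyboard or serial console to type on. The kdb commands in the trailing comment are typed at the `kdb>` prompt, not in the shell:

```shell
# Sketch: enter KDB from a running system.
# Assumptions: CONFIG_KGDB_KDB=y, CONFIG_KDB_KEYBOARD=y, root.
KGDBOC_PARAM=${KGDBOC_PARAM:-/sys/module/kgdboc/parameters/kgdboc}
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}

enter_kdb() {
    # Use the keyboard and ttyS0 as the debugger's consoles.
    echo kbd,ttyS0 > "$KGDBOC_PARAM"
    # SysRq-g stops the kernel and shows the kdb> prompt.
    echo g > "$SYSRQ_TRIGGER"
}

# At the kdb> prompt:  ps (task list), bt (backtrace), md <addr> (memory
# dump), bp <symbol> (breakpoint), go (resume), kgdb (switch to gdb mode)
```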


KGDB is intended to be used as a source level debugger for the Linux kernel. It is used along with gdb to debug a Linux kernel. Two machines are required for using kgdb: one is the development machine, and the other is the target machine, on which the kernel to be debugged runs. The expectation is that gdb can be used to "break in" to the kernel to inspect memory and variables and look through call stack information, similar to the way an application developer would use gdb to debug an application. It is possible to place breakpoints in kernel code and perform some limited execution stepping. Several handy resources regarding KGDB


Answer by unhmble

You guys are wrong: kgdb still works well with the latest kernels. You just need to take care of the kernel configuration (split image, randomization, optimization).


kgdb over a serial port is useless, because no computer today has a DB9 serial port on the motherboard, and USB serial ports don't support polling mode.


The new game is kgdboe (kgdb over Ethernet). The following is the log trace:


The following is on the host machine; vmlinux is from the target machine:


root@Thinkpad-T510:~/KGDBOE# gdb vmlinux
Reading symbols from vmlinux...done.
(gdb) target remote udp:192.168.1.22:31337
1077    kernel/debug/debug_core.c: No such file or directory.
(gdb) l oom_kill_process 
828 mm/oom_kill.c: No such file or directory.
(gdb) l oom_kill_process 
828 in mm/oom_kill.c
(gdb) break oom_kill_process
Breakpoint 1 at 0xffffffff8119e0c0: file mm/oom_kill.c, line 833.
(gdb) c
Continuing.
[New Thread 1779]
[New Thread 1782]
[New Thread 1777]
[New Thread 1778]
[New Thread 1780]
[New Thread 1781]
[Switching to Thread 1779]

Thread 388 hit Breakpoint 1, oom_kill_process (oc=0xffffc90000d93ce8, message=0xffffffff82098fbc "Out of memory")
at mm/oom_kill.c:833
833 in mm/oom_kill.c
(gdb) s
834 in mm/oom_kill.c
(gdb) 

On the peer target machine, the following is how to make it run out of memory, so that the oom_kill_process breakpoint is hit and captured by the host machine:


# swapoff -a
# stress -m 4 --vm-bytes=500m