Linux: finding the assembly instruction that caused an illegal instruction error, without debugging

Notice: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/10354147/

Time: 2020-08-06 06:02:18  Source: igfitidea

Find which assembly instruction caused an Illegal Instruction error without debugging

c linux assembly x86-64 yasm

Asked by pythonic

While running a program I've written in assembly, I get an Illegal instruction error. Is there a way to know which instruction is causing the error without debugging? The machine I'm running on does not have a debugger or any development system. In other words, I compile on one machine and run on another. I cannot test my program on the machines I compile on because they don't support SSE4.2. The machine I run the program on does support SSE4.2 instructions, however.

I think it may be because I need to tell the assembler (YASM) to recognize the SSE4.2 instructions, just as we do with gcc by passing it the -msse4.2 flag. Or do you think that's not the reason? Any idea how to tell YASM to recognize SSE4.2 instructions?

Maybe I should trap the SIGILL signal and then decode the SA_SIGINFO to see what kind of illegal operation the program does.

Accepted answer by ouah

Actually, you often get an illegal instruction error not because your program contains an illegal opcode but because there is a bug in your program (e.g., a buffer overflow) that makes it jump to a random address that holds plain data, or into code but not at the start of an instruction.

Answer by unwind

Well ... You can of course insert trace printouts, so you can quickly rule out large areas of the code. Once you've done that, run e.g.

$ objdump --disassemble my-crashing-program | less

Then jump to e.g. the function you know is causing the error, and read the code, looking for anything that looks odd.

I'm not totally sure how objdump displays illegal instructions, but they should stand out.
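
On how objdump displays illegal instructions: GNU objdump prints bytes it cannot decode as (bad), so a rough way to scan a disassembly for them is sketched below. The function name find_bad_opcodes and the binary name my-crashing-program are placeholders for illustration. Note that an instruction the disassembler recognizes (an SSE4.2 or AVX2 one, for example) decodes normally even if the target CPU cannot execute it, so an empty result does not rule out an unsupported instruction.

```shell
#!/bin/sh
# Print the disassembly lines that objdump marked as undecodable, "(bad)".
# Reads disassembly text on stdin, so it works on live objdump output
# or on a listing saved earlier.
find_bad_opcodes() {
    grep -n -F '(bad)'
}

# Only runs when objdump and the (placeholder) binary are actually present.
if command -v objdump >/dev/null 2>&1 && [ -f ./my-crashing-program ]; then
    objdump --disassemble ./my-crashing-program | find_bad_opcodes \
        || echo "no undecodable bytes found"
fi
```

The number printed before each match is the line number within the disassembly listing, which makes it easy to jump to the same spot in less.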

Answer by DigitalRoss

For handwritten assembly I would suspect a stack management problem resulting in a return-to-nowhere. Write a debugging printout routine that saves every register and insert a call to it at the top of every function.

Then you will see how far you get...

(BTW, a good editor and a good understanding of the assembler's macro syntax are lifesavers when writing machine code.)

Answer by Michael Burr

If you can enable core dumps on that system, just run the program, let it crash, then pull the core dump off the target machine onto your development machine and load it into a GDB built to debug the target architecture - that should tell you exactly where the crash occurred. Just use GDB's core command to load the core file into the debugger.

  • To enable core dumps on the target:

    ulimit -c unlimited
    
  • pseudo-files that control how the core file will be named (cat these to see the current configuration, write to them to change the configuration):

    /proc/sys/kernel/core_pattern
    /proc/sys/kernel/core_uses_pid
    
On my system, once core dumps are enabled, a crashing program will write a file simply named "core" in the working directory. That's probably good enough for your purposes, but changing how the core dump file is named lets you keep a history of core dumps if that's necessary (maybe for a more intermittent problem).
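
Before relying on a core file appearing, it can help to print the settings mentioned above in one go. The following is a minimal sketch; the helper name core_limit_status is made up for illustration, and the unit of a numeric limit depends on the shell. One related detail from the Linux core(5) documentation: if core_pattern starts with a | character, the kernel pipes the dump to the named program instead of writing a file in the working directory.

```shell
#!/bin/sh
# Describe a core-size limit value as reported by `ulimit -c`.
core_limit_status() {
    case "$1" in
        0) echo "disabled" ;;
        unlimited) echo "enabled (no size limit)" ;;
        *) echo "enabled (limit: $1 blocks)" ;;
    esac
}

echo "core dumps: $(core_limit_status "$(ulimit -c)")"

# Show how the kernel will name core files, if the pseudo-files are readable.
for f in /proc/sys/kernel/core_pattern /proc/sys/kernel/core_uses_pid; do
    if [ -r "$f" ]; then
        echo "$f = $(cat "$f")"
    fi
done
```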

Answer by Diego Pino

Recently I experienced a crash due to a 132 exit status code (128 + 4: program interrupted by a signal + illegal instruction signal). Here's how I figured out what instruction was causing the crash.
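
The 128 + 4 arithmetic can be checked from the shell itself. This is a small sketch, assuming the usual shell convention that a status above 128 means the process was killed by signal (status - 128); the function name explain_status is hypothetical. A program could in principle also call exit with such a value directly, so treat the result as a strong hint rather than proof.

```shell
#!/bin/sh
# Explain a shell exit status: by convention, values above 128 mean
# "killed by signal number (status - 128)".
explain_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        signum=$((status - 128))
        # `kill -l N` prints the signal name without the SIG prefix.
        echo "killed by signal $signum (SIG$(kill -l "$signum"))"
    else
        echo "exited normally with status $status"
    fi
}

explain_status 132   # prints: killed by signal 4 (SIGILL)
```

In practice you would call it right after the crashing program, as in: ./program; explain_status $?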

First, I enabled core dumps:

$ ulimit -c unlimited

Interestingly, the folder I was running the binary from already contained a folder named core, so a core file with that name could not be created there. I had to tell Linux to add the PID to the core file name:

$ sudo sysctl -w kernel.core_uses_pid=1

Then I ran my program and got a core file named core.23650. I loaded the binary and the core with gdb.

$ gdb program core.23650

Once I got into gdb, it showed the following information:

Program terminated with signal SIGILL, Illegal instruction.
#0  0x00007f58e9efd019 in ?? ()

That means my program crashed due to an illegal instruction at memory address 0x00007f58e9efd019. Then I switched to the asm layout to check the last instruction executed:

(gdb) layout asm
>|0x7f58e9efd019  vpmaskmovd (%r8),%ymm15,%ymm0
 |0x7f58e9efd01e  vpmaskmovd %ymm0,%ymm15,(%rdi)
 |0x7f58e9efd023  add    $0x4,%rdi
 |0x7f58e9efd027  add    $0x0,%rdi

It was the vpmaskmovd instruction that caused the error. Apparently, I was trying to run a program built for the AVX2 architecture on a system that lacks support for the AVX2 instruction set.

$ cat /proc/cpuinfo | grep avx2

Lastly, I confirmed that vpmaskmovd is an AVX2-only instruction.
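
The /proc/cpuinfo check can be made a little stricter than a plain grep, which matches substrings (searching for sse4 would also hit sse4_1, for instance). Below is a sketch that matches a flag as a whole word; the function name has_cpu_flag is made up for illustration, and it assumes an x86 Linux system where /proc/cpuinfo has a line starting with "flags" and a grep that supports -m and -w (GNU, BSD and busybox grep do).

```shell
#!/bin/sh
# Succeed if CPU flag $1 appears as a whole word on the first "flags" line
# of the cpuinfo file $2 (defaults to /proc/cpuinfo).
has_cpu_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null | grep -qw -- "$1"
}

for flag in sse4_2 avx2; do
    if has_cpu_flag "$flag"; then
        echo "$flag: supported"
    else
        echo "$flag: not reported"
    fi
done
```

Running this on the target machine before deploying shows whether the instruction set the binary was built for is reported by the CPU at all.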

Answer by Gabriel

Missing a return statement at the end of a function can cause this.