C++: Protecting an executable from reverse engineering?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/6481668/

Time: 2020-08-28 20:12:50  Source: igfitidea

Protecting executable from reverse engineering?

Tags: c++, c, obfuscation, assembly

Asked by graphitemaster

I've been contemplating how to protect my C/C++ code from disassembly and reverse engineering. Normally I would never condone this behavior myself in my code; however the current protocol I've been working on must not ever be inspected or understandable, for the security of various people.

Now this is a new subject to me, and the internet offers little on preventing reverse engineering; instead it offers tons of information on how to reverse engineer.

Some of the things I've thought of so far are:

  • Code injection (calling dummy functions before and after actual function calls)
  • Code obfuscation (mangles the disassembly of the binary)
  • Write my own startup routines (harder for debuggers to bind to)

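    #include <stdlib.h>  /* exit */

    void startup(void);

    /* Custom entry point: assumes the default C runtime startup is not linked in. */
    int _start(void)
    {
        startup();
        exit(0);
    }

    void startup(void)
    {
        /* code here */
    }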
    
  • Runtime check for debuggers (and force exit if detected)

  • Function trampolines

     void trampoline(void (*fnptr)(), bool ping = false)  
     {  
       if(ping)  
         fnptr();  
       else  
         trampoline(fnptr, true);  
     }
    
  • Pointless allocations and deallocations (stack changes a lot)

  • Pointless dummy calls and trampolines (tons of jumping in disassembly output)
  • Tons of casting (for obfuscated disassembly)
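
As a rough illustration of the "runtime check for debuggers" item, here is a minimal sketch that assumes Linux, where /proc/self/status reports a non-zero TracerPid while a process is being traced with ptrace. The function names are made up for this example, and a check like this is easy to spot and patch out, so treat it as a speed bump at best:

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Parse the "TracerPid:" field from text in the format of Linux's
// /proc/<pid>/status. Returns -1 if the field is missing.
long parse_tracer_pid(std::istream& status) {
    const std::string key = "TracerPid:";
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, key.size(), key) == 0) {
            std::istringstream rest(line.substr(key.size()));
            long pid = -1;
            rest >> pid;  // operator>> skips the tab after the colon
            return pid;
        }
    }
    return -1;
}

// Linux-only assumption: a non-zero TracerPid means some process
// (often a debugger) is attached via ptrace.
bool running_under_a_debugger() {
    std::ifstream status("/proc/self/status");
    return status && parse_tracer_pid(status) > 0;
}
```

A caller could then force an exit with something like `if (running_under_a_debugger()) std::abort();`.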

These are some of the things I've thought of, but they can all be worked around and/or figured out by code analysts given the right time frame. Are there any other alternatives I have?

Accepted answer by Stephen Canon

What Amber said is exactly right. You can make reverse engineering harder, but you can never prevent it. You should never trust "security" that relies on the prevention of reverse engineering.

That said, the best anti-reverse-engineering techniques that I've seen focused not on obfuscating the code, but on breaking the tools that people usually use to understand how code works. Finding creative ways to break disassemblers, debuggers, etc. is likely to be both more effective and more intellectually satisfying than just generating reams of horrible spaghetti code. This does nothing to block a determined attacker, but it does increase the likelihood that J Random Cracker will wander off and work on something easier instead.

Answer by Amber

"but they can all be worked around and/or figured out by code analysts given the right time frame."

If you give people a program that they are able to run, then they will also be able to reverse-engineer it given enough time. That is the nature of programs. As soon as the binary is available to someone who wants to decipher it, you cannot prevent eventual reverse-engineering. After all, the computer has to be able to decipher it in order to run it, and a human is simply a slower computer.

Answer by RyanR

SafeNet Sentinel (formerly Aladdin). Caveats, though: their API sucks, their documentation sucks, and both of those are great in comparison to their SDK tools.

I've used their hardware protection method (Sentinel HASP HL) for many years. It requires a proprietary USB key fob which acts as the 'license' for the software. Their SDK encrypts and obfuscates your executable and libraries, and allows you to tie different features in your application to features burned into the key. Without a USB key provided and activated by the licensor, the software cannot be decrypted and hence will not run. The key even uses a customized USB communication protocol (outside my realm of knowledge, I'm not a device driver guy) to make it difficult to build a virtual key or to tamper with the communication between the runtime wrapper and the key. Their SDK is not very developer friendly, and integrating the protection step into an automated build process is quite painful (but possible).

Before we implemented the HASP HL protection, there were 7 known pirates who had stripped the dotfuscator 'protections' from the product. We added the HASP protection at the same time as a major update to the software, which performs some heavy calculation on video in real time. As best I can tell from profiling and benchmarking, the HASP HL protection only slowed the intensive calculations by about 3%. Since that software was released about 5 years ago, not one new pirate of the product has been found. The software which it protects is in high demand in its market segment, and the client is aware of several competitors actively trying to reverse engineer it (without success so far). We know they have tried to solicit help from a few groups in Russia which advertise a service to break software protection, as numerous posts on various newsgroups and forums have included the newer versions of the protected product.

Recently we tried their software license solution (HASP SL) on a smaller project, which was straightforward enough to get working if you're already familiar with the HL product. It appears to work; there have been no reported piracy incidents, but this product is in much lower demand.

Of course, no protection can be perfect. If someone is sufficiently motivated and has serious cash to burn, I'm sure the protections afforded by HASP could be circumvented.

Answer by old_timer

The best anti-disassembler tricks, in particular on variable-word-length instruction sets, are in assembler/machine code, not C. For example:

CLC
BCC over
.byte 0x09
over:

The disassembler has to resolve the problem that a branch destination is the second byte in a multi-byte instruction. An instruction set simulator will have no problem, though. Branching to computed addresses, which you can cause from C, also makes the disassembly difficult to impossible. Again, an instruction set simulator will have no problem with it, and using a simulator to sort out branch destinations for you can aid the disassembly process. Compiled code is relatively clean and easy for a disassembler, so I think some assembly is required.

I think it was near the beginning of Michael Abrash's Zen of Assembly Language where he showed a simple anti-disassembler and anti-debugger trick. The 8088/8086 had a prefetch queue; what you did was have an instruction that modified the next instruction or one a couple ahead. If you were single-stepping, you executed the modified instruction; likewise, if your instruction set simulator did not simulate the hardware completely, you executed the modified instruction. On real hardware running normally, the real instruction would already be in the queue, and the modified memory location wouldn't cause any damage so long as you didn't execute that string of instructions again. You could probably still use a trick like this today, as pipelined processors fetch the next instruction ahead of time. Or, if you know that the hardware has separate instruction and data caches, you can modify a number of bytes ahead: if you align this code in the cache line properly, the modified byte will not be written through the instruction cache but through the data cache, and an instruction set simulator that did not have proper cache simulators would fail to execute properly. I think software-only solutions are not going to get you very far.

The above are old and well known; I don't know enough about the current tools to know if they already work around such things. Self-modifying code can and will trip up the debugger, but the human can and will narrow in on the problem, then see the self-modifying code and work around it.

It used to be that hackers would take about 18 months to work something out, DVDs for example. Now they are averaging around 2 days to 2 weeks, if motivated (Blu-ray, iPhones, etc.). To me that means that if I spend more than a few days on security, I am likely wasting my time. The only real security you will get is through hardware (for example, your instructions are encrypted and only the processor core well inside the chip decrypts them just before execution, in a way that cannot expose the decrypted instructions). That might buy you months instead of days.

Also, read Kevin Mitnick's book The Art of Deception. A person like that could pick up a phone and have you or a coworker hand out the secrets to the system thinking it is a manager or another coworker or hardware engineer in another part of the company. And your security is blown. Security is not all about managing the technology, gotta manage the humans too.

Answer by Phil

Take, for example, the AES algorithm. It's a very, very public algorithm, and it is VERY secure. Why? Two reasons: It's been reviewed by lots of smart people, and the "secret" part is not the algorithm itself - the secret part is the key which is one of the inputs to the algorithm. It's a much better approach to design your protocol with a generated "secret" that is outside your code, rather than to make the code itself secret. The code can always be interpreted no matter what you do, and (ideally) the generated secret can only be jeopardized by a massive brute force approach or through theft.
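
To make the "secret outside your code" idea concrete, here is a minimal sketch, assuming the key is supplied as a hex string in a hypothetical environment variable APP_SECRET_KEY. The variable name and both helper names are invented for illustration and are not from the answer. The binary then contains only public logic and never the key itself:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Decode a hex string such as "00ff10" into raw key bytes.
// Returns false on empty input, odd length, or non-hex characters.
bool parse_hex_key(const std::string& hex, std::vector<std::uint8_t>& out) {
    if (hex.empty() || hex.size() % 2 != 0) return false;
    auto nibble = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    std::vector<std::uint8_t> key;
    key.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        int hi = nibble(hex[i]);
        int lo = nibble(hex[i + 1]);
        if (hi < 0 || lo < 0) return false;
        key.push_back(static_cast<std::uint8_t>(hi * 16 + lo));
    }
    out = key;
    return true;
}

// APP_SECRET_KEY is a hypothetical name; the point is only that the
// secret is an input to the program, not a constant compiled into it.
bool load_key_from_environment(std::vector<std::uint8_t>& out) {
    const char* value = std::getenv("APP_SECRET_KEY");
    return value != nullptr && parse_hex_key(value, out);
}
```

The loaded bytes would then be handed to a well-reviewed cipher implementation; reverse engineering this code reveals nothing that was meant to stay secret.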

I think an interesting question is "Why do you want to obfuscate your code?" You want to make it hard for attackers to crack your algorithms? To make it harder for them to find exploitable bugs in your code? You wouldn't need to obfuscate code if the code were uncrackable in the first place. The root of the problem is crackable software. Fix the root of your problem, don't just obfuscate it.

Also, the more confusing you make your code, the harder it will be for YOU to find security bugs. Yes, it will be hard for hackers, but you need to find bugs too. Code should be easy to maintain years from now, and even well-written clear code can be difficult to maintain. Don't make it worse.

Answer by Gilles 'SO- stop being evil'

Making code difficult to reverse-engineer is called code obfuscation.

Most of the techniques you mention are fairly easy to work around. They center on adding some useless code. But useless code is easy to detect and remove, leaving you with a clean program.

For effective obfuscation, you need to make the behavior of your program dependent on the useless bits being executed. For example, rather than doing this:

a = useless_computation();
a = 42;

do this:

a = complicated_computation_that_uses_many_inputs_but_always_returns_42();
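
One way such a function might look is sketched below. The name opaque_42 and the two identities it uses are this example's assumptions, not the answer's; a real obfuscator would use far less recognizable ones, and an optimizing compiler may fold some of this away:

```cpp
#include <cstdint>

// Always returns 42, whatever the inputs are.
// Identity 1: x * (x + 1) is a product of two consecutive integers, so it
//             is even; unsigned wrap-around preserves parity, so the low
//             bit is always 0.
// Identity 2: (x | y) + (x & y) == x + y in modular arithmetic, so the
//             difference is always 0.
std::uint32_t opaque_42(std::uint32_t x, std::uint32_t y) {
    std::uint32_t always_zero_a = (x * (x + 1u)) & 1u;
    std::uint32_t always_zero_b = (x | y) + (x & y) - x - y;
    return 42u + always_zero_a + always_zero_b;
}
```

Feeding it values the program needs anyway (lengths, counters, addresses) makes it harder to tell from the disassembly that the result is a constant.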

Or instead of doing this:

if (running_under_a_debugger()) abort();
a = 42;

Do this (where running_under_a_debugger should not be easily identifiable as a function that tests whether the code is running under a debugger; it should mix useful computations with debugger detection):

a = 42 - running_under_a_debugger();

Effective obfuscation isn't something you can do purely at the compilation stage. Whatever the compiler can do, a decompiler can do. Sure, you can increase the burden on the decompilers, but it's not going to go far. Effective obfuscation techniques, inasmuch as they exist, involve writing obfuscated source from day 1. Make your code self-modifying. Litter your code with computed jumps, derived from a large number of inputs. For example, instead of a simple call:

some_function();

do this, where you happen to know the exact expected layout of the bits in some_data_structure:

goto (md5sum(&some_data_structure, 42) & 0xffffffff) + MAGIC_CONSTANT;
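
The goto above is pseudocode. A portable approximation of the same idea is an indirect call through a table whose index is computed from a checksum of known data. In the sketch below, the FNV-1a hash stands in for md5sum, and the table, the decoy functions, and magic_for are all assumptions made up for illustration:

```cpp
#include <cstddef>
#include <cstdint>

// Toy FNV-1a hash standing in for the md5sum in the text.
std::uint32_t fnv1a(const unsigned char* data, std::size_t len) {
    std::uint32_t h = 2166136261u;
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

int decoy_a() { return -1; }
int some_function() { return 7; }
int decoy_b() { return -2; }
int decoy_c() { return -3; }

using Handler = int (*)();
const Handler kTable[4] = {decoy_a, some_function, decoy_b, decoy_c};

// The call target is not a constant in the binary: it resolves to
// some_function only when the data holds the expected bytes.
int obfuscated_call(const unsigned char* some_data_structure,
                    std::size_t len, std::uint32_t magic_constant) {
    std::uint32_t index = (fnv1a(some_data_structure, len) + magic_constant) & 3u;
    return kTable[index]();
}

// Build-time helper: pick the constant so the expected layout selects
// the wanted table slot (unsigned arithmetic wraps, which is intended).
std::uint32_t magic_for(const unsigned char* expected, std::size_t len,
                        std::uint32_t wanted_index) {
    return wanted_index - fnv1a(expected, len);
}
```

In a real build the magic constant would be precomputed and hard-coded rather than derived at run time, since shipping magic_for would give the trick away.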

If you're serious about obfuscation, add several months to your planning; obfuscation doesn't come cheap. And do consider that by far the best way to avoid people reverse-engineering your code is to make it useless so that they don't bother. It's a simple economic consideration: they will reverse-engineer if the value to them is greater than the cost; but raising their cost also raises your cost a lot, so try lowering the value to them.

Now that I've told you that obfuscation is hard and expensive, I'm going to tell you it's not for you anyway. You write:

current protocol I've been working on must not ever be inspected or understandable, for the security of various people

That raises a red flag. It's security by obscurity, which has a very poor record. If the security of the protocol depends on people not knowing the protocol, you've lost already.

Recommended reading:

Answer by iammilind

Many times, the fear of your product getting reverse engineered is misplaced. Yes, it can get reverse engineered; but will it become so famous over a short period of time that hackers will find it worth reverse engineering? (This job is not a small-time activity for a substantial number of lines of code.)

If it really becomes a money earner, then you should have gathered enough money to protect it using legal means such as patents and/or copyrights.

IMHO, take the basic precautions you are going to take and release it. If it becomes a target of reverse engineering, that means you have done a really good job, and you yourself will find better ways to overcome it. Good luck.

Answer by asmeurer

Take a read of http://en.wikipedia.org/wiki/Security_by_obscurity#Arguments_against. I'm sure others could probably also give better sources on why security by obscurity is a bad thing.

It should be entirely possible, using modern cryptographic techniques, to have your system be open (I'm not saying it should be open, just that it could be) and still have total security, so long as the cryptographic algorithm doesn't have a hole in it (not likely if you choose a good one), your private keys/passwords remain private, and you don't have security holes in your code (this is what you should be worrying about).

Answer by tne

Since July 2013, there has been renewed interest in cryptographically robust obfuscation (in the form of Indistinguishability Obfuscation), which seems to have been spurred by original research from Amit Sahai.

You can find some distilled information in this Quanta Magazine article and in that IEEE Spectrum article.

Currently the amount of resources required to make use of this technique makes it impractical, but AFAICT the consensus is rather optimistic about the future.

I say this very casually, but to everyone who's used to instinctively dismissing obfuscation technology: this is different. If it's proven to be truly working and made practical, this is major indeed, and not just for obfuscation.

Answer by Mohammad Alaggan

If you are really serious about protecting your application, there is a recent paper called "Program obfuscation and one-time programs". In general, the paper gets around the theoretical impossibility results by using simple and universal hardware.

If you can't afford to require extra hardware, there is also another paper, "On best-possible obfuscation", that gives the theoretically best-possible obfuscation among all programs with the same functionality and the same size. However, the paper shows that information-theoretic best-possible obfuscation implies a collapse of the polynomial hierarchy.

Those papers should at least give you sufficient bibliographical leads into the related literature if these results do not work for your needs.

Update: A new notion of obfuscation, called indistinguishability obfuscation, can mitigate the impossibility result (paper).
