C: The concept behind these four lines of tricky C code

Question

Question by codeslayer1

Why does this code give the output C++Sucks? What is the concept behind it?

#include <stdio.h>

double m[] = {7709179928849219.0, 771};

int main() {
    m[1]--?m[0]*=2,main():printf((char*)m);    
}

Answer 1

Answer by dasblinkenlight

The number 7709179928849219.0 has the following binary representation as a 64-bit double:

01000011 00111011 01100011 01110101 01010011 00101011 00101011 01000011
+^^^^^^^ ^^^^---- -------- -------- -------- -------- -------- --------

+ shows the position of the sign bit, ^ that of the exponent, and - that of the mantissa (i.e. the value without the exponent).

Since the representation uses a binary exponent and mantissa, doubling the number increments the exponent by one. Your program does this precisely 771 times, so the exponent, which started at 1075 (the decimal representation of 10000110011), becomes 1075 + 771 = 1846 at the end; the binary representation of 1846 is 11100110110. The resulting pattern looks like this:

01110011 01101011 01100011 01110101 01010011 00101011 00101011 01000011
-------- -------- -------- -------- -------- -------- -------- --------
0x73 's' 0x6B 'k' 0x63 'c' 0x75 'u' 0x53 'S' 0x2B '+' 0x2B '+' 0x43 'C'

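This pattern corresponds to the string that you see printed, only backwards: a little-endian machine stores the least significant byte (0x43, 'C') at the lowest address, so the bytes are read from right to left. At the same time, the second element of the array counts down to zero (the final, failing m[1]-- test actually leaves it at -1.0, whose lowest-addressed byte is still 0x00), providing the null terminator and making the string suitable for passing to printf().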
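The exponent arithmetic can be checked directly. This is a minimal sketch, assuming IEEE 754 binary64 doubles (1 sign bit, 11 exponent bits, 52 mantissa bits) and that doubles share the byte order of 64-bit integers; the helper names are illustrative, not from the question. The fields are extracted with shifts and masks, so the check does not depend on which byte sits at which address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == 8, "expects 64-bit doubles");

/* Copy a double's bits into an integer (memcpy avoids aliasing problems). */
static uint64_t bits_of(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* 11-bit biased exponent: the bits just below the sign bit. */
static unsigned exponent_field(double d)
{
    return (unsigned)((bits_of(d) >> 52) & 0x7FF);
}

/* 52-bit mantissa: the low-order bits. */
static uint64_t mantissa_field(double d)
{
    return bits_of(d) & ((UINT64_C(1) << 52) - 1);
}

/* Double d n times, as the obfuscated program does. */
static double double_n_times(double d, int n)
{
    while (n-- > 0)
        d *= 2;
    return d;
}
```

With the seed 7709179928849219.0, exponent_field gives 1075 before and 1846 after double_n_times(seed, 771), while mantissa_field stays the same, which is exactly the "only the exponent moves" argument above.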

Answer 2

Answer by Adam Stelmaszczyk

Answer by Angew is no longer proud of SO

Disclaimer: This answer was posted to the original form of the question, which mentioned only C++ and included a C++ header. The question's conversion to pure C was done by the community, without input from the original asker.

Formally speaking, it's impossible to reason about this program because it's ill-formed (i.e. it's not legal C++). It violates C++11 [basic.start.main]p3:

The function main shall not be used within a program.

This aside, it relies on the fact that on a typical consumer computer a double is 8 bytes long and uses a certain well-known internal representation (IEEE 754 binary64, stored little-endian). The initial values of the array are computed so that when the "algorithm" is performed, the internal representation (8 bytes) of the final value of the first double will be the ASCII codes of the 8 characters C++Sucks. The second element of the array ends up as -1.0 (the final, failing m[1]-- test still decrements it from 0.0), and the first byte of its internal representation is 0, making this a valid C-style string. This is then sent to output using printf().

Running this on hardware where some of the above doesn't hold would instead result in garbage text (or perhaps even an out-of-bounds access).
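On the typical hardware described above, the whole "algorithm" can be replayed as a plain loop and the resulting array inspected. A minimal sketch, assuming 8-byte IEEE 754 doubles stored little-endian; the name replay is illustrative. The loop condition uses the same post-decrement as the question, so the array ends in exactly the state printf() sees, including the terminator byte.

```c
#include <assert.h>
#include <string.h>

static_assert(sizeof(double) == 8, "expects 64-bit doubles");

/* Loop form of m[1]-- ? m[0]*=2, main() : printf(...):
   test the old value of m[1], decrement it, and double m[0]
   while the old value was nonzero (771 doublings in total). */
static void replay(double m[2])
{
    while (m[1]--)
        m[0] *= 2;
}
```

After replay, the 16 bytes of m read as the C string "C++Sucks", m[1] holds -1.0, and the byte at offset 8 (the lowest-addressed byte of m[1]) is 0.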

Answer 4

Answer by Jerry Coffin

Perhaps the easiest way to understand the code is to work through things in reverse. We'll start with a string to print out -- for balance, we'll use "C++Rocks". Crucial point: just like the original, it's exactly eight characters long. Since we're going to do (roughly) what the original does and print the characters in reverse order, we'll start by storing the string reversed. For our first step, we'll just view that bit pattern as a double and print out the result:

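#include <stdio.h>
#include <string.h>

char string[] = "skcoR++C";

int main(void){
    double d;
    memcpy(&d, string, sizeof d);  /* reinterpret the 8 bytes without alignment/aliasing problems */
    printf("%f\n", d);
}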

This produces 3823728713643449.5. So, we want to manipulate that in some way that isn't obvious, but is easy to reverse. I'll semi-arbitrarily choose multiplication by 256, which gives us 978874550692723072. Now, we just need to write some obfuscated code to divide by 256, then print out the individual bytes of that in reverse order:

#include <stdio.h>

double x [] = { 978874550692723072, 8 };
char *y = (char *)x;

int main(int argc, char **argv){
    if (x[1]) {
        x[0] /= 2;
        main(--x[1], (char **)++y);
        putchar(*--y);  /* print on the way back out: bytes 7 down to 0 */
    }
}

Now we have lots of casting, passing arguments to (recursive) main that are completely ignored (but evaluating them to get the increment and decrement is utterly crucial), and of course that completely arbitrary looking number to cover up the fact that what we're doing is really pretty straightforward.

Of course, since the whole point is obfuscation, if we feel like it we can take more steps as well. Just for example, we can take advantage of short-circuit evaluation to turn our if statement into a single expression, so the body of main looks like this:

x[1] && (x[0] /= 2, main(--x[1], (char **)++y), putchar(*--y));

To anybody who isn't accustomed to obfuscated code (and/or code golf) this starts to look pretty strange indeed -- computing and discarding the logical and of some meaningless floating point number and a comma expression that recursively calls main and then prints a character. Worse, without realizing (and thinking about) how short-circuit evaluation works, it may not even be immediately obvious how it avoids infinite recursion.

Our next step would probably be to separate printing each character from finding that character. We can do that pretty easily by generating the right character as the return value from main, and printing out what main returns:

x[1] && (x[0] /= 2, putchar(main(--x[1], (char **)++y)));
return y > (char *)x ? *--y : 0;  /* the outermost call has nothing left to return */

At least to me, that seems obfuscated enough, so I'll leave it at that.
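For comparison, a non-obfuscated sketch of the same idea: view the reversed text as a double, scale it, and later undo the scaling and read the bytes from the highest address down. It assumes 8-byte IEEE 754 doubles stored little-endian; text_as_double and reversed_bytes are illustrative names, and memcpy is used instead of pointer casts to sidestep alignment and aliasing issues.

```c
#include <assert.h>
#include <string.h>

static_assert(sizeof(double) == 8, "expects 64-bit doubles");

/* View 8 bytes of text as a double. */
static double text_as_double(const char text[8])
{
    double d;
    memcpy(&d, text, sizeof d);
    return d;
}

/* Halve x0 `halvings` times, then copy its bytes into out from the
   highest address down, so text stored reversed comes out forwards. */
static void reversed_bytes(double x0, int halvings, char out[9])
{
    unsigned char bytes[sizeof(double)];
    for (int i = 0; i < halvings; i++)
        x0 /= 2;
    memcpy(bytes, &x0, sizeof bytes);
    for (int i = 0; i < 8; i++)
        out[i] = (char)bytes[7 - i];
    out[8] = '\0';
}
```

On such a machine text_as_double("skcoR++C") is 3823728713643449.5, multiplying it by 256 gives 978874550692723072, and reversed_bytes(978874550692723072.0, 8, buf) fills buf with "C++Rocks". Halving is exact here because the value stays a normal double, so only the exponent changes.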

Answer 5

Answer by D.R.

It is just building up a double array (16 bytes) which, if interpreted as a char array, holds the ASCII codes for the string "C++Sucks" followed by a zero byte.

However, the code does not work on every system; it relies on the following implementation-defined facts:

double has exactly 8 bytes
endianness (the bytes must be stored little-endian)

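A rough runtime check of these assumptions might look like the sketch below. It is only a heuristic: it tests integer byte order and the bit pattern of 1.0 (0x3FF0000000000000 in IEEE 754 binary64), assuming that doubles use the same byte order as integers, which holds on common platforms but is not guaranteed by the standard. The function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if the lowest-addressed byte of an integer holds its least
   significant byte, i.e. the machine is little-endian. */
static bool is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Heuristic check for the trick's requirements: 8-byte doubles,
   the IEEE 754 binary64 layout, and little-endian storage. */
static bool trick_should_work(void)
{
    double one = 1.0;
    uint64_t bits;
    if (sizeof(double) != sizeof bits)
        return false;
    memcpy(&bits, &one, sizeof bits);
    return bits == UINT64_C(0x3FF0000000000000) && is_little_endian();
}
```

On a typical x86-64 or little-endian ARM machine both functions return true; on a big-endian machine trick_should_work() returns false and the question's program would print the bytes in the wrong order.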

Answer 6

Answer by Serve Laurijssen

The following code prints C++Suc;C, so the whole multiplication only changes the last two letters:

#include <stdio.h>
double m[] = {7709179928849219.0, 0};
int main(void) { printf("%s\n", (char *)m); }
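To see which bytes the 771 doublings actually touch, one can compare the storage of the seed and of the final value byte by byte. A minimal sketch, assuming 8-byte IEEE 754 doubles stored little-endian; changed_byte_mask is a hypothetical helper that returns a bitmask of the byte offsets that differ.

```c
#include <assert.h>
#include <string.h>

static_assert(sizeof(double) == 8, "expects 64-bit doubles");

/* Bit i of the result is set when byte i of the two doubles differs. */
static unsigned changed_byte_mask(double before, double after)
{
    unsigned char a[8], b[8];
    unsigned mask = 0;
    memcpy(a, &before, sizeof a);
    memcpy(b, &after, sizeof b);
    for (int i = 0; i < 8; i++)
        if (a[i] != b[i])
            mask |= 1u << i;
    return mask;
}
```

On such a machine only bytes 6 and 7 differ (mask 0xC0). Together they hold the sign bit, the 11 exponent bits and the top 4 mantissa bits, which is why doubling the value only rewrites the last two letters.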

Answer 7

Answer by Yu Hao

The others have explained the question pretty thoroughly; I'd like to add a note that this is undefined behavior according to the standard.

C++11 3.6.1/3 Main function

The function main shall not be used within a program. The linkage (3.5) of main is implementation-defined. A program that defines main as deleted or that declares main to be inline, static, or constexpr is ill-formed. The name main is not otherwise reserved. [ Example: member functions, classes, and enumerations can be called main, as can entities in other namespaces. —end example ]

Answer 8

Answer by Hyman Aidley

The code could be re-written like this:

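#include <stdio.h>

double m[] = {7709179928849219.0, 771};

void f(void)
{
    if (m[1]-- != 0)    /* test the old value, then decrement */
    {
        m[0] *= 2;
        f();
    } else {
        printf("%s", (char *)m);
    }
}

int main(void)
{
    f();
}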

What it's doing is producing a set of bytes in the double array m that happen to correspond to the characters 'C++Sucks' followed by a null terminator. They've obfuscated the code by choosing a double value which, when doubled 771 times, produces (in the standard representation) that set of bytes, with the null terminator provided by the second member of the array.
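Working backwards gives the seed: take the 8 message bytes as a double and halve it 771 times. A minimal sketch, assuming little-endian IEEE 754 doubles and that every intermediate value stays a normal number, so each halving only decrements the exponent and is exact; seed_for is a hypothetical name, not anything from the question.

```c
#include <assert.h>
#include <string.h>

static_assert(sizeof(double) == 8, "expects 64-bit doubles");

/* Find the value that turns into the 8-byte message after n doublings. */
static double seed_for(const char message[8], int n)
{
    double target;
    memcpy(&target, message, sizeof target);  /* message bytes as a double */
    for (int i = 0; i < n; i++)
        target /= 2;                          /* undo one doubling */
    return target;
}
```

seed_for("C++Sucks", 771) yields 7709179928849219.0, the constant used in the question.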

Note that this code wouldn't work under a different endian representation. Also, calling main() is not allowed in C++ (C does permit it).

Answer 9

Answer by Abhishek Ghosh

First we should recall that double precision numbers are stored in the memory in binary format as follows:

(i) 1 bit for the sign

(ii) 11 bits for the exponent

(iii) 52 bits for the magnitude

The significance of the bits decreases from (i) to (iii): the sign is the most significant bit and the magnitude occupies the lowest 52 bits.

First the decimal number is converted to its binary equivalent, which is then expressed in normalized (scientific-notation) form in binary.

So the number 7709179928849219.0 becomes

(11011011000110111010101010011001010110010101101000011) base 2
= 1.1011011000110111010101010011001010110010101101000011 × 2^52

When storing the magnitude bits, the leading "1." is dropped, since every number in normalized binary form starts with 1.

So the magnitude part becomes:

1011011000110111010101010011001010110010101101000011

Now the power of 2 is 52, and we need to add the bias 2^(bits for exponent - 1) - 1, i.e. 2^(11 - 1) - 1 = 1023, so our exponent becomes 52 + 1023 = 1075.

Now our code multiplies the number by 2, 771 times, which increases the exponent by 771.

So our exponent is 1075 + 771 = 1846, whose binary equivalent is 11100110110.

Our number is positive, so the sign bit is 0.

So our modified number becomes:

sign bit + exponent+ magnitude (simple concatenation of the bits)

0111001101101011011000110111010101010011001010110010101101000011
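The concatenation step can be mirrored in code. A minimal sketch, assuming IEEE 754 binary64 and that doubles share the byte order of uint64_t (true on common platforms); from_fields is a hypothetical helper that packs sign, exponent and magnitude into one 64-bit pattern and reinterprets it as a double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == 8, "expects 64-bit doubles");

/* Pack 1 sign bit, 11 exponent bits and 52 magnitude bits (high to low)
   into a 64-bit pattern and view it as a double. */
static double from_fields(unsigned sign, unsigned exponent, uint64_t magnitude)
{
    uint64_t bits = ((uint64_t)(sign & 1u) << 63)
                  | ((uint64_t)(exponent & 0x7FFu) << 52)
                  | (magnitude & ((UINT64_C(1) << 52) - 1));
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

With the magnitude bits 1011011000110111010101010011001010110010101101000011 (0xB6375532B2B43), exponent 1075 reproduces 7709179928849219.0, and exponent 1846 reproduces the value obtained by doubling it 771 times.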

Since m is converted to a char pointer, we split the bit pattern into chunks of 8 bits (one byte each):

01110011 01101011 01100011 01110101 01010011 00101011 00101011 01000011

(whose hexadecimal equivalent is:)

 0x73 0x6B 0x63 0x75 0x53 0x2B 0x2B 0x43

which, looked up in an ASCII chart, is:

  s    k    c    u    S    +    +    C

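Once this is done, m[1] has been decremented past 0 to -1.0; on a little-endian machine its lowest-addressed byte is 00000000, which acts as the NUL character that terminates the string.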

Now, assuming that you run this program on a little-endian machine (the lower-order byte is stored at the lower address), the pointer m points to the lowest-addressed byte, and printf() reads the bytes one at a time (since m is cast to char *) starting from the last chunk shown above and moving towards the first, so the string comes out as C++Sucks. printf() stops when it encounters 00000000, the first byte of m[1].
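The order in which printf() walks the bytes can be made visible by dumping the storage in increasing address order. A minimal sketch, assuming 8-byte doubles; bytes_in_memory_order is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static_assert(sizeof(double) == 8, "expects 64-bit doubles");

/* Write the 8 bytes of d as hex, lowest address first ("43 2B ...").
   out must hold at least 25 chars. */
static void bytes_in_memory_order(double d, char out[25])
{
    unsigned char b[8];
    memcpy(b, &d, sizeof b);
    for (int i = 0; i < 8; i++)
        sprintf(out + 3 * i, "%02X ", (unsigned)b[i]);
    out[23] = '\0';  /* drop the trailing space */
}
```

On a little-endian machine the final value's bytes come out as 43 2B 2B 53 75 63 6B 73, which is the hex row above read from right to left.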

This code is however not portable.
