C: How can a Format-String vulnerability be exploited?

Question

Asked by Atul Goyal

I was reading about vulnerabilities in code and came across this Format-String Vulnerability.

Format string bugs most commonly appear when a programmer wishes to print a string containing user supplied data. The programmer may mistakenly write printf(buffer) instead of printf("%s", buffer). The first version interprets buffer as a format string, and parses any formatting instructions it may contain. The second version simply prints a string to the screen, as the programmer intended.

I understand the problem with the printf(buffer) version, but I still don't see how an attacker can use this vulnerability to execute harmful code. Can someone please show, with an example, how this vulnerability can be exploited?

Answer 1

Answered by Michael Foukarakis

You may be able to exploit a format string vulnerability in many ways, directly or indirectly. Let's use the following as an example (assuming no relevant OS protections, which is very rare anyways):

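#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char text[1024];
    static int some_value = -72;

    if (argc < 2)
        return 1;

    strcpy(text, argv[1]); /* ignore the buffer overflow here */

    printf("This is how you print correctly:\n");
    printf("%s\n", text);
    printf("This is how not to print:\n");
    printf(text); /* deliberate bug: user input is used as the format string */
    printf("\n");

    /* the cast assumes a 32-bit target, as in the sample output */
    printf("some_value @ 0x%08x = %d [0x%08x]\n",
           (unsigned int)&some_value, some_value, (unsigned int)some_value);
    return 0;
}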

The basis of this vulnerability is the behaviour of functions with variable arguments. A function that handles a variable number of parameters essentially has to read them from the stack. If we specify a format string that makes printf() expect two integers on the stack, and we provide only one parameter, the second one will have to be something else on the stack. By extension, if we have control over the format string, we get the two most fundamental primitives:

Reading from arbitrary memory addresses

[EDIT] IMPORTANT: I'm making some assumptions about the stack frame layout here. You can ignore them if you understand the basic premise behind the vulnerability, and they vary across OS, platform, program and configuration anyway.

It's possible to use the %s format parameter to read data. The original format string passed to printf(text) sits on the stack too, so you can read its own bytes back, and hence you can use it to read anything off the stack. In the run below, the fourth %08x prints 41414141, the ASCII bytes of the leading AAAA:

./vulnerable AAAA%08x.%08x.%08x.%08x
This is how you print correctly:
AAAA%08x.%08x.%08x.%08x
This is how not to print:
AAAAXXXXXXXX.XXXXXXXX.XXXXXXXX.41414141
some_value @ 0x08049794 = -72 [0xffffffb8]
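
To make the mechanism behind this stack walk concrete, here is a minimal sketch of how a variadic function fetches its arguments. The helper name sum_ints is hypothetical and not part of the answer. The point is that the callee cannot check how many arguments were really passed. It trusts its first parameter, much as printf trusts its format string.

```c
#include <stdarg.h>

/* Hypothetical helper: adds up `count` int arguments.
 * The callee cannot verify `count`. If it is larger than the number of
 * arguments actually passed, va_arg reads whatever happens to follow in
 * the argument area (undefined behaviour), which is what printf does
 * when a format string asks for more values than were supplied. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next argument as an int */
    va_end(ap);

    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) is well defined. A call such as sum_ints(3, 1) would be the analogue of printf("%x %x %x", 1): the last two reads pick up unrelated data.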

Writing to arbitrary memory addresses

You can use the %n format specifier to write to an arbitrary address (almost). Again, let's take the vulnerable program above and try changing the value of some_value, which is located at 0x08049794, as seen above:

./vulnerable $(printf "\x94\x97\x04\x08")%08x.%08x.%08x.%n
This is how you print correctly:
??%08x.%08x.%08x.%n
This is how not to print:
??XXXXXXXX.XXXXXXXX.XXXXXXXX.
some_value @ 0x08049794 = 31 [0x0000001f]
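
The value 31 can be checked by hand: 4 address bytes plus three groups of 8 hex digits and a dot. The sketch below reproduces that count in a well-defined way, assuming the 4 address bytes are replaced by the literal AAAA and the stack values by ordinary arguments. The helper name count_before_n is hypothetical. The format is a string literal here, so this is the intended, safe use of %n.

```c
#include <stdio.h>

/* Hypothetical helper: returns the value that %n stores for the same
 * layout as the exploit string above (4 bytes, then three "%08x."). */
int count_before_n(void)
{
    char out[64];
    int written = -1;

    /* 4 literal bytes + 3 * (8 hex digits + '.') characters precede %n */
    snprintf(out, sizeof out, "AAAA%08x.%08x.%08x.%n", 1u, 2u, 3u, &written);
    return written;
}
```

In the vulnerable program the only difference is where the int pointer comes from: printf takes it from the stack, where the attacker has placed the 4 address bytes.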

We've overwritten some_value with the number of bytes written before the %n specifier was encountered (see man printf). We can use the format string itself, or the field width, to control this value:

./vulnerable $(printf "\x94\x97\x04\x08")%x%x%x%n
This is how you print correctly:
??%x%x%x%n
This is how not to print:
??XXXXXXXXXXXXXXXXXXXXXXXX
some_value @ 0x08049794 = 21 [0x00000015]
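
As a rough illustration of the field-width option, the sketch below pads a single value to a chosen width and reports what %n stores. The helper name count_with_width and the 500-column limit are assumptions made for this example. An attacker would write the width directly into the format string (for example %100x) rather than pass it through *.

```c
#include <stdio.h>

/* Hypothetical helper: prints one hex value padded to `width` columns
 * and returns the character count that %n records afterwards. */
int count_with_width(int width)
{
    char out[512];
    int written = -1;

    if (width < 1 || width > 500)   /* keep the output inside `out` */
        return -1;

    /* "%*x" takes the field width from the argument list */
    snprintf(out, sizeof out, "%*x%n", width, 0x2au, &written);
    return written;
}
```

The value 0x2a needs two characters, so any width below 2 still yields 2. Larger widths translate directly into the number that %n writes.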

There are many possibilities and tricks to try (direct parameter access, large field width making wrap-around possible, building your own primitives), and this just touches the tip of the iceberg. I would suggest reading more articles on fmt string vulnerabilities (Phrack has some mostly excellent ones, although they may be a little advanced) or a book which touches on the subject.

Disclaimer: the examples are taken [although not verbatim] from the book Hacking: The Art of Exploitation (2nd ed.) by Jon Erickson.

Answer 2

Answered by Jonathan Leffler

It is interesting that no-one has mentioned the n$ notation supported by POSIX. If you can control the format string as the attacker, you can use notations such as:

"%200$p"

to read the 200th item on the stack (if there is one). The intention is that you should list all the n$ numbers from 1 to the maximum, and it provides a way of resequencing how the parameters appear in a format string, which is handy when dealing with I18N (L10N, G11N, M18N*).
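
For reference, this is what the legitimate use of the n$ notation looks like: a minimal sketch that reorders two string arguments. The helper name swap_words is hypothetical. Positional arguments are a POSIX extension rather than ISO C, so this assumes a POSIX-style C library.

```c
#include <stdio.h>

/* Hypothetical helper: writes "<second> <first>" into `out` by selecting
 * arguments by position instead of by order of appearance.
 * Returns the length snprintf reports. */
int swap_words(char *out, size_t size, const char *first, const char *second)
{
    /* %2$s picks the 2nd variadic argument, %1$s the 1st */
    return snprintf(out, size, "%2$s %1$s", first, second);
}
```

Here every position from 1 to the maximum is used and matches a real argument. The abuse case is a format such as "%200$p", which names a position far beyond anything that was passed.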

However, some (probably most) systems are somewhat lackadaisical about how they validate the n$ values, and this can lead to abuse by attackers who can control the format string. Combined with the %n format specifier, this can lead to writing at pointer locations.

* The acronyms I18N, L10N, G11N and M18N are for internationalization, localization, globalization, and multinationalization respectively. The number represents the number of omitted letters.

Answer 3

Answered by user541686

Ah, the answer is in the article!

Uncontrolled format string is a type of software vulnerability, discovered around 1999, that can be used in security exploits. Previously thought harmless, format string exploits can be used to crash a program or to execute harmful code.
A typical exploit uses a combination of these techniques to force a program to overwrite the address of a library function or the return address on the stack with a pointer to some malicious shellcode. The padding parameters to format specifiers are used to control the number of bytes output and the %x token is used to pop bytes from the stack until the beginning of the format string itself is reached. The start of the format string is crafted to contain the address that the %n format token can then overwrite with the address of the malicious code to execute.

This is because %n causes printf to write data to a variable whose address it takes from the stack. But that means it could write to something arbitrary. All you need is for the program to use that variable (it's relatively easy if it happens to be a function pointer, whose value you just figured out how to control), and the attacker can make it execute anything arbitrarily.
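
To see why a controllable function pointer is enough, consider this benign sketch. All names in it (service, dispatch, overwrite_slot and the two handlers) are hypothetical. The overwrite is done with a plain assignment standing in for the %n write, so nothing here is an actual exploit. It only shows that whoever controls the stored pointer decides which code runs.

```c
typedef int (*handler_fn)(int);

/* Two ordinary functions standing in for "intended" and "other" code. */
int normal_handler(int x) { return x + 1; }
int other_handler(int x)  { return x * 100; }

/* Hypothetical program state that holds a function pointer. */
struct service {
    handler_fn on_event;
};

/* The program later calls through whatever pointer is stored. */
int dispatch(const struct service *s, int x)
{
    return s->on_event(x);
}

/* Stand-in for an arbitrary write: stores a new target at a given address. */
void overwrite_slot(void *slot_address, handler_fn new_target)
{
    *(handler_fn *)slot_address = new_target;
}
```

After overwrite_slot(&s.on_event, other_handler), the same dispatch call that used to run normal_handler runs other_handler instead.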

Take a look at the links in the article; they look interesting.

Answer 4

Answered by AndreyP

I would recommend reading this lecture note about format string vulnerability. It describes in detail what happens and how, and has some images that might help you understand the topic.

Answer 5

Answered by user541686

AFAIK it's mainly because it can crash your program, which is considered to be a denial-of-service attack. All you need is to make printf dereference an invalid address (practically anything with a few %s specifiers is almost certain to work), and it becomes a simple denial-of-service (DoS) attack.
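
For contrast, the fix that rules out both the crash and the reads and writes described in this thread is to never pass untrusted text as the format argument. A minimal sketch, with the hypothetical helper name format_untrusted:

```c
#include <stdio.h>

/* Hypothetical helper: copies untrusted text into `out` without ever
 * interpreting it as a format string. Returns what snprintf returns,
 * i.e. the length of the text. */
int format_untrusted(char *out, size_t size, const char *untrusted)
{
    /* The format is a fixed literal; `untrusted` is only ever data. */
    return snprintf(out, size, "%s", untrusted);
}
```

With this, input such as "%s%s%n" is copied through as six ordinary characters. When printing directly, fputs(text, stdout) or printf("%s", text) gives the same guarantee, and GCC's -Wformat-security option warns about calls like printf(text).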

Now, it's theoretically possible for that to trigger anything in the case of an exception/signal/interrupt handler, but figuring out how to do that is beyond me -- you need to figure out how to write arbitrary data to memory as well.

But why does anyone care if the program crashes, you might ask? Doesn't that just inconvenience the user (who deserves it anyway)?

The problem is that some programs are accessed by multiple users, so crashing them has a non-negligible cost. Or sometimes they're critical to the running of the system (or maybe they're in the middle of doing something very critical), in which case this can be damaging to your data. Of course, if you crash Notepad then no one might care, but if you crash CSRSS (which I believe actually had a similar kind of bug -- a double-free bug, specifically) then yeah, the entire system is going down with you.

Update:

See this link for the CSRSS bug I was referring to.

Edit:

Take note that reading arbitrary data can be just as dangerous as executing arbitrary code! If you read a password, a cookie, etc. then it's just as serious as an arbitrary code execution -- and this is trivial if you just have enough time to try enough format strings.
