Disadvantages of scanf
Original question: http://stackoverflow.com/questions/2430303/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by karthi_ms
I want to know the disadvantages of scanf().
On many sites, I have read that using scanf might cause buffer overflows. What is the reason for this? Are there any other drawbacks with scanf?
Accepted answer by paxdiablo
The problems with scanf are (at a minimum):
- using %s to get a string from the user, which leads to the possibility that the string may be longer than your buffer, causing overflow.
- the possibility of a failed scan leaving your file pointer in an indeterminate location.
I very much prefer using fgets to read whole lines in so that you can limit the amount of data read. If you've got a 1K buffer, and you read a line into it with fgets, you can tell if the line was too long by the fact there's no terminating newline character (last line of a file without a newline notwithstanding).
Then you can complain to the user, or allocate more space for the rest of the line (continuously if necessary until you have enough space). In either case, there's no risk of buffer overflow.
Once you've read the line in, you know that you're positioned at the next line so there's no problem there. You can then sscanf your string to your heart's content without having to save and restore the file pointer for re-reading.
Here's a snippet of code which I frequently use to ensure no buffer overflow when asking the user for information.
It could be easily adjusted to use a file other than standard input if necessary and you could also have it allocate its own buffer (and keep increasing it until it's big enough) before giving that back to the caller (although the caller would then be responsible for freeing it, of course).
    #include <stdio.h>
    #include <string.h>

    #define OK         0
    #define NO_INPUT   1
    #define TOO_LONG   2
    #define SMALL_BUFF 3

    static int getLine (char *prmpt, char *buff, size_t sz) {
        int ch, extra;

        // Size zero or one cannot store enough, so don't even
        // try - we need space for at least newline and terminator.
        if (sz < 2)
            return SMALL_BUFF;

        // Output prompt.
        if (prmpt != NULL) {
            printf ("%s", prmpt);
            fflush (stdout);
        }

        // Get line with buffer overrun protection.
        if (fgets (buff, sz, stdin) == NULL)
            return NO_INPUT;

        // If it was too long, there'll be no newline. In that case, we flush
        // to end of line so that excess doesn't affect the next call.
        size_t lastPos = strlen(buff) - 1;
        if (buff[lastPos] != '\n') {
            extra = 0;
            while (((ch = getchar()) != '\n') && (ch != EOF))
                extra = 1;
            return (extra == 1) ? TOO_LONG : OK;
        }

        // Otherwise remove newline and give string back to caller.
        buff[lastPos] = '\0';
        return OK;
    }
And, a test driver for it:
    // Test program for getLine().
    int main (void) {
        int rc;
        char buff[10];

        rc = getLine ("Enter string> ", buff, sizeof(buff));
        if (rc == NO_INPUT) {
            // Extra NL since my system doesn't output that on EOF.
            printf ("\nNo input\n");
            return 1;
        }

        if (rc == TOO_LONG) {
            printf ("Input too long [%s]\n", buff);
            return 1;
        }

        printf ("OK [%s]\n", buff);
        return 0;
    }
Finally, a test run to show it in action:
    $ ./tstprg
    Enter string>[CTRL-D]
    No input

    $ ./tstprg
    Enter string> a
    OK [a]

    $ ./tstprg
    Enter string> hello
    OK [hello]

    $ ./tstprg
    Enter string> hello there
    Input too long [hello the]

    $ ./tstprg
    Enter string> i am pax
    OK [i am pax]
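The self-allocating variant suggested earlier in this answer (grow the buffer until the whole line fits, then hand it to the caller to free) might look roughly like the sketch below. The name read_line_alloc, the starting capacity of 16 and the doubling strategy are assumptions made for illustration; they are not part of the original answer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length from `in`.
 * Returns a malloc'd string without the trailing newline, or NULL on
 * end-of-file with nothing read, read error, or allocation failure.
 * The caller is responsible for calling free() on the result. */
static char *read_line_alloc(FILE *in) {
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    /* Each fgets call fills the still-unused tail of the buffer. */
    while (fgets(buf + len, (int)(cap - len), in) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';        /* complete line: strip the newline */
            return buf;
        }
        /* No newline yet: double the buffer and keep reading. */
        char *bigger = realloc(buf, cap * 2);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        cap *= 2;
    }

    if (len == 0 || ferror(in)) {       /* nothing read, or a read error */
        free(buf);
        return NULL;
    }
    return buf;                         /* last line had no trailing newline */
}
```

A caller would loop on read_line_alloc(stdin) until it returns NULL, freeing each returned string after use.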
Answer by AnT
Most of the answers so far seem to focus on the string buffer overflow issue. In reality, the format specifiers that can be used with scanf functions support an explicit field width setting, which limits the maximum size of the input and prevents buffer overflow. This renders the popular accusations of string-buffer overflow dangers present in scanf virtually baseless. Claiming that scanf is somehow analogous to gets in this respect is completely incorrect. There's a major qualitative difference between scanf and gets: scanf does provide the user with string-buffer-overflow-preventing features, while gets doesn't.
One can argue that these scanf features are difficult to use, since the field width has to be embedded into the format string (there's no way to pass it through a variadic argument, as can be done in printf). That is actually true. scanf is indeed rather poorly designed in that regard. But nevertheless any claims that scanf is somehow hopelessly broken with regard to string-buffer-overflow safety are completely bogus and usually made by lazy programmers.
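One common way to live with the "width must be embedded in the format string" restriction is to build the format at compile time with stringification macros, so the width and the buffer size come from a single constant. The following is only a sketch; WORD_MAX, STR, STR_ and first_word are names invented for this illustration.

```c
#include <stdio.h>

#define WORD_MAX 9              /* maximum characters stored, excluding the terminator */

/* Two-step stringification so the macro's value, not its name, ends up in the format. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Hypothetical helper: copy the first whitespace-delimited word of `input`
 * into `out`, which must have room for WORD_MAX + 1 bytes.
 * Returns 1 if a word was stored, 0 otherwise. */
static int first_word(const char *input, char out[WORD_MAX + 1]) {
    /* The format expands to "%9s": at most 9 characters are stored,
     * and sscanf then appends the terminating null byte. */
    return sscanf(input, "%" STR(WORD_MAX) "s", out) == 1;
}
```

Note that the width counts characters stored, not the buffer size, so the buffer needs one extra byte for the terminator.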
The real problem with scanf has a completely different nature, even though it is also about overflow. When a scanf function is used for converting decimal representations of numbers into values of arithmetic types, it provides no protection from arithmetic overflow. If overflow happens, scanf produces undefined behavior. For this reason, the only proper way to perform the conversion in the C standard library is the functions from the strto... family.
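As a rough illustration of the strto... approach, the sketch below converts text to an int with strtol and reports out-of-range values instead of invoking undefined behavior. The helper name parse_int and its return convention are assumptions made for this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert decimal text to int.
 * Returns 1 on success; 0 if there are no digits, trailing characters,
 * or the value does not fit in an int. */
static int parse_int(const char *s, int *out) {
    char *end;
    long v;

    errno = 0;                          /* strtol only sets errno on failure */
    v = strtol(s, &end, 10);

    if (end == s)
        return 0;                       /* no digits were consumed */
    if (*end != '\0')
        return 0;                       /* something follows the number */
    if (errno == ERANGE)
        return 0;                       /* outside the range of long */
    if (v < INT_MIN || v > INT_MAX)
        return 0;                       /* fits in long but not in int */

    *out = (int)v;
    return 1;
}
```

As written, a trailing newline counts as a trailing character, so a line read with fgets would need its newline stripped first.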
So, to summarize the above, the problem with scanf is that it is difficult (albeit possible) to use properly and safely with string buffers. And it is impossible to use safely for arithmetic input. The latter is the real problem. The former is just an inconvenience.
P.S. The above is intended to be about the entire family of scanf functions (including also fscanf and sscanf). With scanf specifically, the obvious issue is that the very idea of using a strictly-formatted function for reading potentially interactive input is rather questionable.
Answer by jamesdlin
From the comp.lang.c FAQ: Why does everyone say not to use scanf? What should I use instead?
scanf has a number of problems: see questions 12.17, 12.18a, and 12.19. Also, its %s format has the same problem that gets() has (see question 12.23): it's hard to guarantee that the receiving buffer won't overflow.
More generally, scanf is designed for relatively structured, formatted input (its name is in fact derived from "scan formatted"). If you pay attention, it will tell you whether it succeeded or failed, but it can tell you only approximately where it failed, and not at all how or why. You have very little opportunity to do any error recovery.
Yet interactive user input is the least structured input there is. A well-designed user interface will allow for the possibility of the user typing just about anything: not just letters or punctuation when digits were expected, but also more or fewer characters than were expected, or no characters at all (i.e., just the RETURN key), or premature EOF, or anything. It's nearly impossible to deal gracefully with all of these potential problems when using scanf; it's far easier to read entire lines (with fgets or the like), then interpret them, either using sscanf or some other techniques. (Functions like strtol, strtok, and atoi are often useful; see also questions 12.16 and 13.6.) If you do use any scanf variant, be sure to check the return value to make sure that the expected number of items were found. Also, if you use %s, be sure to guard against buffer overflow.
Note, by the way, that criticisms of scanf are not necessarily indictments of fscanf and sscanf. scanf reads from stdin, which is usually an interactive keyboard and is therefore the least constrained, leading to the most problems. When a data file has a known format, on the other hand, it may be appropriate to read it with fscanf. It's perfectly appropriate to parse strings with sscanf (as long as the return value is checked), because it's so easy to regain control, restart the scan, discard the input if it didn't match, etc.
References: K&R2 Sec. 7.4 p. 159
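The FAQ's advice to read an entire line and then interpret it can be sketched as follows. The helper name read_two_ints, the 128-byte line buffer and the "two integers per line" input format are assumptions chosen only to keep the example small.

```c
#include <stdio.h>

/* Hypothetical helper: read one line from `in` and parse two integers from it.
 * Returns 1 on success, 0 if the line is malformed, -1 on EOF or read error. */
static int read_two_ints(FILE *in, int *a, int *b) {
    char line[128];

    /* fgets never writes more than sizeof line bytes, terminator included. */
    if (fgets(line, sizeof line, in) == NULL)
        return -1;

    /* Provided the line fit in the buffer, the stream is already positioned
     * at the next line, whatever sscanf makes of this one. */
    return sscanf(line, "%d %d", a, b) == 2 ? 1 : 0;
}
```

A malformed line costs only that line: the next call starts cleanly on the following one, which is the "easy to regain control" property the FAQ describes. Lines longer than the buffer would still need the kind of handling shown in the accepted answer.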
Answer by Alok Singhal
It is very hard to get scanf to do the thing you want. Sure, you can, but things like scanf("%s", buf); are as dangerous as gets(buf);, as everyone has said.
As an example, what paxdiablo is doing in his function to read can be done with something like:
    scanf("%10[^\n]%*[^\n]", buf);
    getchar();
The above will read a line, store the first 10 non-newline characters in buf, and then discard everything till (and including) a newline. So, paxdiablo's function could be written using scanf the following way:
    #include <stdio.h>

    enum read_status {
        OK,
        NO_INPUT,
        TOO_LONG
    };

    static int get_line(const char *prompt, char *buf, size_t sz)
    {
        char fmt[40];
        int i;
        /* %n is only reached when the discard directive matched, i.e. when
           the line was longer than sz-1; start from 0 for the other case. */
        int nscanned = 0;

        printf("%s", prompt);
        fflush(stdout);

        sprintf(fmt, "%%%zu[^\n]%%*[^\n]%%n", sz-1);
        /* read at most sz-1 characters, discarding the rest */
        i = scanf(fmt, buf, &nscanned);
        if (i > 0) {
            getchar();
            if ((size_t)nscanned >= sz) {
                return TOO_LONG;
            } else {
                return OK;
            }
        } else {
            return NO_INPUT;
        }
    }

    int main(void)
    {
        char buf[10+1];
        int rc;

        while ((rc = get_line("Enter string> ", buf, sizeof buf)) != NO_INPUT) {
            if (rc == TOO_LONG) {
                printf("Input too long: ");
            }
            printf("->%s<-\n", buf);
        }
        return 0;
    }
One of the other problems with scanf is its behavior in case of overflow. For example, when reading an int:
    int i;
    scanf("%d", &i);
the above cannot be used safely in case of an overflow. Even for the first case, reading a string is much simpler to do with fgets than with scanf.
Answer by codaddict
Yes, you are right. There is a major security flaw in the scanf family (scanf, sscanf, fscanf, etc.), especially when reading a string, because they don't take the length of the buffer (into which they are reading) into account.
Example:
    char buf[3];
    sscanf("abcdef","%s",buf);
Clearly the buffer buf can hold at most 3 chars. But sscanf will try to put "abcdef" into it, causing buffer overflow.
Answer by John Bode
Problems I have with the *scanf() family:
- Potential for buffer overflow with %s and %[ conversion specifiers. Yes, you can specify a maximum field width, but unlike with printf(), you can't make it an argument in the scanf() call; it must be hardcoded in the conversion specifier.
- Potential for arithmetic overflow with %d, %i, etc.
- Limited ability to detect and reject badly formed input. For example, "12w4" is not a valid integer, but scanf("%d", &value); will successfully convert and assign 12 to value, leaving the "w4" stuck in the input stream to foul up a future read. Ideally the entire input string should be rejected, but scanf() doesn't give you an easy mechanism to do that.
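To make the "12w4" case concrete, here is one possible way to reject such input when it is already in a string: ask sscanf, via %n, how many characters it consumed and check that nothing is left over. The helper name is_whole_int is an assumption for this sketch, and the approach does nothing about the arithmetic overflow problem in the second bullet.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 only if all of `s` is a single decimal
 * integer. Note that *value may still be written (e.g. with 12 for "12w4")
 * even when the function returns 0. */
static int is_whole_int(const char *s, int *value) {
    int consumed = 0;

    /* %n stores the number of characters consumed so far; it does not
     * count towards sscanf's return value. */
    if (sscanf(s, "%d%n", value, &consumed) != 1)
        return 0;

    /* Accept only if the conversion stopped at the end of the string. */
    return s[consumed] == '\0';
}
```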
If you know your input is always going to be well-formed with fixed-length strings and numerical values that don't flirt with overflow, then scanf() is a great tool. If you're dealing with interactive input or input that isn't guaranteed to be well-formed, then use something else.
Answer by dreamlax
Many answers here discuss the potential overflow issues of using scanf("%s", buf), but the latest POSIX specification more-or-less resolves this issue by providing an m assignment-allocation character that can be used in format specifiers for c, s, and [ formats. This will allow scanf to allocate as much memory as necessary with malloc (so it must be freed later with free).
An example of its use:
    char *buf;
    scanf("%ms", &buf); // with 'm', scanf expects a pointer to pointer to char.
    // use buf
    free(buf);
A disadvantage of this approach is that it is a relatively recent addition to the POSIX specification and it is not specified in the C specification at all, so it remains rather unportable for now.
Answer by Vladimir Veljkovic
There is one big problem with scanf-like functions - the lack of any type safety. That is, you can code this:
    int i;
    scanf("%10s", &i);
Hell, even this is "fine":
    scanf("%10s", i);
It's worse than printf-like functions, because scanf expects a pointer, so crashes are more likely.
Sure, there are some format-specifier checkers out there, but, those are not perfect and well, they are not part of the language or the standard library.
Answer by autistic
The advantage of scanf is that once you learn how to use the tool, as you should always do in C, it has immensely useful use cases. You can learn how to use scanf and friends by reading and understanding the manual. If you can't get through that manual without serious comprehension issues, this would probably indicate that you don't know C very well.
scanf and friends suffered from unfortunate design choices that rendered it difficult (and occasionally impossible) to use correctly without reading the documentation, as other answers have shown. This occurs throughout C, unfortunately, so if I were to advise against using scanf then I would probably advise against using C.
One of the biggest disadvantages seems to be purely the reputation it's earned amongst the uninitiated; as with many useful features of C we should be well informed before we use it. The key is to realise that as with the rest of C, it seems succinct and idiomatic, but that can be subtly misleading. This is pervasive in C; it's easy for beginners to write code that they think makes sense and might even work for them initially, but doesn't make sense and can fail catastrophically.
For example, the uninitiated commonly expect that the %s delegate would cause a line to be read, and while that might seem intuitive it isn't necessarily true. It's more appropriate to describe the field read as a word. Reading the manual is strongly advised for every function.
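A small sketch of the "word, not line" behaviour, using sscanf so the input is visible in the code. The helper name scan_with_s and the 32-byte buffer are assumptions for the example.

```c
#include <stdio.h>

/* Hypothetical demo: store whatever %s extracts from `input` in `out`
 * (which must have room for 32 bytes) and return sscanf's result. */
static int scan_with_s(const char *input, char out[32]) {
    /* %31s skips leading whitespace, then stops at the next whitespace
     * character, so only the first word of the line is stored. */
    return sscanf(input, "%31s", out);
}
```

Given the text "hello there\n", only "hello" ends up in the buffer; the rest of the line is not consumed by this directive.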
What would any response to this question be without mentioning its lack of safety and risk of buffer overflows? As we've already covered, C isn't a safe language, and will allow us to cut corners, possibly to apply an optimisation at the expense of correctness or more likely because we're lazy programmers. Thus, when we know the system will never receive a string larger than a fixed number of bytes, we're given the ability to declare an array that size and forego bounds checking. I don't really see this as a down-fall; it's an option. Again, reading the manual is strongly advised and would reveal this option to us.
Lazy programmers aren't the only ones stung by scanf. It's not uncommon to see people trying to read float or double values using %d, for example. They're usually mistaken in believing that the implementation will perform some kind of conversion behind the scenes, which would make sense because similar conversions happen throughout the rest of the language, but that's not the case here. As I said earlier, scanf and friends (and indeed the rest of C) are deceptive; they seem succinct and idiomatic but they aren't.
Inexperienced programmers aren't forced to consider the success of the operation. Suppose the user enters something entirely non-numeric when we've told scanf to read and convert a sequence of decimal digits using %d. The only way we can intercept such erroneous data is to check the return value, and how often do we bother checking the return value?
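Checking the return value can be as small as the sketch below, which falls back to a caller-supplied default when the conversion fails. The name int_or_default is invented for the illustration.

```c
#include <stdio.h>

/* Hypothetical helper: parse a decimal integer from `text`, or return
 * `fallback` if no integer could be converted. */
static int int_or_default(const char *text, int fallback) {
    int n;

    /* sscanf returns the number of successful conversions (or EOF).
     * Anything other than 1 means n was never assigned, so it must
     * not be read. */
    if (sscanf(text, "%d", &n) != 1)
        return fallback;

    return n;
}
```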
Much like fgets, when scanf and friends fail to read what they're told to read, the stream will be left in an unusual state:
- In the case of fgets, if there isn't sufficient space to store a complete line, then the remainder of the line left unread might be erroneously treated as though it's a new line when it isn't.
- In the case of scanf and friends, when a conversion fails as documented above, the erroneous data is left unread on the stream and might be erroneously treated as though it's part of a different field.
It's no easier to use scanf and friends than to use fgets. If we check for success by looking for a '\n' when we're using fgets or by inspecting the return value when we use scanf and friends, and we find that we've read an incomplete line using fgets or failed to read a field using scanf, then we're faced with the same reality: We're likely to discard input (usually up until and including the next newline)! Yuuuuuuck!
Unfortunately, scanf simultaneously makes it both hard (non-intuitive) and easy (fewest keystrokes) to discard input in this way. Faced with this reality of discarding user input, some have tried scanf("%*[^\n]%*c");, not realising that the %*[^\n] delegate will fail when it encounters nothing but a newline, and hence the newline will still be left on the stream.
A slight adaptation, by separating the two format delegates, and we see some success here: scanf("%*[^\n]"); getchar();. Try doing that with so few keystrokes using some other tool ;)
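Wrapped in a function that works on any stream, the two-step discard might be sketched like this. The name discard_line and the early return at end-of-file are assumptions added for the example.

```c
#include <stdio.h>

/* Hypothetical helper: discard the remainder of the current line on `in`,
 * including its newline. */
static void discard_line(FILE *in) {
    /* Step 1: skip every character up to (not including) the newline.
     * On an already-empty line this directive fails to match, which is
     * why the newline is consumed by a separate call below. */
    if (fscanf(in, "%*[^\n]") == EOF)
        return;                 /* nothing left on the stream at all */

    /* Step 2: consume the newline itself (or reach end-of-file). */
    getc(in);
}
```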

