C - scanf() vs gets() vs fgets()

Question

Asked by Marko

I've been doing a fairly easy program of converting a string of Characters (assuming numbers are entered) to an Integer.

After I was done, I noticed some very peculiar "bugs" that I can't answer, mostly because of my limited knowledge of how the scanf(), gets() and fgets() functions work. (I did read a lot of literature though.)

So without writing too much text, here's the code of the program:

#include <stdio.h>

#define MAX 100

int CharToInt(const char *);

int main()
{
    char str[MAX];

    printf(" Enter some numbers (no spaces): ");
    gets(str);
//  fgets(str, sizeof(str), stdin);
//  scanf("%s", str);

    printf(" Entered number is: %d\n", CharToInt(str));

    return 0;
}

int CharToInt(const char *s)
{
    int i, result, temp;

    result = 0;
    i = 0;

    while(*(s+i) != '\0')
    {
        temp = *(s+i) & 15;
        result = (temp + result) * 10;
        i++;
    }

    return result / 10;
}

So here's the problem I've been having. First, when using the gets() function, the program works perfectly.

Second, when using fgets(), the result is slightly wrong because apparently the fgets() function reads the newline character (ASCII value 10) last, which screws up the result.

Third, when using the scanf() function, the result is completely wrong because the first character apparently has a -52 ASCII value. For this, I have no explanation.

Now I know that using gets() is discouraged, so I would like to know if I can use fgets() here so that it doesn't read (or ignores) the newline character. Also, what's the deal with the scanf() function in this program?

Answer 1

Answered by jamesdlin

Never use gets. It offers no protection against a buffer overflow vulnerability (that is, you cannot tell it how big the buffer you pass to it is, so it cannot prevent a user from entering a line larger than the buffer and clobbering memory).
Avoid using scanf. If not used carefully, it can have the same buffer overflow problems as gets. Even ignoring that, it has other problems that make it hard to use correctly.
Generally you should use fgets instead, although it's sometimes inconvenient (you have to strip the newline, you must determine a buffer size ahead of time, and then you must figure out what to do with lines that are too long: do you keep the part you read and discard the excess, discard the whole thing, dynamically grow the buffer and try again, etc.). There are some non-standard functions available that do this dynamic allocation for you (e.g. getline on POSIX systems, Chuck Falconer's public domain ggets function). Note that ggets has gets-like semantics in that it strips a trailing newline for you.
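The fgets chores listed above (strip the newline, decide what to do with an overlong line) can be sketched roughly as follows. This is only one possible policy: read_line is a hypothetical helper name, and it assumes you want to keep the part that fit and discard the excess.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into buf (capacity size).
 * Returns  1 if a complete line was stored (newline stripped),
 *          0 if the line was too long (buf keeps the first size-1
 *            characters and the rest of the line is discarded),
 *         -1 on end-of-file or read error with nothing read. */
int read_line(char *buf, size_t size, FILE *in)
{
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return -1;

    char *newline = strchr(buf, '\n');
    if (newline != NULL) {
        *newline = '\0';              /* strip the newline fgets kept */
        return 1;
    }

    /* No newline stored: the line was too long, or the input ended
     * without one. Consume whatever is left of this line. */
    int c, discarded = 0;
    while ((c = getc(in)) != '\n' && c != EOF)
        discarded = 1;

    return discarded ? 0 : 1;
}
```

With the question's buffer, a call would look like read_line(str, sizeof(str), stdin), followed by a check of the return value before converting.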

Answer 2

Answered by Jerry Coffin

Yes, you want to avoid gets. fgets will always read the new-line if the buffer was big enough to hold it (which lets you know when the buffer was too small and there's more of the line waiting to be read). If you want something like fgets that won't read the new-line (losing that indication of a too-small buffer), you can use fscanf with a scan-set conversion like "%N[^\n]", where the 'N' is replaced by the buffer size - 1.
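A small sketch of the scan-set approach, assuming a 100-byte buffer so that N becomes 99. The scan_line name and the choice to consume the rest of the line afterwards are my own assumptions, not part of the answer. Note that the conversion fails (returns 0) on an empty line, so the buffer is cleared first.

```c
#include <stdio.h>

/* Hypothetical helper: stores up to 99 characters of the next line in
 * buf (which must hold at least 100 bytes), without the newline.
 * Returns 1 if text was stored, 0 for an empty line, EOF at end of input. */
int scan_line(char *buf, FILE *in)
{
    buf[0] = '\0';                            /* stays empty on a blank line */
    int matched = fscanf(in, "%99[^\n]", buf); /* 99 == buffer size - 1 */
    if (matched == EOF)
        return EOF;

    /* Consume the newline, plus any excess if the line was too long. */
    int c;
    while ((c = getc(in)) != '\n' && c != EOF)
        ;

    return matched;
}
```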

One easy (if strange) way to remove the trailing new-line from a buffer after reading with fgets is: strtok(buffer, "\n"); This isn't how strtok is intended to be used, but I've used it this way more often than in the intended fashion (which I generally avoid).
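One caveat worth knowing with the strtok trick: if the buffer holds only "\n" (the user just pressed Enter), strtok finds no token and leaves the newline in place. A commonly used alternative, sketched here with a hypothetical strip_newline wrapper, is strcspn, which also handles that case.

```c
#include <string.h>

/* Hypothetical helper: removes a trailing newline in place.
 * strcspn returns the index of the first '\n', or the string length
 * if there is none, so the assignment is always within the string. */
void strip_newline(char *buffer)
{
    buffer[strcspn(buffer, "\n")] = '\0';
}
```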

Answer 3

Answered by Michaelangel007

There are numerous problems with this code. We'll fix the badly named variables and functions and investigate the problems:

First, CharToInt() should be renamed to the proper StringToInt() since it operates on a string, not a single character.
The function CharToInt() [sic.] is unsafe. It doesn't check if the user accidentally passes in a NULL pointer.
It doesn't validate input, or more correctly, skip invalid input. If the user enters a non-digit, the result will contain a bogus value. i.e. If you enter N, the code *(s+i) & 15 will produce 14!?
Next, the nondescript temp in CharToInt() [sic.] should be called digit since that is what it really is.
Also, the kludge return result / 10; is just that: a bad hack to work around a buggy implementation.
Likewise, MAX is badly named since it may appear to conflict with the standard usage. i.e. #define MAX(x,y) (((x)>(y))?(x):(y))
The verbose *(s+i) is not as readable as simply *s. There is no need to clutter up the code with yet another temporary index i.
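To make the masking point concrete: & 15 keeps only the low four bits, so in ASCII '7' (0x37) gives 7 but 'N' (0x4E) gives 14. A minimal sketch of a validating conversion, where digit_value is a hypothetical helper name:

```c
#include <ctype.h>

/* Hypothetical helper: returns 0..9 for a decimal digit character,
 * or -1 for anything else, instead of masking blindly with & 15. */
int digit_value(char ch)
{
    /* The cast avoids undefined behavior for negative char values. */
    if (!isdigit((unsigned char)ch))
        return -1;
    return ch - '0';
}
```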

gets()

This is bad because it can overflow the input string buffer. For example, if the buffer size is 2 and you enter 16 characters, you will overflow str.

scanf()

This is equally bad because it can overflow the input string buffer.

You mention "when using scanf() function, the result is completely wrong because first character apparently has a -52 ASCII value."

That is due to an incorrect usage of scanf(). I was not able to duplicate this bug.
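A minimal sketch of more careful scanf usage, assuming the 100-byte buffer from the question. The read_word name and the FILE * parameter are made up for illustration. The field width keeps %s from writing past the buffer, and the return value tells you whether anything was stored at all. As an aside, and only as a guess, -52 is 0xCC as a signed char, the pattern Microsoft's debug builds use to fill uninitialized stack memory, which would be consistent with str never having been written.

```c
#include <stdio.h>

#define MAX 100

/* Hypothetical helper: reads one whitespace-delimited word of at most
 * MAX-1 characters into str. Returns 1 on success, 0 if nothing was read. */
int read_word(char str[MAX], FILE *in)
{
    /* The width 99 is MAX - 1, leaving room for the terminating NUL. */
    if (fscanf(in, "%99s", str) != 1) {
        str[0] = '\0';    /* never leave the buffer uninitialized */
        return 0;
    }
    return 1;
}
```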

fgets()

This is safe because you can guarantee that you never overflow the input string buffer by passing in the buffer size (which includes room for the terminating NUL character).

getline()

A few people have suggested the POSIX standard getline() as a replacement. Unfortunately this is not a practical portable solution, as Microsoft does not implement a C version; only the standard C++ string template function, as this SO question #27755191 answers. Microsoft's C++ getline() was available at least as far back as Visual Studio 6, but since the OP is strictly asking about C and not C++, this isn't an option.

Misc.

Lastly, this implementation is buggy in that it doesn't detect integer overflow. If the user enters too large a number, the number may become negative! i.e. 9876543210 will become -18815698?! Let's fix that too.

This is trivial to fix for an unsigned int. If the current partial number is less than the previous partial number, then we have overflowed and we return the previous partial number.
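Comparing against the previous value catches many wraps but not every one, since a wrapped product can still land above the previous value (with a 32-bit unsigned int, 9999999999 is one such input). A sketch of a stricter variant that checks before multiplying is below. parse_unsigned and its overflowed flag are hypothetical, and returning UINT_MAX on overflow is a design choice of this sketch rather than the behavior described above.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical variant: parses the leading decimal digits of s.
 * On overflow, sets *overflowed to 1 and returns UINT_MAX. */
unsigned int parse_unsigned(const char *s, int *overflowed)
{
    unsigned int result = 0;
    *overflowed = 0;

    if (s == NULL)
        return 0;

    while (isdigit((unsigned char)*s)) {
        unsigned int digit = (unsigned int)(*s++ - '0');

        /* True exactly when result * 10 + digit would exceed UINT_MAX. */
        if (result > (UINT_MAX - digit) / 10) {
            *overflowed = 1;
            return UINT_MAX;
        }
        result = result * 10 + digit;
    }
    return result;
}
```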

For a signed int this is a little more work. In assembly we could inspect the carry flag, but in C there is no standard built-in way to detect overflow in signed int math. Fortunately, since we are multiplying by a constant, * 10, we can easily detect this if we use an equivalent equation:

n = x*10 = x*8 + x*2

If x*8 overflows then logically x*10 will as well. For a 32-bit int, overflow will happen when x*8 = 0x100000000, thus all we need to do is detect when x >= 0x20000000. Since we don't want to assume how many bits an int has, we only need to test if any of the top 3 msb's (Most Significant Bits) are set.

Additionally, a second overflow test is needed. If the msb (the sign bit) is set after the digit concatenation, then we also know the number overflowed.

Code

Here is a fixed safe version, along with code that you can play with to detect overflow in the unsafe versions. I've also included both signed and unsigned versions, selected via #define SIGNED 1.

#include <stdio.h>
#include <ctype.h> // isdigit()

// 1 fgets
// 2 gets
// 3 scanf
#define INPUT 1

#define SIGNED 1

// re-implementation of atoi()
// Test Case: 2147483647 -- valid    32-bit
// Test Case: 2147483648 -- overflow 32-bit
int StringToInt( const char * s )
{
    int result = 0, prev, msb = (sizeof(int)*8)-1, overflow;

    if( !s )
        return result;

    while( *s )
    {
        if( isdigit( (unsigned char)*s ) ) // Alt.: if ((*s >= '0') && (*s <= '9'))
        {
            prev     = result;
            overflow = result >> (msb-2); // test if top 3 MSBs will overflow on x*8
            result  *= 10;
            result  += *s++ & 0xF;// OPTIMIZATION: *s - '0'

            if( (result < prev) || overflow ) // check if would overflow
                return prev;
        }
        else
            break; // you decide SKIP or BREAK on invalid digits
    }

    return result;
}

// Test case: 4294967295 -- valid    32-bit
// Test case: 4294967296 -- overflow 32-bit
unsigned int StringToUnsignedInt( const char * s )
{
    unsigned int result = 0, prev;

    if( !s )
        return result;

    while( *s )
    {
        if( isdigit( (unsigned char)*s ) ) // Alt.: if (*s >= '0' && *s <= '9')
        {
            prev    = result;
            result *= 10;
            result += *s++ & 0xF; // OPTIMIZATION: += (*s - '0')

            if( result < prev ) // check if would overflow
                return prev;
        }
        else
            break; // you decide SKIP or BREAK on invalid digits
    }

    return result;
}

int main()
{
    int  detect_buffer_overrun = 0;

    #define   BUFFER_SIZE 2    // set to small size to easily test overflow
    char str[ BUFFER_SIZE+1 ] = ""; // C idiom is to reserve space for the NUL terminator

    printf(" Enter some numbers (no spaces): ");

#if   INPUT == 1
    fgets(str, sizeof(str), stdin);
#elif INPUT == 2
    gets(str); // can overflow
#elif INPUT == 3
    scanf("%s", str); // can also overflow
#endif

#if SIGNED
    printf(" Entered number is: %d\n", StringToInt(str));
#else
    printf(" Entered number is: %u\n", StringToUnsignedInt(str) );
#endif
    if( detect_buffer_overrun )
        printf( "Input buffer overflow!\n" );

    return 0;
}
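One caveat on the signed version: its tests run after the multiplication, and signed overflow is undefined behavior in C, so a compiler is not required to produce the wrapped value the tests look for. A sketch of an alternative that never overflows checks against INT_MAX before multiplying. parse_int_checked and its return convention are hypothetical, not part of the answer's code.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical variant of StringToInt(): parses the leading decimal
 * digits of s. Returns 1 and stores the value in *out on success,
 * or returns 0 (leaving *out untouched) if s is NULL or the digits
 * do not fit in an int. */
int parse_int_checked(const char *s, int *out)
{
    int result = 0;

    if (s == NULL)
        return 0;

    while (isdigit((unsigned char)*s)) {
        int digit = *s++ - '0';

        /* True exactly when result * 10 + digit would exceed INT_MAX,
         * tested without performing the overflowing arithmetic. */
        if (result > (INT_MAX - digit) / 10)
            return 0;

        result = result * 10 + digit;
    }

    *out = result;
    return 1;
}
```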


Answer 4

Answered by Matthew Flaschen

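You're correct that you should never use gets. If you want to use fgets, you can simply overwrite the newline.

char *result = fgets(str, sizeof(str), stdin);
size_t len = (result != NULL) ? strlen(str) : 0; // strlen() needs <string.h>
if(len > 0 && str[len - 1] == '\n')
{
  str[len - 1] = '\0';
}
else
{
  // handle error: the read failed, or the line did not fit in str
}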

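This does assume there are no embedded NULs. Another option is POSIX getline:

char *line = NULL;
size_t len = 0;
ssize_t count = getline(&line, &len, stdin);
if(count >= 1 && line[count - 1] == '\n')
{
  line[count - 1] = '\0';
}
else
{
  // Handle error
}
free(line); // getline allocated the buffer; release it when done (needs <stdlib.h>)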

The advantage of getline is that it does allocation and reallocation for you, it handles possible embedded NULs, and it returns the count so you don't have to waste time with strlen. Note that you can't use an array with getline. The pointer must be NULL or free-able.

I'm not sure what issue you're having with scanf.

Answer 5

Answered by Peter Miehle

Never use gets(); it can lead to unpredictable overflows. If your string array is of size 1000 and I enter 1001 characters, I can overflow your program's buffer.

Answer 6

Answered by Amardeep AC9MF

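Try using fgets() with this modified version of your CharToInt():

#include <ctype.h> // isdigit()

int CharToInt(const char *s)
{
    int i, result, temp;

    result = 0;
    i = 0;

    while(*(s+i) != '\0')
    {
        if (isdigit(*(s+i)))
        {
            temp = *(s+i) & 15;
            result = (temp + result) * 10;
        }
        i++;
    }

    return result / 10;
}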

It essentially validates the input digits and ignores anything else. This is very crude, so modify it and salt to taste.

Answer 7

Answered by nicolas gasser

So I am not much of a programmer, but let me try to answer your question about scanf(). I think scanf is pretty fine, and I use it for mostly everything without any issues. But the structure you used is not completely correct. It should be:

char str[MAX];
printf("Enter some text: ");
scanf("%99s", str); // width is MAX - 1 (MAX is 100 in the question); an array takes no "&"

The "&" in front of a variable is important when scanning into a scalar such as an int: it tells the program where (in which variable) to save the scanned value. An array such as str is passed without it, because its name already converts to a pointer to its first element. The field width is what keeps scanf from overflowing the buffer. fflush(stdin); is sometimes used to clear leftover input from the standard input (keyboard), but its behavior is undefined in standard C and it does nothing to prevent a buffer overflow.

And the difference between scanf and gets/fgets is that scanf("%s") only scans until the first whitespace character, while gets() and fgets() read the whole line (fgets() also keeps the newline). Be sure to consume any leftover input afterwards so it doesn't get in the way of later reads.
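A portable way to drop leftover input, sketched under the assumption that you want to skip to the end of the current line. discard_line is a hypothetical helper, not a standard function. It reads and throws away characters up to and including the newline, which is well defined on any input stream, unlike fflush(stdin).

```c
#include <stdio.h>

/* Hypothetical helper: discards the rest of the current input line,
 * including its newline. Returns the number of characters discarded. */
int discard_line(FILE *in)
{
    int c, count = 0;

    while ((c = getc(in)) != EOF) {
        count++;
        if (c == '\n')
            break;
    }
    return count;
}
```

A typical call would be discard_line(stdin) right after a scanf that may have left part of the line unread.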
