Original question: http://stackoverflow.com/questions/16557997/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Using fscanf() to read a file with lines of 3 numbers each, why does "%d%d%d%*c" work as well as "%d%d%d"?
Asked by Rüppell's Vulture
I know that the %d format specifier, when used here in fscanf(), reads an integer and ignores the white space preceding it, including the newline (I verified it). But in the following program, which uses fscanf() to read from a file of multiple lines with 3 integers each, the format string "%d%d%d%*c" works as well as "%d%d%d".
Why is that? Since fscanf() used with %d as the first specifier in the format string ignores any white space preceding an integer, why doesn't the extra %*c used as the last specifier cause any error or side effect? Had the %d specifier not been ignoring the newline after each group of 3 numbers on a line, then %*c would have made sense, as it would eat the newline. But why does it work without error or side effect even though fscanf() ignores white space for %d by default? Shouldn't fscanf() stop scanning when %*c can't find a character to eat and there is a mismatch between the specifier and the input? Isn't fscanf() supposed to stop when there is a mismatch, just as scanf() does?
EDIT: It even works if I use "%*c%d%d%d"! Shouldn't the scanning and processing of subsequent characters stop once there is a mismatch between the format specifier and the input at the beginning?
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int n1, n2, n3;
    FILE *fp;

    /* Backslashes must be doubled inside a C string literal. */
    fp = fopen("D:\\data.txt", "r");
    if (fp == NULL)
    {
        printf("Error");
        exit(-1);
    }
    while (fscanf(fp, "%d%d%d%*c", &n1, &n2, &n3) != EOF) // Works as well as the line below
    // while (fscanf(fp, "%d%d%d", &n1, &n2, &n3) != EOF)
        printf("%d,%d,%d\n", n1, n2, n3);
    fclose(fp);
    return 0;
}
Here's the format of the data in my file data.txt:
243 343 434
393 322 439
984 143 943
438 243 938
Output:
243 343 434
393 322 439
984 143 943
438 243 938
Answered by Jonathan Leffler
Consider this variation of the program in the question:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Default path from the question; backslashes are doubled in C strings. */
    const char *file = "D:\\data.txt";
    FILE *fp;
    const char *formats[] =
    {
        "%d%d%d%*c",
        "%d%d%d",
        "%*c%d%d%d",
    };

    /* A file name given on the command line overrides the default. */
    if (argc > 1)
        file = argv[1];

    for (int i = 0; i < 3; i++)
    {
        if ((fp = fopen(file, "r")) == NULL)
        {
            fprintf(stderr, "Failed to open file %s\n", file);
            break;
        }
        printf("Format: %s\n", formats[i]);
        int n1, n2, n3;
        /* Loop only while all three integers were converted. */
        while (fscanf(fp, formats[i], &n1, &n2, &n3) == 3)
            printf("%d, %d, %d\n", n1, n2, n3);
        fclose(fp);
    }
    return 0;
}
The repeated opens are not efficient, but that isn't a concern here. Clarity and showing the behaviour is much more important.
It is written to (a) use a file name specified on the command line, so I don't have to futz with names such as D:\data.txt, which are very inconvenient to create on Unix systems, and (b) show the three formats in use.
Given the data file from the question:
243 343 434
393 322 439
984 143 943
438 243 938
The output of the program is:
Format: %d%d%d%*c
243, 343, 434
393, 322, 439
984, 143, 943
438, 243, 938
Format: %d%d%d
243, 343, 434
393, 322, 439
984, 143, 943
438, 243, 938
Format: %*c%d%d%d
43, 343, 434
393, 322, 439
984, 143, 943
438, 243, 938
Note that the first digit of the first number is consumed by the %*c when that is the first part of the format. After the first 3 numbers are read, the %*c reads the newline after the third number on the line, then the %d skips further white space (except there isn't any) and reads the number.
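The effect of a leading %*c can be reproduced on in-memory strings with sscanf(), which follows the same conversion rules as fscanf(). This is only a minimal sketch; the helper names first_with_leading_c() and first_plain() are made up for the illustration and are not part of the original answer.

```c
#include <stdio.h>

/* Hypothetical helper: parse three integers with a leading %*c and
 * return the first one, or -1 if fewer than three were converted. */
int first_with_leading_c(const char *text)
{
    int n1, n2, n3;
    if (sscanf(text, "%*c%d%d%d", &n1, &n2, &n3) != 3)
        return -1;
    return n1;
}

/* Hypothetical helper: the same parse without the leading %*c. */
int first_plain(const char *text)
{
    int n1, n2, n3;
    if (sscanf(text, "%d%d%d", &n1, &n2, &n3) != 3)
        return -1;
    return n1;
}
```

Calling first_with_leading_c("243 343 434\n") gives 43 because the '2' is eaten by %*c, while first_plain() on the same text gives 243. On text that starts with a newline, such as "\n393 322 439\n", the %*c eats the newline instead and both helpers give 393.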
Otherwise, the behaviour is as expounded in the commentary below, largely lifted from another related question.
Some of the code under discussion in the related question "Use fscanf() to read from given line" was:
fscanf(f, "%*d %*d %*d%*c");
fscanf(f, "%d%d%d", &num1, &num2, &num3);
I noted that the code should test the return value from fscanf(). However, with the three %*d conversion specifications, you might get a return value of EOF if you encountered EOF before reaching the specified line. Unfortunately, you have no way of knowing that the first line contained a letter instead of a digit until you execute the second fscanf(). You should test the second fscanf() too; you might get EOF, or 0 or 1 or 2 (all of which indicate problems), or you might get 3, indicating success with 3 conversions. Note that adding \n to the format means blank lines will be skipped, but that was going to happen anyway; %d skips white space to the first digit.
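The possible return values can be seen by running the same format over a few strings with sscanf(). This is a sketch only; count_converted() is a hypothetical name used for the illustration.

```c
#include <stdio.h>

/* Hypothetical helper: report what sscanf() returns for a line that
 * is expected to hold three integers. */
int count_converted(const char *line)
{
    int a, b, c;
    return sscanf(line, "%d%d%d", &a, &b, &c);
}
```

With this helper, "243 343 434" yields 3, "12 34" yields 2, "243 x 434" yields 1, "x 1 2" yields 0, and an empty or all-blank string yields EOF, because the input runs out before the first conversion completes.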
Is there any other way we can read but ignore entire lines, like I clumsily did with fscanf(f,"%*d%*d%*d")? Is using %*[^\n] the nearest thing one can do for this?
The best way to skip whole lines is to use fgets(), as in the last version of the code in my answer. Obviously, there's an outside chance it will miscount lines if any of those lines is longer than 4095 bytes. On the other hand, that's fairly improbable.
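A line-skipping helper along these lines might look as follows. It is a sketch, not the code from the linked answer: skip_lines() is a hypothetical name, and it is a variation that counts a line only once its newline has been seen, so that an over-long line is not counted more than once.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: skip up to n newline-terminated lines from fp
 * and return how many were actually skipped. A final line with no
 * terminating newline is consumed but not counted. */
int skip_lines(FILE *fp, int n)
{
    char buffer[4096];
    int skipped = 0;

    while (skipped < n && fgets(buffer, sizeof buffer, fp) != NULL)
    {
        /* fgets() keeps the newline, so its presence marks a line end. */
        if (strchr(buffer, '\n') != NULL)
            skipped++;
    }
    return skipped;
}
```

After skip_lines(fp, 2) on the data file from the question, the next fscanf(fp, "%d%d%d", ...) would read the third line.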
I have a confusion now and I don't want to put it in a question. So can you tell me this: fscanf() ignores white space automatically, so after the first line, when three integers are read and ignored according to my %*d%*d%*d specifier, I expect fscanf() to ignore the newline too when it starts reading in the next run of the loop. But why doesn't my additional %*c or \n cause problems, and why does the program run fine when I use %*d%*d%*d%*c or %*d%*d%*d\n in my code?
You can't tell where anything went wrong with those formats; you can detect EOF, but otherwise fscanf() will return 0. However, since %*d skips leading white space (including newlines), it doesn't much matter whether you read the newline after the third number with %*c or not. When you have \n there, that's white space, so the read skips the newline and any trailing or leading white space, stopping when it reaches a non-white-space character. Of course, you could also have newlines in the middle of the three numbers, or you could have more than three numbers on a line.
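One way to see the difference between a trailing \n and a trailing %*c is to ask sscanf() how many characters it consumed, using %n. This is a sketch under that assumption; both helper names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: characters consumed when the format ends in "\n",
 * or -1 if three integers were not converted. */
int consumed_with_newline_directive(const char *text)
{
    int a, b, c, used = -1;
    if (sscanf(text, "%d%d%d\n%n", &a, &b, &c, &used) != 3)
        return -1;
    return used;
}

/* Hypothetical helper: characters consumed when the format ends in "%*c". */
int consumed_with_star_c(const char *text)
{
    int a, b, c, used = -1;
    if (sscanf(text, "%d%d%d%*c%n", &a, &b, &c, &used) != 3)
        return -1;
    return used;
}
```

For the text "1 2 3\n\n  4", the \n version consumes 9 characters (everything up to the '4'), while the %*c version consumes 6 (the three numbers plus exactly one newline). Either way, the next %d would skip the remaining white space and read 4.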
Note that the trailing \n in the format is particularly weird when the user is typing at the terminal. The user hits return, and keeps on hitting return, but the program doesn't continue until the user types a non-blank character. This is why fscanf() is so difficult to use when the data is not reliable. When it's reliable, it's easy, but if anything goes wrong, diagnostics and recovery are painful. That's why it is better to use fgets() and sscanf(); you have control over what is being parsed, you can try again with a different format if you want to, and you can report the whole line, not just what fscanf() has not managed to interpret.
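The fgets() plus sscanf() approach could be sketched like this. The function name count_valid_lines() and the "Bad line" message are illustrative choices, not taken from the original answer.

```c
#include <stdio.h>

/* Hypothetical helper: read fp line by line, parse three integers per
 * line, print the good triples, report bad lines on stderr, and return
 * the number of lines that parsed cleanly. */
int count_valid_lines(FILE *fp)
{
    char line[4096];
    int valid = 0;

    while (fgets(line, sizeof line, fp) != NULL)
    {
        int n1, n2, n3;
        if (sscanf(line, "%d%d%d", &n1, &n2, &n3) == 3)
        {
            printf("%d, %d, %d\n", n1, n2, n3);
            valid++;
        }
        else
        {
            /* The whole offending line is still available for the report. */
            fprintf(stderr, "Bad line: %s", line);
        }
    }
    return valid;
}
```

Because each line is parsed on its own, one malformed line does not derail the lines that follow it, which is much harder to arrange with fscanf() alone.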
Note that %c (and %*c) does not skip over white space; therefore, a %*c at the end of the format reads (and discards) the character after the number that was read. If that is the newline, then that's the character read and ignored. The scan set %[...] is the other conversion specification that does not skip white space; all other standard conversion specifications skip leading white space.
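A small sketch of that difference, using sscanf() on strings: a bare %c takes whatever character comes next, while a space before %c in the format skips white space first. The helper names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: return the character read by a bare %c
 * immediately after an integer, or '\0' on failure. */
char char_after_number_raw(const char *text)
{
    int n;
    char ch = '\0';
    if (sscanf(text, "%d%c", &n, &ch) != 2)
        return '\0';
    return ch;
}

/* Hypothetical helper: the same, but with a space in the format so
 * that white space is skipped before %c reads. */
char char_after_number_skipping(const char *text)
{
    int n;
    char ch = '\0';
    if (sscanf(text, "%d %c", &n, &ch) != 2)
        return '\0';
    return ch;
}
```

For the text "42\nx", the raw version returns the newline and the skipping version returns 'x'.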
Answered by Dayal rai
On success, fscanf() returns the number of items of the argument list successfully filled. This count can match the expected number of items or be less (even zero) due to a matching failure, a reading error, or reaching the end of the file.
The paragraph above never talks about stopping on a mismatch. fscanf() will try the extra specifier too, and since there is no input left for it, it returns only the number of items successfully scanned. If there are too many arguments for the format specifications, the extra arguments are ignored. The results are undefined if there are not enough arguments for the format specifications.
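For the format in the question, this can be checked with sscanf(). The sketch below uses a hypothetical helper name; note that a suppressed conversion such as %*c is never included in the returned count.

```c
#include <stdio.h>

/* Hypothetical helper: report what sscanf() returns for the question's
 * format, which ends with a suppressed %*c conversion. */
int count_with_trailing_c(const char *text)
{
    int a, b, c;
    return sscanf(text, "%d%d%d%*c", &a, &b, &c);
}
```

Both "1 2 3\n" and "1 2 3" give 3: in the first case %*c eats the newline, and in the second it finds no character, but three items have already been assigned, so 3 is still returned. Only when the input runs out before the first conversion, as with an empty string, is EOF returned, which is what ends the loop in the question.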

