C language: C word count program

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/22969076/

Date: 2020-09-02 10:59:37  Source: igfitidea

C Word Count program

c word-count

Asked by user3516302

I am trying to write a program that counts the number of characters, words and lines in a text. The text is:

It was a dark and stormy night;
the rain fell in torrents - except
at occasional intervals, when it was
checked by a violent gust of wind
which swept up the streets (for it is
in London that our scene lies),
rattling along the housetops, and fiercely
agitating the scanty flame of the lamps
that struggled against the darkness.

  Edward Bulwer-Lytton's novel Paul Clifford.

I keep getting 62 instead of 64. Any suggestions?

#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>

int main() {
    int tot_chars = 0;     /* total characters */
    int tot_lines = 0;     /* total lines */
    int tot_words = 0;     /* total words */
    int boolean;
    /* EOF == end of file */
    int n;
    while ((n = getchar()) != EOF) {
        tot_chars++;
        if (isspace(n) && !isspace(getchar())) {
            tot_words++;
        }
        if (n == '\n') {
            tot_lines++;
        }
        if (n == '-') {
            tot_words--;
        }
    }
    printf("Lines, Words, Characters\n");
    printf(" %3d %3d %3d\n", tot_lines, tot_words, tot_chars);

    // Should be 11 64 375
    // rn     is 11 65 375
    return 0;
}

Answered by chqrlie

There are multiple problems in your code:

  • In the test if (isspace(n) && !isspace(getchar())), you potentially consume a byte from the file without incrementing tot_chars. Furthermore, you do not increment tot_words if two words are separated by two white space characters. This causes darkness. and Edward to be counted as a single word.
  • You decrement tot_words when you see a hyphen, which is incorrect because words are separated by white space only. This causes Bulwer-Lytton's to be counted as 1-1, i.e. zero. Hence you only get 62 words instead of 64.

  • On a lesser note, the name n is confusing for a byte read from the file; it is usually a more appropriate name for a count. The idiomatic name for a byte read from a file is c, and the type int is correct because it accommodates all values of unsigned char plus the special value EOF.
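
The first two points can be reproduced on a short in-memory string. The following is only a minimal sketch: struct buggy_counts and buggy_count() are hypothetical names introduced for illustration (they do not appear in the question), and reading s[i++] stands in for each call to getchar().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical result type, introduced only for this illustration. */
struct buggy_counts { int chars; int words; };

/* Replays the question's loop over a string; s[i++] stands in for getchar(). */
static struct buggy_counts buggy_count(const char *s)
{
    struct buggy_counts r = {0, 0};
    size_t i = 0;

    assert(s != NULL);
    while (s[i] != '\0') {
        int n = (unsigned char)s[i++];
        r.chars++;
        if (isspace(n)) {
            /* the second getchar(): this byte is consumed but never counted */
            int next = (s[i] != '\0') ? (unsigned char)s[i++] : EOF;
            if (!isspace(next))
                r.words++;
        }
        if (n == '-')
            r.words--;      /* a hyphen cancels a word that was counted */
    }
    return r;
}
```

For the input "a  b" (two spaces) this sketch reports 3 characters and 0 words, where 4 characters and 2 words are expected; for "x-y" the word count even goes negative.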

To detect word boundaries, you should use a state and update the word count when the state changes:

#include <ctype.h>
#include <stdio.h>

int main(void) {
    int tot_chars = 0;     /* total characters */
    int tot_lines = 0;     /* total lines */
    int tot_words = 0;     /* total words */
    int in_space = 1;
    int c, last = '\n';

    while ((c = getchar()) != EOF) {
        last = c;
        tot_chars++;
        if (isspace(c)) {
            in_space = 1;
            if (c == '\n') {
                tot_lines++;
            }
        } else {
            tot_words += in_space;
            in_space = 0;
        }
    }
    if (last != '\n') {
        /* count last line if not linefeed terminated */
        tot_lines++;
    }

    printf("Lines, Words, Characters\n");
    printf(" %3d %3d %3d\n", tot_lines, tot_words, tot_chars);

    return 0;
}
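
To check the state-machine logic without piping a file through stdin, the same loop can be wrapped in a function that scans a string. This is a sketch under stated assumptions: struct wc_counts and wc_string() are hypothetical names that are not part of the answer, and the input is assumed to be a NUL-terminated string.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type, introduced only for this illustration. */
struct wc_counts { int lines; int words; int chars; };

/* Same state machine as the answer above, applied to a NUL-terminated string. */
static struct wc_counts wc_string(const char *s)
{
    struct wc_counts r = {0, 0, 0};
    int in_space = 1;       /* 1 while between words, 0 while inside one */
    int last = '\n';        /* last character seen, '\n' for empty input */
    size_t i;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        int c = (unsigned char)s[i];
        last = c;
        r.chars++;
        if (isspace(c)) {
            in_space = 1;
            if (c == '\n')
                r.lines++;
        } else {
            r.words += in_space;    /* counts only on a space-to-word transition */
            in_space = 0;
        }
    }
    if (last != '\n')
        r.lines++;          /* count a final line that lacks a linefeed */
    return r;
}
```

Calling wc_string("darkness.\n\n  Edward Bulwer-Lytton's novel") yields 4 words and 3 lines: the run of white space before Edward triggers a single transition, and the hyphen does not split Bulwer-Lytton's.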

Answered by Mudassir Hussain

Actually, I now think you have to modify the program. Counting words on the assumption that they are separated by a single space (or any other white space character) will not work if your text uses two or more white space characters to separate words, because the extra white space is then also counted as words even though no actual word is present.

I think your last if block is really messy: you are using ispunct() to decrement tot_words, but the words in your text contain punctuation marks (without spaces), which means the punctuation is part of the words. So you should not decrement for them.

Previously I thought we should check only for the '-' character in the last if block, since it is used with spaces around it in the first paragraph of the text. But it is also used without any spaces in the line naming the novel, so I think you should drop the last if block completely and count '-' as a word to keep the logic simple.

I have modified the first if block so that your program counts correctly even when two or more spaces (or other white space characters) separate words.

/* boolean must be initialized to 0 (outside a word) before the loop starts */
/* isspace() already matches ' ', '\t', '\n', '\v', '\f' and '\r',
   so there is no need to write (isspace(n) || n == '\n') */
if (isspace(n)) {
    boolean = 0;        /* outside of a word */
} else if (boolean == 0) {
    tot_words++;
    boolean = 1;        /* inside of a word */
}

if (n == '\n')
    tot_lines++;

Answered by Vino

I checked your code and it works fine; I also got the expected output (total words). It seems the code has been edited since the original post.

Attached is the output I got after running the code: [image: output]

Answered by Michelle

Both of the following conditionals increment your word count on newline characters, which means that every word followed by a newline instead of a space is counted twice:

if (isspace(n) || n == '\n'){
     tot_words++;
}
if (n=='\n'){
     tot_lines++;
     tot_words++;
}

If you get rid of the || n == '\n' bit, you should get the correct count.

Answered by user207064

Change

    if (n == '\n') {
        tot_lines++;
        tot_words++;
    }

to

    if (n == '\n') {
        tot_lines++;
    }

You are already counting the word at each newline in

    if (isspace(n) || n == '\n') {
        tot_words++;
    }

So effectively you are incrementing the word counter one more time than required for each line.
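
The double counting is easy to see on a tiny input. The following is a minimal sketch, assuming a hypothetical helper count_ws_words() (not from the question) that replays the two conditionals over a string; its second parameter selects whether the n == '\n' branch also increments the word count.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: replays the two conditionals over a string.
   A nonzero count_in_newline_branch keeps the extra tot_words++ in the
   n == '\n' branch; 0 removes it. */
static int count_ws_words(const char *s, int count_in_newline_branch)
{
    int tot_words = 0;
    size_t i;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        int n = (unsigned char)s[i];
        if (isspace(n) || n == '\n')    /* isspace() is already true for '\n' */
            tot_words++;
        if (n == '\n' && count_in_newline_branch)
            tot_words++;                /* the extra increment per line */
    }
    return tot_words;
}
```

For "one two\nthree\n", the variant with the extra increment returns 5 and the variant without it returns 3. Note that this scheme still counts one word per white space character, so runs of consecutive white space would over-count.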

Answered by debug

$ ./a.out " a b " "a b c " "a b c d"
s =  a b , words_cnt= 2
 s = a b c , words_cnt= 3
 s = a b c d, words_cnt= 4

$ ./a.out "It was a dark and stormy night;
> the rain fell in torrents - except
......
  Edward Bulwer-Lytton's novel Paul Clifford., words_cnt = 64


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>


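/* Count the white-space-separated words in the NUL-terminated string s. */
int
count_words(const char *s)
{
    size_t i, len = strlen(s);
    int w = 0;

    for (i = 0; i < len; i++)
    {
        /* cast to unsigned char: isspace() is undefined for negative values */
        if (!isspace((unsigned char)s[i]))
        {
            w++;    /* first character of a new word */
            /* skip the rest of the word; the terminating NUL ends the scan */
            while (s[i] != '\0' && !isspace((unsigned char)s[i]))
            {
                i++;
            }
        }
    }

    return w;
}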

int
main(int argc, const char *argv[])
{
    int i;

    if (argc < 2)
    {
        printf("[*] Usage: %s <str1> <str2> ...\n", argv[0]);
        return -1;
    }

    for (i = 1; i < argc; i++)
    {
        printf("s = %s, words_cnt= %d\n ", argv[i], count_words(argv[i]));
    }

    return 0;
}