Python: word count from a txt file program

Note: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/21107505/

Time: 2020-08-18 22:05:45  Source: igfitidea

Word count from a txt file program

python

Asked by user3068762

I am counting the words in a txt file with the following code:

#!/usr/bin/python
file=open("D:\zzzz\names2.txt","r+")
wordcount={}
for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
print (word,wordcount)
file.close();

This gives me output like this:

>>> 
goat {'goat': 2, 'cow': 1, 'Dog': 1, 'lion': 1, 'snake': 1, 'horse': 1, '???': 1, 'tiger': 1, 'cat': 2, 'dog': 1}

But I want the output in the following format:

word  wordcount
goat    2
cow     1
dog     1.....

Also, I am getting an extra symbol in the output (???). How can I remove it?

Answered by Tim Pietzcker

The funny symbols you're encountering are a UTF-8 BOM (Byte Order Mark). To get rid of them, open the file using the correct encoding (I'm assuming you're on Python 3):

file = open(r"D:\zzzz\names2.txt", "r", encoding="utf-8-sig")
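
To see what the encoding choice changes, here is a minimal sketch that decodes the same bytes both ways. The sample bytes and the variable names are made up for illustration; only `codecs.BOM_UTF8` and the two codec names come from the standard library.

```python
import codecs

# Hypothetical file content: a UTF-8 BOM followed by two words.
raw = codecs.BOM_UTF8 + "goat cow".encode("utf-8")

# Plain "utf-8" keeps the BOM as the character U+FEFF at the start.
plain = raw.decode("utf-8")

# "utf-8-sig" drops a leading BOM if one is present.
stripped = raw.decode("utf-8-sig")

print(repr(plain.split()[0]))     # first word still carries the BOM
print(repr(stripped.split()[0]))  # first word is clean
```

With the plain codec the BOM sticks to the first word (or shows up as stray symbols if the file is decoded with a different default encoding), which would explain the extra entry in the dictionary.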

Furthermore, for counting, you can use collections.Counter:

from collections import Counter
wordcount = Counter(file.read().split())

Displaying them is easy as well:

>>> for item in wordcount.items(): print("{}\t{}".format(*item))
...
snake   1
lion    2
goat    2
horse   3
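
Putting the pieces of this answer together, a complete script might look like the sketch below. The helper names `count_words` and `format_counts`, the column width, and the throwaway demo file are assumptions made for illustration; they are not part of the original answer.

```python
import os
import tempfile
from collections import Counter


def count_words(path):
    """Return a Counter mapping each whitespace-separated word to its count."""
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if present.
    with open(path, "r", encoding="utf-8-sig") as handle:
        return Counter(handle.read().split())


def format_counts(counts):
    """Build a two-column table, most frequent words first."""
    lines = ["{:<10}{}".format("word", "wordcount")]
    for word, count in counts.most_common():
        lines.append("{:<10}{}".format(word, count))
    return "\n".join(lines)


# Demo on a throwaway file that starts with a BOM, as in the question.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8-sig", suffix=".txt", delete=False
) as tmp:
    tmp.write("goat cow goat dog")
    demo_path = tmp.name

demo_counts = count_words(demo_path)
os.remove(demo_path)
print(format_counts(demo_counts))
```

To use it on the file from the question, call `count_words(r"D:\zzzz\names2.txt")` instead of the demo path. Using `with` also closes the file automatically.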

Answered by duck

import sys

# Pass the path of the text file as the first command-line argument.
wordcount = {}
with open(sys.argv[1], "r") as file:
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1

# Print one "word count" pair per line.
for key in wordcount:
    print("%s %s" % (key, wordcount[key]))

Answered by bistaumanga

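#!/usr/bin/python
# Raw string, so that "\n" in the path is not read as a newline.
wordcount = {}
with open(r"D:\zzzz\names2.txt", "r") as file:
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1

# Print each word with its count (Python 3 print function).
for k, v in wordcount.items():
    print(k, v)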

Answered by Wesam Na

If you are using GraphLab, you can use this function. It is really powerful:

products['word_count'] = graphlab.text_analytics.count_words(your_text)

Answered by Fuji Komalan

FILE_NAME = 'file.txt'

wordCounter = {}

with open(FILE_NAME,'r') as fh:
  for line in fh:
    # Remove commas, apostrophes and periods, then lowercase the line.
    # split() then breaks the line into a list of words.
    word_list = line.replace(',','').replace('\'','').replace('.','').lower().split()
    for word in word_list:
      # Adding  the word into the wordCounter dictionary.
      if word not in wordCounter:
        wordCounter[word] = 1
      else:
        # if the word is already in the dictionary update its count.
        wordCounter[word] = wordCounter[word] + 1

print('{:15}{:3}'.format('Word','Count'))
print('-' * 18)

# Print each word and its number of occurrences.
for word, occurrence in wordCounter.items():
  print('{:15}{:3}'.format(word, occurrence))
Output:
    Word           Count
    ------------------
    of               6
    examples         2
    used             2
    development      2
    modified         2
    open-source      2
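
The chained `replace` calls above only cover three punctuation characters. As a rough sketch of a more general variant, `str.translate` can strip all ASCII punctuation in one pass. Keeping the hyphen (so that "open-source" stays one word) is an assumption made here, and the name `normalized_counts` and the sample lines are made up for illustration.

```python
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation mark except
# the hyphen. Keeping the hyphen is an assumption, not a rule.
_STRIP = str.maketrans("", "", string.punctuation.replace("-", ""))


def normalized_counts(lines):
    """Count lowercased words after stripping punctuation from each line."""
    counts = Counter()
    for line in lines:
        counts.update(line.translate(_STRIP).lower().split())
    return counts


# Made-up sample text; an open file handle works in place of the list.
sample = [
    "Open-source examples, used of development.",
    "Examples of 'modified' open-source!",
]
result = normalized_counts(sample)
for word, count in result.most_common(3):
    print("{:15}{:3}".format(word, count))
```

Because the function only iterates over lines, it can be called as `normalized_counts(fh)` inside a `with open(...) as fh:` block.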

Answered by Sivaji

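#!/usr/bin/python
# Raw string, so that "\n" in the path is not read as a newline.
wordcount = {}
with open(r"D:\zzzz\names2.txt", "r") as file:
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1

# Print each word with its count (Python 3 print function).
for k, v in wordcount.items():
    print(k, v)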

Answered by Sanj

You can do this (it prints the number of distinct words in the file):

with open(r'D:\zzzz\names2.txt') as file:
    file_split = set(file.read().split())
print(len(file_split))