Python word count from a txt file program

Question

Asked by user3068762

I am counting the words in a txt file with the following code:

#!/usr/bin/python
file=open(r"D:\zzzz\names2.txt","r+")
wordcount={}
for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
print (word,wordcount)
file.close();

This gives me output like this:

>>> 
goat {'goat': 2, 'cow': 1, 'Dog': 1, 'lion': 1, 'snake': 1, 'horse': 1, '???': 1, 'tiger': 1, 'cat': 2, 'dog': 1}

But I want the output in the following format:

word  wordcount
goat    2
cow     1
dog     1.....

I am also getting an extra symbol in the output (???). How can I remove it?

Answer 1

Answered by Tim Pietzcker

The funny symbols you're encountering are a UTF-8 BOM (Byte Order Mark). To get rid of them, open the file using the correct encoding (I'm assuming you're on Python 3):

file = open(r"D:\zzzz\names2.txt", "r", encoding="utf-8-sig")
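
To see what the encoding changes, here is a minimal sketch. The temporary file and its contents are made up for illustration; it writes a small file with a BOM and reads it back both ways:

```python
import os
import tempfile

# Hypothetical sample data: a small file written with a UTF-8 BOM.
path = os.path.join(tempfile.mkdtemp(), "names_sample.txt")
with open(path, "w", encoding="utf-8-sig") as f:
    f.write("goat cow goat")

# Plain "utf-8" keeps the BOM (U+FEFF) attached to the first word.
with open(path, "r", encoding="utf-8") as f:
    words_with_bom = f.read().split()

# "utf-8-sig" strips a leading BOM while decoding.
with open(path, "r", encoding="utf-8-sig") as f:
    words_clean = f.read().split()

print(words_with_bom[0] == "\ufeffgoat")  # True
print(words_clean[0] == "goat")           # True
```

If the file has no BOM, "utf-8-sig" reads it the same way as "utf-8", so it should be safe to use in both cases.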

Furthermore, for counting, you can use collections.Counter:

from collections import Counter
wordcount = Counter(file.read().split())

Displaying them is easy as well:

>>> for item in wordcount.items(): print("{}\t{}".format(*item))
...
snake   1
lion    2
goat    2
horse   3
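
Putting the pieces together, a self-contained sketch might look like the following. The helper names `count_words` and `format_counts` and the sample file are assumptions made for illustration, not part of the original code; `Counter.most_common()` is used here so the most frequent words come first:

```python
import os
import tempfile
from collections import Counter

def count_words(path):
    """Return a Counter of the whitespace-separated words in the file."""
    # utf-8-sig drops a leading BOM if one is present.
    with open(path, "r", encoding="utf-8-sig") as f:
        return Counter(f.read().split())

def format_counts(counts):
    """Return one 'word<TAB>count' string per word, most frequent first."""
    return ["{}\t{}".format(word, n) for word, n in counts.most_common()]

# Hypothetical sample file, written with a BOM to mimic the question.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w", encoding="utf-8-sig") as f:
    f.write("goat cow goat cat\ncat goat\n")

lines = format_counts(count_words(sample_path))
print("word\twordcount")
print("\n".join(lines))
```

Using `with` also closes the file automatically, so no explicit `file.close()` is needed.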

Answer 2

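Answered by duck

import sys

# Count how often each whitespace-separated word occurs in the file
# whose path is given as the first command-line argument.
wordcount = {}
with open(sys.argv[1], "r") as file:
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1

# Print one "word count" pair per line.
for key in wordcount:
    print("%s %s" % (key, wordcount[key]))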

Answer 3

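Answered by bistaumanga

#!/usr/bin/python
# Count how often each whitespace-separated word occurs in the file.
wordcount = {}
with open(r"D:\zzzz\names2.txt", "r") as file:
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
# Print one "word count" pair per line (works on Python 2 and 3).
for k, v in wordcount.items():
    print("%s %s" % (k, v))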

Answer 4

Answered by Wesam Na

If you are using GraphLab, you can use this function. It is really powerful:

products['word_count'] = graphlab.text_analytics.count_words(your_text)

Answer 5

Answered by Fuji Komalan

FILE_NAME = 'file.txt'

wordCounter = {}

with open(FILE_NAME, 'r') as fh:
  for line in fh:
    # Remove commas, apostrophes, and periods, then lowercase the line.
    # split() breaks the line into a list of words.
    word_list = line.replace(',', '').replace('\'', '').replace('.', '').lower().split()
    for word in word_list:
      # Add the word to the wordCounter dictionary.
      if word not in wordCounter:
        wordCounter[word] = 1
      else:
        # If the word is already in the dictionary, update its count.
        wordCounter[word] = wordCounter[word] + 1

print('{:15}{:3}'.format('Word','Count'))
print('-' * 18)

# Print each word and its number of occurrences.
for (word, occurrence) in wordCounter.items():
  print('{:15}{:3}'.format(word, occurrence))

Output:

    Word           Count
    ------------------
    of               6
    examples         2
    used             2
    development      2
    modified         2
    open-source      2
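
The chained `replace` calls above handle only three punctuation characters. As a hedged alternative (a swapped-in technique, not the answer's own method), the sketch below deletes every ASCII punctuation character with `str.translate`. The names `normalize_words` and `count_words_in_lines` and the sample lines are hypothetical. Note that `string.punctuation` includes the hyphen, so "open-source" would become "opensource":

```python
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def normalize_words(line):
    """Lowercase the line, delete ASCII punctuation, split on whitespace."""
    return line.lower().translate(STRIP_PUNCT).split()

def count_words_in_lines(lines):
    """Count normalized words across an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(normalize_words(line))
    return counts

# Hypothetical sample input.
sample = ["Dog, dog. It's a DOG!", "open-source dog"]
counts = count_words_in_lines(sample)
print(counts["dog"])  # 4
```

A file object can be passed directly as `lines`, since iterating over it yields one line at a time.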

Answer 6

Answered by Sivaji

#!/usr/bin/python
file = open(r"D:\zzzz\names2.txt", "r")
wordcount = {}
for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1

# Print one "word count" pair per line (works on Python 2 and 3).
for k, v in wordcount.items():
    print("%s %s" % (k, v))
file.close()

Answer 7

Answered by Sanj

You can do this:

# Print the number of distinct words in the file.
with open(r'D:\zzzz\names2.txt') as file:
    file_split = set(file.read().split())
print(len(file_split))
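
Note that building a set counts each distinct word once, which is different from the total number of words. A small sketch with made-up sample text shows the difference:

```python
# Hypothetical sample text standing in for the file contents.
text = "goat cow goat cat cat dog"
words = text.split()

print(len(words))       # 6 words in total
print(len(set(words)))  # 4 distinct words
```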
