Python: count letters in a text file
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/18647707/
Count letters in a text file
Asked by user2752551
I am a beginner Python programmer and I am trying to write a program that counts the number of occurrences of each letter in a text file. Here is what I've got so far:
import string

text = open('text.txt')
letters = string.ascii_lowercase
for i in text:
    text_lower = i.lower()
    text_nospace = text_lower.replace(" ", "")
    text_nopunctuation = text_nospace.strip(string.punctuation)
    for a in letters:
        if a in text_nopunctuation:
            num = text_nopunctuation.count(a)
            print(a, num)
If the text file contains hello bob, I want the output to be:
b 2
e 1
h 1
l 2
o 2
My problem is that it doesn't work properly when the text file contains more than one line of text or has punctuation.
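A short sketch of why the attempt above misbehaves, using a made-up sample line rather than the asker's file. str.strip only removes characters from the two ends of a string, so punctuation in the middle survives (and a trailing newline shields punctuation at the end). Separately, because the counting runs inside the per-line loop, every line gets its own tally instead of one total for the file.

```python
import string

line = "hello, bob!\n"  # hypothetical sample line, not from the original post

# strip() only trims the ends; here the trailing newline even stops it
# from removing the "!", so the line comes back unchanged.
stripped = line.strip(string.punctuation)
print(repr(stripped))

# Filtering character by character drops everything that is not a letter.
letters_only = "".join(ch for ch in line.lower() if ch in string.ascii_lowercase)
print(letters_only)
```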
Answered by moliware
You can use collections.Counter:
from collections import Counter

text = 'aaaaabbbbbccccc'
c = Counter(text)
print(c)
It prints:
Counter({'a': 5, 'b': 5, 'c': 5})
Your text variable should be:
import string

with open('text.txt') as f:
    text = f.read()
# Filter out all characters that are not ASCII letters.
text = ''.join(ch for ch in text.lower() if ch in string.ascii_lowercase)
To get the output you need:
for letter, repetitions in c.items():
    print(letter, repetitions)
In my example it prints:
a 5
b 5
c 5
For more information, see the Counter documentation.
Answered by elyase
This is a very readable way to accomplish what you want using Counter:
from string import ascii_lowercase
from collections import Counter

with open('text.txt') as f:
    print(Counter(letter for line in f
                  for letter in line.lower()
                  if letter in ascii_lowercase))
You can iterate the resulting dict to print it in the format that you want.
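As a rough illustration of iterating the result, the sketch below counts the letters of the question's example text (held in a string here rather than read from text.txt) and prints them one per line in alphabetical order. The names sample, counts and lines are made up for this example.

```python
from collections import Counter
from string import ascii_lowercase

sample = "hello bob"  # the example text from the question, inlined as a string
counts = Counter(ch for ch in sample.lower() if ch in ascii_lowercase)

# Sort by letter so the listing is alphabetical, as the asker wanted.
lines = ["{} {}".format(letter, n) for letter, n in sorted(counts.items())]
print("\n".join(lines))
```

Sorting the (letter, count) pairs is what gives the alphabetical order; a Counter on its own keeps letters in the order they were first seen.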
Answered by elyase
Using re:
import re
import string

context, m = 'some file to search or text', {}
letters = string.ascii_lowercase
for letter in letters:
    # Count the non-overlapping matches of this letter in the text.
    m[letter] = len(re.findall(letter, context))
    print('{0} -> {1}'.format(letter, m[letter]))
Nonetheless, Counter is much more elegant and clean.
Answered by no1
import string

with open('text.txt', 'r') as fp:
    file_list = fp.readlines()
print(file_list)

freqs = {}
for line in file_list:
    # Keep only ASCII letters, lower-cased.
    line = [x for x in line.lower() if x in string.ascii_lowercase]
    for char in line:
        if char in freqs:
            freqs[char] += 1
        else:
            freqs[char] = 1
print(freqs)
Answered by tobias_k
Just for the sake of completeness, if you want to do it without using Counter, here's another very short way, using a generator expression and the dict builtin:
from string import ascii_lowercase as letters

with open("text.txt") as f:
    text = f.read().lower()
print(dict((l, text.count(l)) for l in letters))
f.read() will read the content of the entire file into the text variable (which might be a bad idea if the file is really large); then we use a generator expression to produce (letter, count in text) tuples and convert them to a dictionary. With Python 2.7+ you can also use {l: text.count(l) for l in letters}, which is even shorter and a bit more readable.
Note, however, that this will search the text multiple times, once for each letter, whereas Counter scans it only once and updates the counts for all the letters in one go.
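To make the trade-off concrete, here is a small sketch on a sample string (not the asker's file) that computes the counts both ways and checks that they agree. The variable names are illustrative only.

```python
from collections import Counter
from string import ascii_lowercase

text = "hello bob"  # sample text standing in for the file contents

# One str.count call per letter: the text is scanned once for each of 26 letters.
by_count = {l: text.count(l) for l in ascii_lowercase}
nonzero = {l: n for l, n in by_count.items() if n}

# A single pass: Counter visits each character of the text exactly once.
by_counter = Counter(ch for ch in text if ch in ascii_lowercase)

print(nonzero == dict(by_counter))
```

For a short text the difference is negligible; it starts to matter when the text is large, since the str.count version rereads it once per letter.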
Answered by jfs
You could split the problem into two simpler tasks:
#!/usr/bin/env python
import fileinput  # accept input from stdin and/or files specified at command-line
from collections import Counter
from itertools import chain
from string import ascii_lowercase

# 1. count frequencies of all characters (bytes on Python 2)
freq = Counter(chain.from_iterable(fileinput.input()))  # read one line at a time

# 2. print frequencies of ascii letters
for c in ascii_lowercase:
    n = freq[c] + freq[c.upper()]  # merge lower- and upper-case occurrences
    if n != 0:
        print(c, n)
Answered by Public Person
import sys

def main():
    try:
        # Note: this counts every character in the file, not only letters.
        with open(sys.argv[1], 'r') as f:
            print("Count all your letters:", len(f.read()))
    except IndexError:
        print("You forgot to pass a file as an argument!")
    except IOError:
        print("No such file in your folder!")

main()
python file.py countlettersfile.txt
Answered by Maxim Egorushkin
Yet another way:
import sys
from collections import defaultdict

read_chunk_size = 65536
freq = defaultdict(int)

# Keep reading chunks until read() returns an empty string at end of input.
for chunk in iter(lambda: sys.stdin.read(read_chunk_size), ''):
    for c in chunk:
        freq[ord(c.lower())] += 1

for symbol, count in sorted(freq.items(), key=lambda kv: kv[1], reverse=True):
    print(chr(symbol), count)
It outputs the symbols from most frequent to least frequent.
The character-counting loop needs only a bounded amount of memory (one chunk plus the table of counts), so it can handle arbitrarily large files as long as the input is read in chunks of read_chunk_size characters. Its running time is still linear in the size of the input.
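A minimal sketch of chunked counting, assuming a hypothetical helper named count_chars that accepts any text stream with a read method. An in-memory io.StringIO stands in for a large file or sys.stdin so the example runs on its own, and the chunk size is set to 4 only to force several reads.

```python
import io
from collections import Counter

def count_chars(stream, chunk_size=65536):
    """Count lower-cased characters from a text stream, one chunk at a time."""
    freq = Counter()
    # iter(callable, sentinel) keeps calling read() until it returns '' at EOF.
    for chunk in iter(lambda: stream.read(chunk_size), ""):
        freq.update(chunk.lower())
    return freq

demo = io.StringIO("Hello Bob")  # stands in for a large file or sys.stdin
result = count_chars(demo, chunk_size=4)

# most_common() lists the characters from most to least frequent.
for symbol, count in result.most_common():
    print(repr(symbol), count)
```

Only one chunk is held in memory at a time, so the memory footprint depends on chunk_size and on the number of distinct characters, not on the size of the input.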