Note: this page reproduces a popular StackOverflow question and its answers, originally presented as a Chinese-English parallel translation, under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/992408/

Date: 2020-11-03 21:13:52  Source: igfitidea

Determining Letter Frequency Of Cipher Text

Tags: python, encryption, cryptography

Asked by Veedrac

I am trying to make a tool that finds the frequencies of letters in some type of cipher text. Let's suppose it is all lowercase a-z with no numbers. The encoded message is in a txt file.

I am trying to build a script to help in cracking of substitution or possibly transposition ciphers.

Code so far:

# Read the whole cipher text (the 'U' mode is no longer accepted by Python 3.11+)
with open('cipher.txt') as f:
    cipher = f.read()
cipherfilter = cipher.lower()
cipherletters = list(cipherfilter)

alpha = list('abcdefghijklmnopqrstuvwxyz')
occurrences = {}
for letter in alpha:
    # absolute count of each letter in the text
    occurrences[letter] = cipherfilter.count(letter)
for letter in occurrences:
    print(letter, occurrences[letter])

All it does so far is show how many times each letter appears. How would I print the frequency of all letters found in this file?

Answered by mechanical_meat

import collections

d = collections.defaultdict(int)
for c in 'test':
    d[c] += 1

print(d)  # defaultdict(<class 'int'>, {'t': 2, 'e': 1, 's': 1})

From a file:

# d is the defaultdict(int) created above
with open('test.txt') as myfile:
    for line in myfile:
        line = line.rstrip('\n')  # drop the newline so it is not counted
        for c in line:
            d[c] += 1

For the genius that is the defaultdict container, we must give thanks and praise. Otherwise we'd all be doing something silly like this:

s = "andnowforsomethingcompletelydifferent"
d = {}
for letter in s:
    if letter not in d:
        d[letter] = 1
    else:
        d[letter] += 1
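
The loops above count every character they see, including spaces and punctuation. Here is a minimal sketch, assuming only the letters a-z should be tallied, as the question suggests. `count_letters` is a hypothetical helper name and not part of the original answer:

```python
import collections
import string

def count_letters(text):
    """Count only the letters a-z in text (hypothetical helper)."""
    counts = collections.defaultdict(int)
    for c in text.lower():
        # skip digits, whitespace and punctuation
        if c in string.ascii_lowercase:
            counts[c] += 1
    return counts

counts = count_letters("Test text,\nwith 2 lines!")
for letter in sorted(counts):
    print(letter, counts[letter])
```

To apply it to a file, the text could be read first, for example with `open('cipher.txt').read()`, and passed in as `text`.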

Answered by Veedrac

The modern way:

from collections import Counter

string = "ihavesometextbutidontmindsharing"
Counter(string)
#>>> Counter({'i': 4, 't': 4, 'e': 3, 'n': 3, 's': 2, 'h': 2, 'm': 2, 'o': 2, 'a': 2, 'd': 2, 'x': 1, 'r': 1, 'u': 1, 'b': 1, 'v': 1, 'g': 1})
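
As a possible follow-up, `Counter.most_common` lists the highest counts first, and dividing each count by the total gives that letter's share of the text. This is only a sketch and the variable names are illustrative:

```python
from collections import Counter

text = "ihavesometextbutidontmindsharing"
counts = Counter(text)
total = sum(counts.values())  # number of characters counted

# most_common(n) yields (letter, count) pairs, highest count first
for letter, count in counts.most_common(3):
    print(letter, count, round(count / total, 3))
```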

Answered by jacob

If you want to know the relative frequency of a letter c, you have to divide the number of occurrences of c by the length of the input.

For instance, taking Adam's example:

s = "andnowforsomethingcompletelydifferent"
n = len(s) # n = 37

and storing the absolute frequency of each letter in

dict[letter]

we obtain the relative frequencies by:

from string import ascii_lowercase # this is "a...z"
for c in ascii_lowercase:
    print(c, dict[c]/float(n))

Putting it all together, we get something like this:

# get input
s = "andnowforsomethingcompletelydifferent"
n = len(s) # n = 37

# get absolute frequencies of letters
import collections
dict = collections.defaultdict(int)
for c in s:
    dict[c] += 1

# print relative frequencies
from string import ascii_lowercase # this is "a...z"
for c in ascii_lowercase:
    print(c, dict[c]/float(n))
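
For cipher work it is often handy to see the letters ranked from most to least frequent. The following is a sketch under the assumption that only a-z matter. `relative_frequencies` is a hypothetical helper and not something from the answer above:

```python
from collections import Counter
from string import ascii_lowercase

def relative_frequencies(text):
    """Return {letter: fraction} for a-z, ignoring every other character."""
    letters = [c for c in text.lower() if c in ascii_lowercase]
    total = len(letters)
    if total == 0:
        # no letters at all: avoid dividing by zero
        return {c: 0.0 for c in ascii_lowercase}
    counts = Counter(letters)
    return {c: counts[c] / total for c in ascii_lowercase}

freqs = relative_frequencies("andnowforsomethingcompletelydifferent")

# print letters from most to least frequent
for letter, freq in sorted(freqs.items(), key=lambda item: item[1], reverse=True):
    print(f"{letter}: {freq:.3f}")
```

Dividing by the number of letters rather than by `len(text)` keeps the fractions summing to 1 even when the input contains spaces or punctuation. For the all-lowercase input assumed in the question, the two are the same.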