Bash script to find the frequency of every letter in a file

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original address and attribute it to the original authors (not me). Original StackOverflow address: http://stackoverflow.com/questions/3966820/

Date: 2020-08-04 23:43:27  Source: igfitidea

Tags: linux, bash, frequency, letters

Asked by SkypeMeSM

I am trying to find out how often each letter of the English alphabet appears in an input file. How can I do this in a bash script?

Accepted answer by ghostdog74

A single awk command is enough:

awk -vFS="" '{for(i=1;i<=NF;i++)w[$i]++}END{for(i in w) print i,w[i]}' file

For a case-insensitive count, wrap the character in tolower():

awk -vFS="" '{for(i=1;i<=NF;i++)w[tolower($i)]++}END{for(i in w) print i,w[i]}' file

To count only letters:

awk -vFS="" '{for(i=1;i<=NF;i++){ if($i~/[a-zA-Z]/) { w[tolower($i)]++} } }END{for(i in w) print i,w[i]}' file

To count only digits, change /[a-zA-Z]/ to /[0-9]/.

If you do not want Unicode characters to show up, run export LC_ALL=C first, so that the range [a-zA-Z] matches only ASCII letters.
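Splitting a record into one field per character with an empty FS is an extension (gawk and mawk support it) rather than behavior specified by POSIX. As a minimal sketch of the same counting idea without that dependency, the function below walks each line with substr() and sorts the result by count. The name letter_freq and the output order are assumptions for illustration, not part of the original answer.

```shell
# letter_freq is a hypothetical helper name.
# It counts ASCII letters case-insensitively in the given files (or stdin)
# and prints "letter count", most frequent first.
letter_freq() {
    LC_ALL=C awk '
        {
            n = length($0)
            for (i = 1; i <= n; i++) {
                # take one character at a time instead of relying on FS=""
                c = tolower(substr($0, i, 1))
                if (c ~ /^[a-z]$/) count[c]++
            }
        }
        END { for (c in count) print c, count[c] }
    ' "$@" | LC_ALL=C sort -k2,2nr -k1,1
}
```

For example, printf 'Hello World\n' | letter_freq lists l with a count of 3 first, then o with 2, then the remaining letters with 1 each in alphabetical order.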

Answer by Benoit

Here is a suggestion:

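# Read one character at a time; IFS= and -r keep whitespace and backslashes intact
while IFS= read -r -n 1 c
do
    printf '%s\n' "$c"
done < "$INPUT_FILE" | grep '[[:alpha:]]' | sort | uniq -c | sort -nr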
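The loop above reads from $INPUT_FILE, which the snippet leaves undefined. One possible way to use it, sketched here under the assumption that the file path is passed as the first argument, is to wrap it in a function. The name count_alpha is hypothetical, and read -n is a bash feature, so this needs bash rather than plain sh.

```shell
# count_alpha is a hypothetical wrapper name; $1 is the file to analyse.
count_alpha() {
    local input_file=$1 c
    # IFS= keeps whitespace characters, -r keeps backslashes literal,
    # -n 1 makes read return after a single character (bash-specific).
    while IFS= read -r -n 1 c; do
        printf '%s\n' "$c"
    done < "$input_file" | grep '[[:alpha:]]' | sort | uniq -c | sort -nr
}
```

Calling count_alpha notes.txt (an assumed file name) prints one line per letter, with the count in front and the most frequent letter first.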

Answer by mouviciel

A solution with sed, sort and uniq:

sed 's/\(.\)/\1\n/g' file | sort | uniq -c

This counts all characters, not only letters. To keep only letters, filter with grep:

sed 's/\(.\)/\1\n/g' file | grep '[A-Za-z]' | sort | uniq -c

To treat uppercase and lowercase as the same letter, add a tr translation:

sed 's/\(.\)/\1\n/g' file | tr '[:upper:]' '[:lower:]' | grep '[a-z]' | sort | uniq -c

Answer by dogbane

My solution using grep, sort and uniq:

grep -o . file | sort | uniq -c

Ignore case:

grep -o . file | sort -f | uniq -ic
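grep -o can also do the letter filtering itself when it is given a bracket expression instead of a plain dot. The sketch below assumes a grep that supports -o (GNU and BSD grep do, but the option is not in POSIX). It lowercases the input first so that each letter is reported in one case. The name letter_freq_grep is hypothetical.

```shell
# letter_freq_grep is a hypothetical helper name; $1 is the file to analyse.
letter_freq_grep() {
    # Lowercase everything, print each letter on its own line,
    # then count identical lines and list the most frequent first.
    tr '[:upper:]' '[:lower:]' < "$1" \
        | grep -o '[[:alpha:]]' \
        | sort \
        | uniq -c \
        | sort -nr
}
```

Because the filter is part of the grep pattern, spaces, digits and punctuation never reach sort and uniq.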

Answer by Anthony C Howe

This is similar to mouviciel's answer above, but more generic for the Bourne and Korn shells used on BSD systems. When you don't have GNU sed, which supports \n in a replacement, you can backslash-escape a literal newline instead:

sed -e's/./&\
/g' file | sort | uniq -c | sort -nr

Or, to avoid the visual split on the screen, insert a literal newline by typing CTRL+V CTRL+J:

sed -e's/./&\^J/g' file | sort | uniq -c | sort -nr
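Another way to get the escaped newline into the sed script, without a visual split and without typing control characters, is to keep the newline in a shell variable. This is a sketch rather than part of the original answer: the names char_freq and nl are hypothetical, and the grep . stage is an added assumption that drops the empty line each input newline would otherwise contribute.

```shell
# char_freq is a hypothetical helper name; arguments are passed on to sed.
char_freq() {
    # nl holds a single literal newline character.
    nl='
'
    # Inside double quotes, \\ becomes one backslash, so sed receives
    # "s/./&\<newline>/g": each character followed by a newline.
    sed -e "s/./&\\${nl}/g" "$@" | grep . | sort | uniq -c | sort -nr
}
```

Remove the grep . stage to count line ends as well, which matches the behavior of the one-liners above.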