Note: this page is a translation of a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors on Stack Overflow (not to the translator).
Original question: http://stackoverflow.com/questions/3966820/
Bash script to find the frequency of every letter in a file
Asked by SkypeMeSM
I am trying to find out how often every letter of the English alphabet appears in an input file. How can I do this in a bash script?
Accepted answer by ghostdog74
Just one awk command:
awk -vFS="" '{for(i=1;i<=NF;i++)w[$i]++}END{for(i in w) print i,w[i]}' file
If you want the count to be case-insensitive, add tolower():
awk -vFS="" '{for(i=1;i<=NF;i++)w[tolower($i)]++}END{for(i in w) print i,w[i]}' file
And if you want only letters:
awk -vFS="" '{for(i=1;i<=NF;i++){ if($i~/[a-zA-Z]/) { w[tolower($i)]++} } }END{for(i in w) print i,w[i]}' file
And if you want only digits, change /[a-zA-Z]/ to /[0-9]/.
If you do not want Unicode characters to show up in the output, run export LC_ALL=C first so that awk works in the plain C locale.
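The one-liners above depend on an empty FS splitting each line into single characters, which gawk and other common awk implementations do but POSIX leaves unspecified, and the for (i in w) loop prints its keys in no particular order. A minimal sketch of a variant that walks each line with substr() instead and sorts the result; the function name letter_freq is made up for illustration and is not part of the original answer:

```shell
# letter_freq: hypothetical helper, not part of the original answer.
# Prints "letter count" for every ASCII letter, case-insensitively,
# reading the files given as arguments (or stdin when there are none).
letter_freq() {
    LC_ALL=C awk '
        {
            n = length($0)
            for (i = 1; i <= n; i++) {
                c = tolower(substr($0, i, 1))   # i-th character, lowercased
                if (c ~ /^[a-z]$/) w[c]++        # count letters only
            }
        }
        END { for (c in w) print c, w[c] }      # unordered, hence the sort
    ' "$@" | sort
}
```

It can be called as letter_freq file, or at the end of a pipeline such as cat file | letter_freq.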
Answer by Benoit
Here is a suggestion:
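# INPUT_FILE must hold the path of the file to analyse
while IFS= read -r -n 1 c    # read one character at a time, unmodified
do
    printf '%s\n' "$c"
done < "$INPUT_FILE" | grep '[[:alpha:]]' | sort | uniq -c | sort -nr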
Answer by mouviciel
A solution with sed, sort and uniq:
sed 's/\(.\)/\1\n/g' file | sort | uniq -c
This counts all characters, not only letters. You can filter out the non-letters with:
sed 's/\(.\)/\1\n/g' file | grep '[A-Za-z]' | sort | uniq -c
If you want to treat uppercase and lowercase as the same, just add a translation with tr:
sed 's/\(.\)/\1\n/g' file | tr '[:upper:]' '[:lower:]' | grep '[a-z]' | sort | uniq -c
Answer by dogbane
My solution using grep, sort and uniq.
grep -o . file | sort | uniq -c
Ignore case:
grep -o . file | sort -f | uniq -ic
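Building on the pipeline above, here is a sketch that keeps only letters, folds case with tr, and prints "letter count" lines with the most frequent letter first. The function name letter_rank is an assumption for illustration, and grep is assumed to support -o (GNU and BSD grep both do):

```shell
# letter_rank: hypothetical helper that reads text on stdin.
letter_rank() {
    grep -o '[[:alpha:]]' |            # one letter per line, non-letters dropped
        tr '[:upper:]' '[:lower:]' |   # treat A and a as the same letter
        sort | uniq -c |               # "count letter" per distinct letter
        awk '{ print $2, $1 }' |       # reorder to "letter count"
        sort -k2,2nr -k1,1             # highest count first, ties alphabetical
}
```

Usage: letter_rank < file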
Answer by Anthony C Howe
Similar to mouviciel's answer above, but more generic for the Bourne and Korn shells used on BSD systems. GNU sed supports \n in a replacement; when you don't have GNU sed, you can backslash-escape a literal newline instead:
sed -e's/./&\
/g' file | sort | uniq -c | sort -nr
Or, to avoid the visual split on the screen, insert a literal newline by typing CTRL+V CTRL+J (displayed as ^J):
sed -e's/./&\^J/g' file | sort | uniq -c | sort -nr
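If typing a control character is awkward, the same backslash-newline trick can be sketched with the newline kept in a shell variable. The names nl, split_chars and char_freq are made up for illustration:

```shell
# A literal newline stored in a variable (the name nl is just illustrative).
nl='
'

# Put every character of the input on its own line.
# Inside double quotes, \\ becomes a single backslash and ${nl} a newline,
# so sed receives the backslash-escaped newline described above.
split_chars() {
    sed -e "s/./&\\${nl}/g" "$@"
}

# Count characters, most frequent first.
char_freq() {
    split_chars "$@" | sort | uniq -c | sort -nr
}
```

Usage: char_freq file. As with the commands above, the blank line left over at the end of each input line is counted as an entry of its own.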