Linux Bash script to find the frequency of every letter in a file

Question

Asked by SkypeMeSM

I am trying to find the frequency of every letter of the English alphabet in an input file. How can I do this in a bash script?

Answer 1

Accepted answer by ghostdog74

Just one awk command:

awk -vFS="" '{for(i=1;i<=NF;i++)w[$i]++}END{for(i in w) print i,w[i]}' file

If you want it case-insensitive, add tolower():

awk -vFS="" '{for(i=1;i<=NF;i++)w[tolower($i)]++}END{for(i in w) print i,w[i]}' file

And if you want only letters:

awk -vFS="" '{for(i=1;i<=NF;i++){ if($i~/[a-zA-Z]/) { w[tolower($i)]++} } }END{for(i in w) print i,w[i]}' file

And if you want only digits, change /[a-zA-Z]/ to /[0-9]/.

If you do not want Unicode (non-ASCII) letters to be shown, run export LC_ALL=C first.
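
The one-liners above rely on an empty FS splitting each line into single characters. gawk supports that, but POSIX leaves the behavior unspecified, so it may not work with every awk. Here is a minimal sketch of the same counting idea using substr() instead. The helper name letter_freq and the sort order (most frequent first, ties alphabetical) are illustrative choices, not part of the original answer:

```shell
#!/usr/bin/env bash
# letter_freq FILE (hypothetical helper): print "letter count" for every
# ASCII letter that occurs in FILE, ignoring case, most frequent first.
letter_freq() {
    LC_ALL=C awk '
        {
            line = tolower($0)
            n = length(line)
            # Walk the line one character at a time instead of relying on FS="".
            for (i = 1; i <= n; i++) {
                c = substr(line, i, 1)
                if (c ~ /^[a-z]$/) count[c]++
            }
        }
        END { for (c in count) print c, count[c] }
    ' "$1" | LC_ALL=C sort -k2,2nr -k1,1
}
```

Call it as letter_freq file. Letters that never occur are simply not listed.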

Answer 2

Answer by Benoit

Here is a suggestion:

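# Read one character at a time; IFS= and -r keep whitespace and backslashes intact
while IFS= read -r -n 1 c
do
    printf '%s\n' "$c"
done < "$INPUT_FILE" | grep '[[:alpha:]]' | sort | uniq -c | sort -nr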

Answer 3

Answer by mouviciel

A solution with sed, sort and uniq:

sed 's/\(.\)/\1\n/g' file | sort | uniq -c

This counts all characters, not only letters. You can filter out the non-letters with:

sed 's/\(.\)/\1\n/g' file | grep '[A-Za-z]' | sort | uniq -c

If you want to treat uppercase and lowercase as the same, add a tr translation step:

sed 's/\(.\)/\1\n/g' file | tr '[:upper:]' '[:lower:]' | grep '[a-z]' | sort | uniq -c

Answer 4

Answer by dogbane

My solution using grep, sort and uniq:

grep -o . file | sort | uniq -c

Ignore case:

grep -o . file | sort -f | uniq -ic
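
If "frequency" is meant as a share of all letters rather than a raw count, the uniq -c output can be post-processed with awk. A rough sketch, assuming a grep that supports -o (GNU and BSD grep do); the helper name letter_percent and the output format are made up for illustration:

```shell
#!/usr/bin/env bash
# letter_percent FILE (hypothetical helper): print "letter count percent"
# for each letter in FILE, ignoring case, most frequent first.
letter_percent() {
    grep -o '[[:alpha:]]' "$1" | tr '[:upper:]' '[:lower:]' | sort | uniq -c |
        LC_ALL=C awk '
            # uniq -c prints "count letter"; remember each count and the total.
            { n[$2] = $1; total += $1 }
            END {
                for (c in n)
                    printf "%s %d %.1f%%\n", c, n[c], 100 * n[c] / total
            }
        ' | LC_ALL=C sort -k2,2nr -k1,1
}
```

Call it as letter_percent file. If the file contains no letters at all, nothing is printed.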

Answer 5

Answer by Anthony C Howe

Similar to mouviciel's answer above, but more portable to the Bourne and Korn shells found on BSD systems. If you don't have GNU sed, which supports \n in a replacement, you can backslash-escape a literal newline instead:

sed -e's/./&\
/g' file | sort | uniq -c | sort -nr

Or, to avoid the visual split on the screen, insert a literal newline by typing CTRL+V CTRL+J:

sed -e's/./&\^J/g' file | sort | uniq -c | sort -nr
