bash: Count the occurrence of a string in an input file
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/8969879/
Count the occurrence of a string in an input file
Asked by Incognito
There is a shell script which is supposed to process an incoming text file.
This text file contains strings spread over multiple lines, and each string is present more than once.
The shell script needs to read this text file and output each string together with its count.
Suppose the text file is:
Tim
tim
Mark
MARk
Allen
ALLen
allEN
The output should look like this:
Tim appears 2 times
Mark appears 2 times
Allen appears 3 times
Right now, I am able to print the occurrence count of each string, but the line is repeated as many times as the string occurs; that is, "Tim appears 2 times" gets printed twice. I tried to replace a string with NULL as soon as I had counted its occurrences, but for some reason the sed is not working, maybe because I am not invoking it in the right place (or in the right way).
#!/bin/bash
INPUT_FILE="$1"
declare -a LIST_CHARS
if [ $# -ne 1 ]
then
    echo "Usage: $0 <file_name>"
    exit 1
fi
if [ ! -f $INPUT_FILE ]
then
    echo "$INPUT_FILE does not exists. Please specify correct file name"
    exit 2
fi
while read line
do
    while read i
    do
        echo $line
        count=`grep -i $line | wc -l`
        echo "String $line appears $count times"
    done < $INPUT_FILE
done < $INPUT_FILE
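For reference, here is a minimal sketch of how the grep-based approach could be made to print each string only once. As far as I can tell, the script above reports once per input line rather than once per distinct string, and its grep is given no file name, so it reads from the loop's standard input instead. The sketch assumes one string per line; the function name count_with_grep is made up for illustration, and which spelling of each string gets printed is left to sort.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): prints each distinct
# line of FILE once, with a case-insensitive count of its occurrences.
count_with_grep() {
    local file=$1 name count
    # sort -f folds case while sorting; -u keeps one line per case-folded group.
    sort -fu -- "$file" | while IFS= read -r name
    do
        [ -n "$name" ] || continue          # skip blank lines
        # -i ignore case, -c count matching lines, -x whole-line match,
        # -F treat the name as a fixed string; the file is passed explicitly.
        count=$(grep -icxF -- "$name" "$file")
        echo "$name appears $count times"
    done
}
```

Calling `count_with_grep data.txt` would then print one "appears N times" line per distinct string, in sorted order. It still runs grep once per distinct string, so the sort/uniq and awk approaches are likely to scale better on large files.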
Answered by choroba
You can also use sort and uniq with flags to ignore case:
sort -f FILE | uniq -ic
A simple sed command can change the output format to the specified one:
s/^ *\([0-9]\+\) \(.*\)/\2 appears \1 times/
Answered by William Pursell
The classic awk solution is something like:
$ awk 'NF { count[ toupper( $0 ) ]++ }
END { for ( name in count ) { print name " appears " count[ name ] " times" } }' input
Answered by Shiplu Mokaddim
Assuming data.txt contains your words, the following script will do:
while read line
do
    uc=$(echo "$line" | tr 'a-z' 'A-Z' | tr -d ' ')
    echo $uc $(grep -i "$uc" data.txt | wc -l)
done < data.txt | sort | uniq
Output:
31
ALLEN 6
MARK 4
MOKADDIM 1
SHIPLU 1
TIM 4
Another option is:
sort -f data.txt | uniq -i -c | while read num word
do
    echo $(echo "$word" | tr 'a-z' 'A-Z') appears $num times
done
Note: I see your text file contains blank lines. So the 31 in the output is the number of blank lines.
A bare-bones variant of the same idea loops over the output of sort and uniq -c:
sort filename | uniq -c | while read count word
do
    # print the data in whatever format you like
    # (without sort -f and uniq -i, the counts are case-sensitive)
    echo "$word appears $count times"
done
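The uppercase-based answers print TIM, MARK and ALLEN, whereas the question asks for the spelling as it appears in the file. Below is a minimal sketch of one way to keep it, assuming the first-seen spelling of each string is the one to report and that duplicates differ only in letter case. The function name count_strings and the file name sample.txt are chosen for illustration and do not come from any of the answers.

```shell
#!/bin/bash
# Hypothetical helper: count lines case-insensitively with awk, but report
# each string using the spelling of its first occurrence, in input order.
count_strings() {
    awk '
        NF {                              # skip blank lines
            key = tolower($0)             # case-folded key for counting
            if (!(key in first)) {
                first[key] = $0           # remember the first-seen spelling
                order[++n] = key          # remember the order of first appearance
            }
            count[key]++
        }
        END {
            for (i = 1; i <= n; i++)
                print first[order[i]] " appears " count[order[i]] " times"
        }
    ' "$1"
}
```

With the sample from the question saved as sample.txt, `count_strings sample.txt` prints "Tim appears 2 times", "Mark appears 2 times" and "Allen appears 3 times", in that order. Lines that differ in surrounding whitespace are still counted as different strings.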

