bash: count the occurrences of a string in an input file

Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/8969879/

Time: 2020-09-18 01:23:49  Source: igfitidea

Count the occurrence of a string in an input file

linux bash shell

Asked by Incognito

There is a shell script which is supposed to process an incoming text file.

This text file contains strings on separate lines, and each string is present more than once.

The shell script needs to read this text file and output each string together with its count.

Suppose the text file is:

Tim

tim

Mark

MARk

Allen

ALLen

allEN

The output should be like this:

Tim appears 2 times

Mark appears 2 times

Allen appears 3 times

Right now, I am able to print the count for each string, but the line gets repeated as many times as the string occurs, that is, "Tim appears 2 times" gets printed twice. I was trying to replace a string with NULL as soon as I had counted its occurrences, but for some reason the sed is not working, maybe because I am not invoking it at the right place (or in the right way).

#!/bin/bash

INPUT_FILE="$1"
declare -a LIST_CHARS

if [ $# -ne 1 ]
then
        echo "Usage: $0 <file_name>"
        exit 1
fi

if [ ! -f $INPUT_FILE ]
then
        echo "$INPUT_FILE does not exists. Please specify correct file name"
        exit 2
fi

while read line
do
        while read i
        do
                echo $line
                count=`grep -i $line | wc -l`
                echo "String $line appears $count times"
        done < $INPUT_FILE
done < $INPUT_FILE

Answered by choroba

You can also use sort and uniq with flags to ignore case:

sort -f FILE | uniq -ic

A simple sed command can change the output format to the specified one:

s/^ *\([0-9]\+\) \(.*\)/\2 appears \1 times/
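
Putting the sort, uniq and sed steps of this answer together, a minimal sketch might look like the following. The function name count_names and the temporary demo file are assumptions made for illustration only. The sed expression is written with [0-9][0-9]* in place of \+, since \+ is a GNU extension. Which spelling of each name gets printed depends on how sort orders lines that differ only in case, so no exact output is shown.

```shell
#!/bin/bash
# count_names is a hypothetical helper name, not part of the original answer.
# It counts lines case-insensitively and prints "<name> appears <n> times".
count_names() {
    sort -f "$1" |
        uniq -ic |
        sed 's/^ *\([0-9][0-9]*\) \(.*\)/\2 appears \1 times/'
}

# Demo on a temporary file holding the sample names from the question.
tmp=$(mktemp)
printf '%s\n' Tim tim Mark MARk Allen ALLen allEN > "$tmp"
count_names "$tmp"
rm -f "$tmp"
```

uniq -i keeps the first line of each case-insensitive group, which is why the input has to be sorted with -f first.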

Answered by William Pursell

The classic awk solution is something like:

$ awk 'NF{ count[ toupper( $0 ) ]++} END{ for ( name in count ) { print name " appears " count[ name ] " times" }; }' input
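
The awk approach keys its counts on the upper-cased line, so it reports TIM rather than Tim, and for ( name in count ) visits names in no particular order. If the first spelling seen and the input order matter, one possible variation is sketched below. The function name count_first_spelling and the arrays first and order are assumptions for illustration, not part of the original answer.

```shell
#!/bin/bash
# count_first_spelling is a hypothetical name used only for this sketch.
count_first_spelling() {
    awk '
        NF {                          # skip blank lines
            key = toupper($0)         # case-insensitive key
            if (!(key in count)) {    # first time this name is seen
                first[key] = $0       # remember its original spelling
                order[++n] = key      # remember the input order
            }
            count[key]++
        }
        END {
            for (i = 1; i <= n; i++)
                print first[order[i]] " appears " count[order[i]] " times"
        }
    ' "$1"
}

# Demo on a temporary file holding the sample names from the question.
tmp=$(mktemp)
printf '%s\n' Tim tim Mark MARk Allen ALLen allEN > "$tmp"
count_first_spelling "$tmp"
rm -f "$tmp"
```

On the sample names from the question this prints "Tim appears 2 times", "Mark appears 2 times" and "Allen appears 3 times", one per line.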

Answered by Shiplu Mokaddim

Assuming data.txt contains your words, the following script will do.

while read -r line
do
    # upper-case the name and strip spaces
    uc=$(echo "$line" | tr '[a-z]' '[A-Z]' | tr -d ' ')
    # count the lines that contain it, ignoring case
    echo  $uc $(grep -i "$uc" data.txt | wc -l)
done < data.txt | sort | uniq

Output.

31
ALLEN 6
MARK 4
MOKADDIM 1
SHIPLU 1
TIM 4

Another option is

sort -f data.txt | uniq -i -c | while read -r num word
do
    echo "$(echo "$word" | tr '[a-z]' '[A-Z]') appears $num times"
done

Note: I see your text file contains blank lines. So the line with 31 in the output comes from the blank lines.
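
If the blank lines should not be counted at all, one option is to filter them out before sorting. The sketch below assumes a hypothetical helper name, count_nonblank, and treats lines containing only whitespace as blank.

```shell
#!/bin/bash
# count_nonblank is a hypothetical helper name for this sketch.
# It drops empty and whitespace-only lines, then counts the rest
# case-insensitively.
count_nonblank() {
    grep -v '^[[:space:]]*$' "$1" | sort -f | uniq -ic
}

# Demo: the sample names with blank lines in between.
tmp=$(mktemp)
printf '%s\n' Tim '' tim '' Mark '' MARk '' Allen '' ALLen '' allEN > "$tmp"
count_nonblank "$tmp"
rm -f "$tmp"
```

The output has one line per name, with the count in front, and no entry for the blank lines.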

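Answered by Balaswamy Vaddeman

sort filename | uniq -c | while read -r count word
do
    # print the data however you like, for example:
    echo "$word appears $count times"
done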