bash: count occurrences of strings in an input file

Question

Asked by Incognito

There is a shell script which is supposed to process an incoming text file.

This text file contains strings, one per line, and each string is present more than once, possibly with different capitalization.

The shell script needs to read this text file and output each string together with its count.

Suppose the text file is:

Tim
tim
Mark
MARk
Allen
ALLen
allEN

The output should be like this:

Tim appears 2 times
Mark appears 2 times
Allen appears 3 times

Right now, I am able to print the occurrences of each string, but each line gets repeated as many times as the string occurs; that is, "Tim appears 2 times" gets printed twice. I was trying to replace a string with NULL as soon as I had counted its occurrences, but for some reason the sed is not working, maybe because I am not invoking it at the right place (or in the right way).

#!/bin/bash

INPUT_FILE="$1"
declare -a LIST_CHARS

if [ $# -ne 1 ]
then
        echo "Usage: $0 <file_name>"
        exit 1
fi


if [ ! -f $INPUT_FILE ]
then
        echo "$INPUT_FILE does not exist. Please specify a correct file name"
        exit 2
fi

while read line
do
        while read i
        do
                echo $line
                count=`grep -i $line | wc -l`
                echo "String $line appears $count times"
        done < $INPUT_FILE

done < $INPUT_FILE

Answer 1

Answered by choroba

You can also use sort and uniq with flags to ignore case:

sort -f FILE | uniq -ic

A simple sed command can then change the output format to the specified one:

s/^ *\([0-9]\+\) \(.*\)/\2 appears \1 times/
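Putting the sort, uniq and sed steps together might look like the sketch below. The function name count_strings and the temporary sample file are assumptions made for illustration, and the pattern spells the answer's `\+` as `[0-9][0-9]*` so that it does not rely on a GNU sed extension.

```shell
#!/bin/bash
# Sketch: case-insensitive counts printed as "<string> appears <n> times".
# count_strings is a hypothetical helper name, not part of the answer.
count_strings() {
    # sort -f places lines that differ only in case next to each other;
    # uniq -ic collapses each such run and prefixes it with its count;
    # sed then swaps the count and the string into the requested wording.
    sort -f "$1" | uniq -ic |
        sed 's/^ *\([0-9][0-9]*\) \(.*\)/\2 appears \1 times/'
}

# Demo on the sample data from the question.
sample=$(mktemp)
printf '%s\n' Tim tim Mark MARk Allen ALLen allEN > "$sample"
count_strings "$sample"
rm -f "$sample"
```

Each group is reported using whichever spelling sort happens to place first, so the capitalization in the output may differ from "Tim", "Mark" and "Allen" depending on the locale.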

Answer 2

Answered by William Pursell

The classic awk solution is something like:

$ awk 'NF{ count[ toupper( $1 ) ]++}
    END{ for ( name in count ) { print name " appears " count[ name ] " times" };
}' input
Answer 3

Answered by Shiplu Mokaddim

Assuming data.txt contains your words, the following script will do:

while read -r line
do
    uc=$(echo "$line" | tr 'a-z' 'A-Z' | tr -d ' ')
    echo $uc $(grep -i "$uc" data.txt | wc -l)
done < data.txt | sort | uniq

Output:

31
ALLEN 6
MARK 4
MOKADDIM 1
SHIPLU 1
TIM 4

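Another option is:

sort -f data.txt | uniq -i -c | while read -r num word
do
    echo "$(echo "$word" | tr 'a-z' 'A-Z') appears $num times"
done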

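Note: I see your text file contains blank lines. The 31 in the output comes from them: a blank line produces an empty pattern, and grep matches every line of the file against an empty pattern, so 31 is the total number of lines in the file, blank ones included.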
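If the entry caused by blank lines is unwanted, one possible way around it is to filter those lines out before counting. The sketch below assumes a hypothetical helper called count_nonblank; it is a variation on the sort and uniq option above rather than part of the original answer.

```shell
#!/bin/bash
# count_nonblank is a hypothetical helper name used only for this sketch.
count_nonblank() {
    # Drop empty and whitespace-only lines, then count case-insensitively.
    grep -v '^[[:space:]]*$' "$1" | sort -f | uniq -ic |
        while read -r num word
        do
            # Upper-case the string so every spelling is reported the same way.
            echo "$(printf '%s' "$word" | tr 'a-z' 'A-Z') appears $num times"
        done
}

# Demo: blank lines in the input no longer show up in the counts.
sample=$(mktemp)
printf '%s\n' Tim '' tim '' Allen > "$sample"
count_nonblank "$sample"
rm -f "$sample"
```

Unlike the grep-based loop, this counts whole lines, so a string that merely contains another one (for example "Timothy" and "Tim") is not counted twice.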

Answer 4

Answered by Balaswamy Vaddeman

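for i in `sort filename | uniq -c`
do
    # print the data however you like; note that $i holds one
    # whitespace-separated word at a time (a count, then a string)
    echo "$i"
done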
