Linux command or script counting duplicated lines in a text file?

Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must also follow CC BY-SA 4.0, cite the original URL and authors, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/6447473/

Time: 2020-08-05 04:40:46  Source: igfitidea

Tags: linux, text, duplicates

Question by timeon

If I have a text file with the following content:

red apple
green apple
green apple
orange
orange
orange

Is there a Linux command or script that I can use to get the following result?

1 red apple
2 green apple
3 orange

Accepted answer by borrible

Send it through sort (to put identical lines next to each other), then through uniq -c to get counts, i.e.:

sort filename | uniq -c

and to get that list sorted by frequency (highest count first), you can use:

sort filename | uniq -c | sort -nr

Answer by user unknown

Can you live with an alphabetically ordered list?

echo "red apple
green apple
green apple
orange
orange
orange" | sort -u

green apple
orange
red apple

or

sort -u FILE

-u stands for unique: sort prints only the first line of each run of equal lines, which works because sorting has already put equal lines next to each other.
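A quick way to see that, on the question's sample, sort -u yields the same lines as sort piped into uniq. The helper name sample is hypothetical, and this only checks this particular input; it is not a proof for every locale and every input.

```shell
#!/bin/sh
# sample (hypothetical helper): emit the question's example lines.
sample() {
    printf '%s\n' 'red apple' 'green apple' 'green apple' orange orange orange
}

# sort -u keeps one line out of each group of equal lines; for input like
# this it matches sorting first and then letting uniq drop adjacent repeats.
with_sort_u=$(sample | sort -u)
with_uniq=$(sample | sort | uniq)

printf '%s\n' "$with_sort_u"
[ "$with_sort_u" = "$with_uniq" ] && echo "same result"
```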

A solution which preserves the order:

echo "red apple
green apple
green apple
orange
orange
orange" | { old=""; while IFS= read -r line; do if [[ $line != "$old" ]]; then printf '%s\n' "$line"; old=$line; fi; done; }
red apple
green apple
orange
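The loop above is essentially a hand-written uniq: a line is printed only when it differs from the previous one. Below is a sketch of the same logic as a POSIX sh function (dedup_adjacent is a hypothetical name). It uses IFS= and read -r so leading blanks and backslashes survive, quotes every expansion, and adds a first-line flag, because starting from old="" would otherwise silently drop an empty first line.

```shell
#!/bin/sh
# dedup_adjacent (hypothetical name): print each line unless it equals the
# line printed just before it, i.e. the same job plain `uniq` does.
dedup_adjacent() {
    old=""
    first=1
    # IFS= and -r keep leading/trailing blanks and backslashes intact.
    while IFS= read -r line; do
        # Always print the very first line, even if it is empty.
        if [ "$first" -eq 1 ] || [ "$line" != "$old" ]; then
            printf '%s\n' "$line"
            old=$line
            first=0
        fi
    done
}

printf '%s\n' 'red apple' 'green apple' 'green apple' orange orange orange |
    dedup_adjacent
```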

And with a file:

old=""
while IFS= read -r line
do
  if [[ $line != "$old" ]]
  then
    printf '%s\n' "$line"
    old=$line
  fi
done < file

The last two only remove duplicates that directly follow each other, which is enough for your example.

echo "red apple
green apple
lila banana
green apple
" ...

This prints green apple twice, because the banana line sits between the two occurrences.
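If non-adjacent repeats like these should also be dropped while keeping the first-seen order, a widely used awk idiom (not part of this answer) does it in one pass by remembering every line it has already printed. The function name dedup_all is hypothetical; memory use grows with the number of distinct lines.

```shell
#!/bin/sh
# dedup_all (hypothetical name): print each distinct line once, in order of
# first appearance. seen[$0]++ evaluates to 0 (false) the first time a line
# is met, so !seen[$0]++ is true exactly once per distinct line.
dedup_all() {
    awk '!seen[$0]++' "$@"
}

printf '%s\n' 'red apple' 'green apple' 'lila banana' 'green apple' |
    dedup_all
```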

Answer by mhyfritz

uniq -c file
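Because the question's sample already has equal lines next to each other, uniq -c on its own reproduces the requested output, in the original order. A minimal check, with printf standing in for the file:

```shell
#!/bin/sh
# The sample input is already grouped, so no sort is needed:
# uniq -c only merges runs of adjacent equal lines.
printf '%s\n' 'red apple' 'green apple' 'green apple' orange orange orange |
    uniq -c
```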

and in case the file is not sorted already:

sort file | uniq -c
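Sorting changes the order of the lines. If the input is not grouped but you still want the counts in first-seen order, as in the question's desired output, one alternative (an awk sketch, not taken from these answers) tallies every line in an associative array and records the order in which distinct lines first appeared. count_in_order is a hypothetical name.

```shell
#!/bin/sh
# count_in_order (hypothetical name): print "<count> <line>" for each
# distinct line, in order of first appearance, without sorting.
count_in_order() {
    awk '
        !($0 in count) { order[++n] = $0 }   # remember first-seen order
        { count[$0]++ }                      # tally every occurrence
        END { for (i = 1; i <= n; i++) print count[order[i]], order[i] }
    ' "$@"
}

printf '%s\n' 'red apple' orange 'green apple' orange 'green apple' orange |
    count_in_order
```

For this shuffled input it prints 1 red apple, 3 orange, 2 green apple, following the order in which each line first showed up.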

Answer by pajton

cat <filename> | sort | uniq -c

Answer by Rahul

Try this

cat myfile.txt | sort | uniq

Answer by Chris Eberle

To just get a count:

$> egrep -o '\w+' fruits.txt | sort | uniq -c

      3 apple
      2 green
      1 oragen
      2 orange
      1 red

To get a sorted count:

$> egrep -o '\w+' fruits.txt | sort | uniq -c | sort -nk1
      1 oragen
      1 red
      2 green
      2 orange
      3 apple
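egrep is an old alias for grep -E, and GNU grep 3.8 and later print a warning when it is invoked. A sketch of the same word count with grep -oE follows, using the bracket expression [[:alnum:]_] for a word character instead of \w, which is not part of POSIX regular expressions. Like the original, it relies on grep -o, which GNU and BSD grep provide. word_counts is a hypothetical name.

```shell
#!/bin/sh
# word_counts (hypothetical name): split the input into words (runs of
# letters, digits and underscores), then count each distinct word.
word_counts() {
    grep -oE '[[:alnum:]_]+' "$@" | sort | uniq -c
}

printf '%s\n' 'red apple' 'green apple' 'green apple' orange orange orange |
    word_counts
```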

EDIT

Aha, the question is about whole lines, not words; my bad. Here's the command to use for full lines:

$> cat fruits.txt | sort | uniq -c | sort -nk1
      1 oragen
      1 red apple
      2 green apple
      2 orange

Answer by Jaberino

Almost the same as borrible's answer, but if you add the d parameter to uniq, it only shows lines that occur more than once.

sort filename | uniq -cd | sort -nr
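On the question's sample, -d drops red apple because it occurs only once. A small sketch, with printf in place of the file and a hypothetical helper name duplicates_only:

```shell
#!/bin/sh
# duplicates_only (hypothetical name): count lines, keep only those that
# occur more than once (uniq -d), most frequent first (sort -nr).
duplicates_only() {
    sort "$@" | uniq -cd | sort -nr
}

printf '%s\n' 'red apple' 'green apple' 'green apple' orange orange orange |
    duplicates_only
```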