Linux command or script to count duplicate lines in a text file?

Question

Asked by timeon

If I have a text file with the following content:

red apple
green apple
green apple
orange
orange
orange

Is there a Linux command or script that I can use to get the following result?

1 red apple
2 green apple
3 orange

Answer 1

Accepted answer by borrible

Send it through sort (to put identical lines next to each other), then uniq -c to give counts, i.e.:

sort filename | uniq -c

and to get that list sorted by frequency (most frequent first) you can use:

sort filename | uniq -c | sort -nr
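
If the order of first appearance should be kept, the counts can also be collected in a single pass with awk instead of sorting. The following is only a sketch and not part of the original answer; the helper name count_lines and the temporary sample file are made up for illustration.

```shell
#!/bin/sh
# count_lines (hypothetical helper): print "<count> <line>" for every
# distinct line of the given file, in order of first appearance.
count_lines() {
    awk '
        !($0 in count) { order[++n] = $0 }   # remember first-seen order
        { count[$0]++ }                      # tally every occurrence
        END { for (i = 1; i <= n; i++) print count[order[i]], order[i] }
    ' "$1"
}

# Sample data taken from the question.
sample=$(mktemp)
printf '%s\n' 'red apple' 'green apple' 'green apple' \
              'orange' 'orange' 'orange' > "$sample"

result=$(count_lines "$sample")
echo "$result"
rm -f "$sample"
```

For the sample data this prints the three lines asked for in the question (1 red apple, 2 green apple, 3 orange). Unlike sort | uniq -c, it keeps every distinct line in memory, so it is better suited to files with a moderate number of distinct lines.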

Answer 2

Answered by user unknown

Can you live with an alphabetically ordered list:

echo "red apple
green apple
green apple
orange
orange
orange" | sort -u

?

green apple
orange
red apple

or

sort -u FILE

-u stands for unique; sort can only drop the duplicates because sorting puts identical lines next to each other.

A solution which preserves the order:

echo "red apple
green apple
green apple
orange
orange
orange" | { old=""; while IFS= read -r line; do if [[ "$line" != "$old" ]]; then echo "$line"; old=$line; fi; done; }
red apple
green apple
orange

and, reading from a file:

old=""
while IFS= read -r line
do
  # print a line only when it differs from the previous one
  if [[ "$line" != "$old" ]]
  then
    echo "$line"
    old=$line
  fi
done < file

The last two only remove duplicates that directly follow one another, which fits your example. By contrast:

echo "red apple
green apple
lila banana
green apple
" ...

This prints green apple twice, separated by the banana line.
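
If non-adjacent duplicates should be dropped as well while the original order is kept, a common awk idiom can replace the loop. This is a small sketch rather than part of the original answer; the variable names input and deduped are only for illustration.

```shell
#!/bin/sh
# Input with a duplicate that is NOT adjacent to its first occurrence.
input='red apple
green apple
lila banana
green apple'

# seen[$0]++ evaluates to 0 the first time a line appears, so
# !seen[$0]++ is true only then; the default awk action prints the line.
deduped=$(printf '%s\n' "$input" | awk '!seen[$0]++')
echo "$deduped"
```

Here the second green apple is dropped, leaving red apple, green apple and lila banana in their original order.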

Answer 3

Answered by mhyfritz

uniq -c file

and in case the file is not sorted already:

sort file | uniq -c

Answer 4

Answered by pajton

cat <filename> | sort | uniq -c

Answer 5

Answered by Rahul

Try this:

cat myfile.txt | sort | uniq

Answer 6

Answered by Chris Eberle

To just get a count:

$> egrep -o '\w+' fruits.txt | sort | uniq -c

      3 apple
      2 green
      1 oragen
      2 orange
      1 red

To get a sorted count:

$> egrep -o '\w+' fruits.txt | sort | uniq -c | sort -nk1
      1 oragen
      1 red
      2 green
      2 orange
      3 apple

EDIT

Aha, the question was about whole lines, NOT words, my bad. Here's the command to use for full lines:

$> cat fruits.txt | sort | uniq -c | sort -nk1
      1 oragen
      1 red apple
      2 green apple
      2 orange

Answer 7

Answered by Jaberino

Almost the same as borrible's, but if you add the -d option to uniq it only shows lines that occur more than once.

sort filename | uniq -cd | sort -nr
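
A quick way to see the effect is to feed the sample data from the question through the same pipeline. This is only an illustrative sketch; the variable name report is made up, and it assumes a uniq that accepts -c and -d together, as GNU uniq does.

```shell
#!/bin/sh
# "red apple" occurs only once, so -d leaves it out of the report.
report=$(printf '%s\n' 'red apple' 'green apple' 'green apple' \
                       'orange' 'orange' 'orange' |
         sort | uniq -cd | sort -nr)
echo "$report"
```

The report lists orange with a count of 3 and green apple with a count of 2; the exact padding in front of the counts depends on the uniq implementation.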
