Linux command or script counting duplicated lines in a text file?
Original URL: http://stackoverflow.com/questions/6447473/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): StackOverflow
Asked by timeon
If I have a text file with the following content:
red apple
green apple
green apple
orange
orange
orange
Is there a Linux command or script that I can use to get the following result?
1 red apple
2 green apple
3 orange
Accepted answer by borrible
Send it through sort (to put identical lines next to each other), then through uniq -c to give counts, i.e.:
sort filename | uniq -c
and to get that list sorted by frequency (most frequent first), you can use:
sort filename | uniq -c | sort -nr
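A self-contained sketch of this pipeline run on the question's sample. The function name `count_lines` and the final `sed` step, which only strips the padding that `uniq -c` puts in front of each count, are illustrative additions rather than part of the answer:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count duplicated lines, most frequent first.
# With file arguments it reads those files; with none it reads standard input.
count_lines() {
    # sort:     brings identical lines next to each other
    # uniq -c:  collapses each run and prefixes it with its length
    # sort -nr: orders numerically by that count, largest first
    sort "$@" | uniq -c | sort -nr
}

# The question's sample, fed through standard input
printf '%s\n' 'red apple' 'green apple' 'green apple' \
              'orange' 'orange' 'orange' |
    count_lines |
    sed 's/^ *//'   # drop the right-alignment padding added by uniq -c
```

This prints `3 orange`, `2 green apple` and `1 red apple`, one per line; `count_lines filename` does the same for a file.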
Answer by user unknown
Can you live with an alphabetically ordered list?
echo "red apple
green apple
green apple
orange
orange
orange" | sort -u
green apple
orange
red apple
or
sort -u FILE
-u stands for unique; sort detects the duplicates while sorting, which is why the result always comes out in sorted order.
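A quick check, using the question's sample, that `sort -u` prints the same lines as `sort | uniq` and that neither keeps a count. The `sample` function is illustrative, and `LC_ALL=C` is an assumption added only to make the order byte-wise and reproducible across locales:

```shell
#!/usr/bin/env bash
# The question's sample, one entry per line
sample() {
    printf '%s\n' 'red apple' 'green apple' 'green apple' \
                  'orange' 'orange' 'orange'
}

# Each distinct line once, in sorted order, with no counts
with_u=$(sample | LC_ALL=C sort -u)
with_uniq=$(sample | LC_ALL=C sort | uniq)

printf '%s\n' "$with_u"
[ "$with_u" = "$with_uniq" ] && echo 'sort -u and sort | uniq agree'
```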
A solution which preserves the order:
echo "red apple
green apple
green apple
orange
orange
orange" | { old=""; while IFS= read -r line; do if [[ $line != "$old" ]]; then printf '%s\n' "$line"; old=$line; fi; done; }
red apple
green apple
orange
And with a file:
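{
    old=""
    # IFS= and -r keep leading blanks and backslashes intact
    while IFS= read -r line
    do
        # quoted RHS: compare literally, not as a glob pattern
        if [[ $line != "$old" ]]
        then
            printf '%s\n' "$line"
            old=$line
        fi
    done
} < file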
The last two only remove duplicates that immediately follow each other, which fits your example.
echo "red apple
green apple
lila banana
green apple
" ...
This prints green apple twice, separated by the lila banana line.
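The loops above only collapse adjacent repeats and do not count anything. If the goal is the question's exact output (counts, in the order lines first appear) even when duplicates are not adjacent, one alternative, not taken from this answer, is an `awk` associative array. A sketch; the function name `count_in_order` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count every distinct line, adjacent or not, and print
# "count line" in the order each line first appears.
# Reads the files given as arguments, or standard input.
count_in_order() {
    awk '
        # first occurrence: remember where this line belongs in the output
        !($0 in count) { order[++n] = $0 }
        # every occurrence: bump the tally
        { count[$0]++ }
        END { for (i = 1; i <= n; i++) print count[order[i]], order[i] }
    ' "$@"
}

# The banana example: green apple occurs twice, but not consecutively
printf '%s\n' 'red apple' 'green apple' 'lila banana' 'green apple' |
    count_in_order
```

This prints `1 red apple`, `2 green apple`, `1 lila banana`; on the question's sample it prints exactly the requested `1 red apple`, `2 green apple`, `3 orange`. Unlike `sort | uniq -c`, it keeps every distinct line in memory, which is the price of not sorting.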
Answer by mhyfritz
uniq -c file
and in case the file is not sorted already:
sort file | uniq -c
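Why the sort is needed: `uniq` only compares each line with the one just before it, so on unsorted input `uniq -c` counts runs of adjacent lines rather than totals. A small comparison, with sample lines chosen for illustration:

```shell
#!/usr/bin/env bash
# "green apple" appears twice, but not on consecutive lines
sample() {
    printf '%s\n' 'green apple' 'orange' 'green apple'
}

echo 'uniq -c without sort (counts runs):'
sample | uniq -c        # green apple is listed twice, each with count 1

echo 'sort | uniq -c (counts all occurrences):'
sample | sort | uniq -c # green apple is listed once, with count 2
```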
Answer by pajton
cat <filename> | sort | uniq -c
Answer by Rahul
Try this
cat myfile.txt | sort | uniq
Answer by Chris Eberle
To just get a count:
$> egrep -o '\w+' fruits.txt | sort | uniq -c
3 apple
2 green
1 oragen
2 orange
1 red
To get a sorted count:
$> egrep -o '\w+' fruits.txt | sort | uniq -c | sort -nk1
1 oragen
1 red
2 green
2 orange
3 apple
EDIT
Aha, the question is about whole lines, not words split on word boundaries; my bad. Here's the command to use for full lines:
$> cat fruits.txt | sort | uniq -c | sort -nk1
1 oragen
1 red apple
2 green apple
2 orange
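`sort -nk1` lists the least frequent lines first and leaves equal counts, such as `green apple` and `orange` above, to sort's fallback whole-line comparison. A sketch that puts the most frequent lines first and breaks ties alphabetically with an explicit second key; the function name `count_sorted` and the `LC_ALL=C` setting (for a reproducible byte order) are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: most frequent lines first, ties ordered by line text.
count_sorted() {
    # -k1,1nr: first key is the count field only, numeric, descending
    # -k2:     second key is the rest of the line, ascending
    LC_ALL=C sort "$@" | uniq -c | LC_ALL=C sort -k1,1nr -k2
}

printf '%s\n' orange 'green apple' orange 'red apple' 'green apple' |
    count_sorted
```

The sample prints `2 green apple`, `2 orange`, then `1 red apple`, each count right-aligned by `uniq -c`.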
Answer by Jaberino
Almost the same as borrible's answer, but if you add the -d option to uniq, it only shows duplicated lines:
sort filename | uniq -cd | sort -nr
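Applied to the question's sample, `-d` drops `red apple` because it occurs only once. POSIX `uniq -u` is the complement (only lines that occur exactly once) and is shown alongside for comparison; the `sample` function is illustrative:

```shell
#!/usr/bin/env bash
# The question's sample, one entry per line
sample() {
    printf '%s\n' 'red apple' 'green apple' 'green apple' \
                  'orange' 'orange' 'orange'
}

# -d: keep one copy of each line that occurs more than once; with -c the
# count is kept too. Prints 3 orange, then 2 green apple (right-aligned counts).
sample | sort | uniq -cd | sort -nr

# -u: the opposite, only lines that occur exactly once (here: red apple)
sample | sort | uniq -u
```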