Finding duplicates in a field and printing them in Unix bash
Original question: http://stackoverflow.com/questions/17917505/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not to this page): Stack Overflow
Asked by t28292
I have a file that contains:
apple
apple
banana
orange
apple
orange
I want a script that finds the duplicates (apple and orange) and tells the user: "apple and orange are repeated". I tried
nawk '!x[$1]++' FS="," filename
to find the repeated items. How can I print them out in Unix bash?
Answered by devnull
To print the duplicate lines, you can use:
$ sort filename | uniq -d
apple
orange
If you want to print the count as well, supply the -c option to uniq:
$ sort filename | uniq -dc
3 apple
2 orange
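The question also asks for a message such as "apple and orange are repeated". One possible way to build that sentence from the `uniq -d` output is sketched below. The file name `fruits.txt` and the variable `message` are assumptions made for illustration and are not part of the original answer.

```shell
#!/bin/sh
# Hypothetical sample file reproducing the data from the question.
workdir=$(mktemp -d)
printf '%s\n' apple apple banana orange apple orange > "$workdir/fruits.txt"

# uniq -d prints one copy of each repeated line; awk joins them with " and "
# and appends the closing words only if at least one duplicate was found.
message=$(sort "$workdir/fruits.txt" | uniq -d |
  awk 'NR > 1 { printf " and " }
       { printf "%s", $0 }
       END { if (NR) print " are repeated" }')

echo "$message"
rm -rf "$workdir"
```

With the sample data this prints "apple and orange are repeated". If nothing is repeated, `message` stays empty.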
Answered by Varun
+1 for devnull's answer. However, if the file uses spaces instead of newlines as the delimiter, the following would work:
tr '[:blank:]' '\n' < filename | sort | uniq -d
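If words can be separated by runs of several blanks, plain translation leaves empty lines, which `uniq -d` may then report as a duplicate. A possible variation, assuming a `tr` that supports the standard `-s` (squeeze) option, is sketched below. The sample input and the variable names are made up for the example.

```shell
#!/bin/sh
# Made-up input with repeated spaces and a tab between words.
input=$(printf 'apple  apple banana\torange\napple   orange\n')

# -s squeezes each run of newlines produced by the translation into one,
# so no empty lines reach sort and uniq.
dups=$(printf '%s\n' "$input" | tr -s '[:blank:]' '\n' | sort | uniq -d)

echo "$dups"
```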
Answered by hek2mgl
Update:
The question has been changed significantly. When I originally answered it, the input file looked like this:
apple apple banana orange apple orange
banana orange apple
...
The solution still works for the new input, but it may be more complicated than this special case needs.
The following awk script will do the job:
awk '{i=1;while(i <= NF){a[$(i++)]++}}END{for(i in a){if(a[i]>1){print i,a[i]}}}' your.file
Output:
apple 3
orange 2
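The question's own attempt sets `FS=","`, which suggests the real data may be comma-separated with only one column of interest. Under that assumption, the same counting idea can be restricted to a single field. The sample records, the choice of the first field, and the names `seen` and `counts` are hypothetical.

```shell
#!/bin/sh
# Hypothetical comma-separated records; only the first field is checked.
records=$(printf '%s\n' 'apple,1' 'apple,2' 'banana,3' 'orange,4' 'apple,5' 'orange,6')

# Count each value of field 1, then print the values seen more than once.
# The for-in order of awk arrays is unspecified, so sort makes it stable.
counts=$(printf '%s\n' "$records" |
  awk -F, '{ seen[$1]++ }
           END { for (key in seen) if (seen[key] > 1) print key, seen[key] }' |
  sort)

echo "$counts"
```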
It is more understandable in a form like this:
#!/usr/bin/awk -f
{
    i = 1;
    # iterate through every field
    while (i <= NF) {
        a[$(i++)]++;  # count occurrences of every field
    }
}
# after all input lines have been read ...
END {
    for (i in a) {
        # ... print those fields which occurred more than 1 time
        if (a[i] > 1) {
            print i, a[i];
        }
    }
}
Then make the file executable and execute it passing the input file name to it:
chmod +x script.awk
./script.awk your.file
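An alternative that does not depend on the shebang line or the executable bit is to pass the script to awk with `-f`. The sketch below writes an equivalent script (using a `for` loop instead of `while`) to a temporary directory. The temporary paths, the sample input, and the names `count`, `word` and `result` are assumptions for illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Equivalent counting script, written out with a quoted heredoc so that
# the shell does not expand $i.
cat > "$workdir/script.awk" <<'EOF'
{
    for (i = 1; i <= NF; i++)
        count[$i]++          # count occurrences of every field
}
END {
    for (word in count)
        if (count[word] > 1)
            print word, count[word]
}
EOF

# Sample input in the several-words-per-line form described in the update.
printf '%s\n' 'apple apple banana orange apple orange' 'banana orange apple' \
    > "$workdir/your.file"

# awk -f reads the program from a file; sort fixes the output order.
result=$(awk -f "$workdir/script.awk" "$workdir/your.file" | sort)

echo "$result"
rm -rf "$workdir"
```

Because every word in this sample occurs more than once across the two lines, all three are reported with their totals.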