Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/17917505/

Date: 2020-09-18 06:02:48  Source: igfitidea

finding duplicates in a field and printing them in unix bash

Tags: bash, unix, awk

Asked by t28292

I have a file that contains:

apple
apple
banana
orange
apple
orange

I want a script that finds the duplicates, apple and orange, and tells the user: apple and orange are repeated. I tried

nawk '!x[$1]++' FS="," filename

to find the repeated items. How can I print them out in unix bash?

Answered by devnull

To print the duplicate lines, you can use:

$ sort filename | uniq -d
apple
orange

If you want to print the count as well, supply the -c option to uniq:

$ sort filename | uniq -dc
      3 apple
      2 orange
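
The question also asks for a message telling the user which items are repeated. A minimal sketch of one way to build that message from the pipeline above follows. The temporary file, the variable names input and dups, and the wording of the message are assumptions made for illustration, not part of the original answer.

```shell
# Sample data from the question, written to a temporary file (illustrative only).
input=$(mktemp)
printf '%s\n' apple apple banana orange apple orange > "$input"

# sort groups identical lines together; uniq -d keeps one copy of each repeated line;
# paste -sd, joins those lines into a single comma-separated line.
dups=$(sort "$input" | uniq -d | paste -sd, -)

if [ -n "$dups" ]; then
  echo "The following are repeated: $dups"
else
  echo "No duplicates found"
fi

rm -f "$input"
```

With the sample data this prints "The following are repeated: apple,orange".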

Answered by Varun

+1 for devnull's answer. However, if the file uses spaces instead of newlines as the delimiter, the following would work:

tr -s '[:blank:]' '\n' < filename | sort | uniq -d
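
As a worked example, the sketch below combines this approach with the counting option from devnull's answer. The input text, the temporary file, and the variable names are made up for illustration. It also assumes that uniq accepts -d and -c together, as used in the answer above, and the final awk step is only there to tidy the column layout.

```shell
# Illustrative input: words separated by spaces and by newlines.
input=$(mktemp)
printf '%s\n' 'apple apple banana orange apple orange' 'banana orange apple' > "$input"

# tr -s puts every word on its own line and squeezes repeated newlines;
# sort groups identical words; uniq -dc prints each repeated word with its count;
# awk swaps the columns and drops the padding that uniq -c adds.
counts=$(tr -s '[:blank:]' '\n' < "$input" | sort | uniq -dc | awk '{print $2, $1}')
echo "$counts"

rm -f "$input"
```

For this input the result is apple 4, banana 2 and orange 3, one pair per line.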

Answered by hek2mgl

Update:

The question has been changed significantly. When this answer was first written, the input file looked like this:

apple apple banana orange apple orange
banana orange apple
...

The solution still works for the current input, although it is a little more complicated than this use case requires.

The following awk script will do the job:

awk '{i=1;while(i <= NF){a[$(i++)]++}}END{for(i in a){if(a[i]>1){print i,a[i]}}}' your.file

Output (awk does not guarantee the order of a for-in loop, so the lines may appear in a different order):

apple 3
orange 2

The script is easier to read when written out like this:

#!/usr/bin/awk -f

{
  i=1;
  # iterate through every field
  while(i <= NF) {
    a[$(i++)]++; # count occurrences of every field
  }
}

# after all input lines have been read ...
END {
  for(i in a) {
    # ... print those fields which occurred more than 1 time
    if(a[i] > 1) {
      print i,a[i];
    }
  }
}

Then make the file executable and run it, passing the input file name as an argument:

chmod +x script.awk
./script.awk your.file
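
For the one-word-per-line input now shown in the question, a shorter single-pass variant is possible. The negated-counter pattern tried in the question prints only the first occurrence of each value, so it removes duplicates instead of reporting them. The sketch below, which is not part of the original answers, prints a line the second time it is seen. The array name seen, the temporary file, and the variable names are assumptions for illustration.

```shell
# Sample data from the question, written to a temporary file (illustrative only).
input=$(mktemp)
printf '%s\n' apple apple banana orange apple orange > "$input"

# seen[$0]++ returns the number of earlier occurrences of the current line.
# The pattern is true only when that number is 1, i.e. on the second occurrence,
# so each duplicated line is printed once and no sorting is needed.
result=$(awk 'seen[$0]++ == 1' "$input")
echo "$result"

rm -f "$input"
```

The output lists apple and then orange, in the order in which each value is seen for the second time.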