bash: How to delete duplicated rows based on a column value?

Note: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/22849757/

Date: 2020-09-18 10:06:51  Source: igfitidea

How to delete duplicated rows based on a column value?

linux, bash, awk, delete-row

Asked by user3494949

Given the following table

 123456.451 entered-auto_attendant
 123456.451 duration:76 real:76
 139651.526 entered-auto_attendant
 139651.526 duration:62 real:62
 139382.537 entered-auto_attendant 

Using a bash shell script on Linux, I'd like to delete all the duplicated rows based on the value of column 1 (the one with the long number). Keep in mind that this number is not fixed; it varies.

I've tried with

awk '{a[$3]++}!(a[$3]-1)' file

sort -u | uniq

But I am not getting the result I want, which would be something like the following: compare all the values in the first column, delete the duplicates, and show what remains.

 123456.451 entered-auto_attendant
 139651.526 entered-auto_attendant
 139382.537 entered-auto_attendant 

Accepted answer by Kent

You didn't give an expected output. Does this work for you?

 awk '!a[$1]++' file

With your data, the output is:

123456.451 entered-auto_attendant
139651.526 entered-auto_attendant
139382.537 entered-auto_attendant

And this line prints only the lines whose column 1 value is unique:

 awk '{a[$1]++;b[$1]=$0}END{for(x in a)if(a[x]==1)print b[x]}' file

output:

139382.537 entered-auto_attendant
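
As a rough sketch of the count-then-filter idea behind Kent's second command, the same logic can be spread over several lines with comments. The sample data is copied from the question; the names `data`, `count`, `line`, and `unique_only` are illustrative assumptions, not part of the original answer.

```shell
# Sample data copied from the question.
data='123456.451 entered-auto_attendant
123456.451 duration:76 real:76
139651.526 entered-auto_attendant
139651.526 duration:62 real:62
139382.537 entered-auto_attendant'

unique_only=$(printf '%s\n' "$data" | awk '
    {
        count[$1]++      # how many times this column-1 value has appeared
        line[$1] = $0    # remember the latest full line for that value
    }
    END {
        # for-in iteration order is unspecified in awk, so output order may vary
        for (key in count)
            if (count[key] == 1)
                print line[key]
    }')
printf '%s\n' "$unique_only"
```

Only the line whose first column occurs exactly once is printed. Because everything is printed in the END block, the whole input is read before any output appears.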

Answer by that other guy

uniq, by default, compares the entire line. Since your lines are not identical, they are not removed.
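
To see why the `sort -u | uniq` attempt from the question leaves the data unchanged, here is a small check on the sample data (the variable names are illustrative assumptions). Every line differs somewhere, so a whole-line comparison finds nothing to drop.

```shell
# Sample data copied from the question.
data='123456.451 entered-auto_attendant
123456.451 duration:76 real:76
139651.526 entered-auto_attendant
139651.526 duration:62 real:62
139382.537 entered-auto_attendant'

# sort -u and uniq both compare whole lines here, so all five distinct lines survive.
remaining=$(printf '%s\n' "$data" | sort -u | uniq | awk 'END { print NR }')
printf '%s\n' "$remaining"
```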

You can use sort to conveniently sort by the first field and also delete duplicates of it:

sort -t ' ' -k 1,1 -u file
  • -t ' ': fields are separated by spaces
  • -k 1,1: only look at the first field
  • -u: delete duplicates
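
A minimal sketch of these three sort options applied to the question's sample data follows. The variable names and the use of `LC_ALL=C` (only to make the ordering predictable) are assumptions added for illustration.

```shell
# Sample data copied from the question (no leading spaces).
data='123456.451 entered-auto_attendant
123456.451 duration:76 real:76
139651.526 entered-auto_attendant
139651.526 duration:62 real:62
139382.537 entered-auto_attendant'

# Keep one line per distinct first field.
deduped=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 1,1 -u)
printf '%s\n' "$deduped"

# Extract the surviving keys to confirm each one appears once.
keys=$(printf '%s\n' "$deduped" | cut -d ' ' -f 1)
```

Two caveats: the output comes back ordered by the key rather than in input order, and POSIX does not say which line of each duplicate group is kept, so it is safer not to rely on that. Also, with -t ' ' a line that begins with a space has an empty first field, so leading spaces should be stripped first.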

Additionally, you might have seen the awk '!a[$0]++' trick for deduplicating lines. You can make this dedupe on the first column only by using awk '!a[$1]++'.
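
For readers new to the idiom, here is a short sketch of how the first-column variant behaves on the question's sample data. The array name `seen` and the shell variable names are illustrative assumptions.

```shell
# Sample data copied from the question.
data='123456.451 entered-auto_attendant
123456.451 duration:76 real:76
139651.526 entered-auto_attendant
139651.526 duration:62 real:62
139382.537 entered-auto_attendant'

# seen[$1]++ yields the old count (0 on first sight) and then increments it,
# so !seen[$1]++ is true only the first time a column-1 value appears.
# A true pattern with no action prints the current line.
deduped=$(printf '%s\n' "$data" | awk '!seen[$1]++')
printf '%s\n' "$deduped"
```

Unlike the sort approach, this keeps the first occurrence of each key and preserves the input order.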

Answer by Yogesh Deore

Try this command:

awk '!($1 in a){a[$1]++; next} $1 in a' file
123456.451 duration:76 real:76
139651.526 duration:62 real:62

Answer by anubhava

Using awk:

awk '!x[$1]++ { print $1, $2 }' file