Original URL: http://stackoverflow.com/questions/22849757/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to delete duplicated rows based on a column value?
Asked by user3494949
Given the following table:
123456.451 entered-auto_attendant
123456.451 duration:76 real:76
139651.526 entered-auto_attendant
139651.526 duration:62 real:62
139382.537 entered-auto_attendant
Using a bash shell script on Linux, I'd like to delete the duplicated rows based on the value of column 1 (the one with the long number), bearing in mind that this number is variable rather than fixed.
I've tried:
awk '{a[$3]++}!(a[$3]-1)' file
sort -u | uniq
But I am not getting the result I want, which would be something like the following: compare all the values of the first column, delete all the duplicates, and show the rest:
123456.451 entered-auto_attendant
139651.526 entered-auto_attendant
139382.537 entered-auto_attendant
Accepted answer by Kent
You didn't give an expected output; does this work for you?
awk '!a[$1]++' file
With your data, the output is:
123456.451 entered-auto_attendant
139651.526 entered-auto_attendant
139382.537 entered-auto_attendant
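To try Kent's one-liner without creating a file, here is a minimal bash sketch that feeds it the question's sample rows (with the stray backtick removed). The function name first_per_key and the array name seen are hypothetical; the awk program is the same !a[$1]++ idiom with the array renamed.

```shell
#!/usr/bin/env bash

# first_per_key (hypothetical name): print the first line seen for each
# distinct column-1 value, preserving input order.
# In awk, seen[$1] is unset the first time a key appears, which counts as
# 0 (false). The post-increment returns that old value before bumping it,
# so !seen[$1]++ is true exactly once per key, and a pattern with no
# action prints the whole line ($0).
first_per_key() {
  awk '!seen[$1]++' "$@"
}

# Sample rows from the question.
printf '%s\n' \
  '123456.451 entered-auto_attendant' \
  '123456.451 duration:76 real:76' \
  '139651.526 entered-auto_attendant' \
  '139651.526 duration:62 real:62' \
  '139382.537 entered-auto_attendant' | first_per_key
```

Because "$@" is passed straight through to awk, calling first_per_key file reads the file instead of standard input.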
And this line prints only the lines whose column-1 value is unique:
awk '{a[$1]++;b[$1]=$0}END{for(x in a)if(a[x]==1)print b[x]}' file
Output:
139382.537 entered-auto_attendant
Answer by that other guy
uniq, by default, compares the entire line. Since your lines are not identical, they are not removed.
You can use sort to conveniently sort by the first field and also delete duplicates of it:
sort -t ' ' -k 1,1 -u file
-t ' ': fields are separated by spaces
-k 1,1: only look at the first field
-u: delete duplicates
Additionally, you might have seen the awk '!a[$0]++' trick for deduplicating lines. You can make it deduplicate on the first column only by using awk '!a[$1]++'.
Answer by Yogesh Deore
Try this command:
awk '!($1 in a){a[$1]++; next} $1 in a' file
Output:
123456.451 duration:76 real:76
139651.526 duration:62 real:62
Answer by anubhava
Using awk:
awk '!x[$1]++ { print $1, $2 }' file
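Kent's unique-only command prints from a for (x in a) loop, and awk leaves the iteration order of for-in loops unspecified, so when several keys are unique the rows may come out in a different order from the input. A minimal single-pass sketch that records first-seen order instead, assuming the input fits in memory (the function name unique_keys_only and the array names are hypothetical):

```shell
#!/usr/bin/env bash

# unique_keys_only (hypothetical name): print only the rows whose column-1
# value occurs exactly once, in the order they appeared in the input.
unique_keys_only() {
  awk '
    {
      if (!($1 in count)) order[++n] = $1   # remember the order keys first appear in
      count[$1]++                            # how many rows share this key
      line[$1] = $0                          # keep the row; only printed when count is 1
    }
    END {
      for (i = 1; i <= n; i++)
        if (count[order[i]] == 1) print line[order[i]]
    }
  ' "$@"
}

# Sample rows from the question.
printf '%s\n' \
  '123456.451 entered-auto_attendant' \
  '123456.451 duration:76 real:76' \
  '139651.526 entered-auto_attendant' \
  '139651.526 duration:62 real:62' \
  '139382.537 entered-auto_attendant' | unique_keys_only
```

With the question's data only 139382.537 occurs once, so that is the single row printed. If the input is a regular file rather than a pipe, reading it twice with awk 'NR==FNR{c[$1]++; next} c[$1]==1' file file also keeps input order without holding the rows in memory.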