Declaration: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must likewise follow the CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/13583365/
Bash- is it possible to use -uniq for only one column of a line?
Asked by teutara
    1.gui  Qxx  16
    2.gu   Qxy  23
    3.guT  QWS  18
    4.gui  Qxr  21
I want to sort a file by the value in the 3rd column, so I use:
sort -rnk3 myfile
2.gu   Qxy  23
4.gui  Qxr  21
3.guT  QWS  18
1.gui  Qxx  16
Now I want the output to be: (the line starting with 1.gui is dropped because the line with 4.gui has a greater value)

2.gu   Qxy  23
4.gui  Qxr  21
3.guT  QWS  18
I can not use head because I have millions of rows and I do not know where to cut. I could not figure out a way to use uniq because it treats a line as a whole; since I can not tell uniq to look only at the first column, it counts each line as unique and outputs it, which is normal. I know uniq can ignore a number of characters, but as you can see from the example, the first column might have a varying character count.
Please advise.
Answered by Guru
Try this:
sort -rnk3 myfile | awk -F"[. ]" '!a[$2]++'
awk removes the duplicates based on the 2nd field. This is actually a well-known awk idiom for removing duplicates. An array is maintained, keyed by the 2nd field. Before each record is printed, its 2nd field is looked up in the array: if it is not present yet, the record is printed; otherwise it is discarded as a duplicate. This is achieved using the ++. The first time a key is encountered, the postfix ++ still evaluates to 0 (which, negated, is true), so the record prints; subsequent occurrences yield a positive count, which negated becomes false.
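Putting the idiom together with the sample data from the question (a minimal sketch; the /tmp path is an assumption for illustration):

```shell
# Recreate the sample file from the question (illustrative temp path).
cat > /tmp/myfile <<'EOF'
1.gui  Qxx  16
2.gu   Qxy  23
3.guT  QWS  18
4.gui  Qxr  21
EOF

# Sort by column 3, reverse numeric, then keep only the first line seen
# for each value of the 2nd '[. ]'-separated field (gu, gui, guT, ...).
sort -rnk3 /tmp/myfile | awk -F'[. ]' '!a[$2]++'
# → 2.gu   Qxy  23
#   4.gui  Qxr  21
#   3.guT  QWS  18
```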
Answered by Chris Seymour
Here you go:
sort -rnk3 file | awk -F'[. ]' '{ if (a[$2]++ == 0) print }'
2.gu   Qxy  23
4.gui  Qxr  21
3.guT  QWS  18
This uses awk to check for duplicate values in the second field, where the field separator is either a whitespace or a period. So this is what it treats the second field as:
$ awk -F'[. ]' '{ print $2 }' file
gu
gui
guT
gui
In awk the variable $0 represents the whole line, $1 represents the first field, and so on.
In awk -F'[. ]' '{ if (a[$2]++ == 0) print }', the -F option lets you specify the field separator; in this case it is either a whitespace or a period.
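The postfix semantics of `a[$2]++` can be seen directly in a one-liner (a minimal sketch; `a` is just an arbitrary array name, and the input keys are invented for illustration):

```shell
# For each key, the counter starts at 0; postfix ++ returns the old
# value, so the first occurrence of a key prints 0, the second prints 1.
printf 'gui\ngu\ngui\n' | awk '{ print $1, a[$1]++ }'
# → gui 0
#   gu 0
#   gui 1
```

This is why both `!a[$2]++` and `a[$2]++ == 0` select exactly the first line for each key.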
Answered by Ziferius
So I found this via the all-powerful and amazing Google. My little script builds off @sudo_O's answer, in that it shows you all the duplicate lines found, not a file without duplicates.
The text in which I was finding all duplicates in the 3rd column (port) was in a file called master.txt:
awk '{if (a[$3]++ > 0) print}' master.txt | while read site thread port
do
  grep $port master.txt
done
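The same result (every line whose 3rd-column value occurs more than once) can be produced in a single awk invocation by reading the file twice with the NR==FNR idiom, avoiding one grep process per duplicate. This is a sketch under assumed sample data, not the original master.txt:

```shell
# Build a small stand-in for master.txt (invented sample data).
cat > /tmp/master.txt <<'EOF'
siteA  t1  8080
siteB  t2  9090
siteC  t3  8080
EOF

# Pass 1 (NR==FNR): count each port. Pass 2: print lines whose
# port appeared more than once, in original file order.
awk 'NR==FNR { count[$3]++; next } count[$3] > 1' /tmp/master.txt /tmp/master.txt
# → siteA  t1  8080
#   siteC  t3  8080
```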

