bash: Counting unique values in a column with a shell script

Question

Asked by Lilly Tooner

I have a tab-delimited file with 5 columns and need to count the number of unique values in column 2. I would normally do this with Perl/Python, but I am forced to use the shell for this one.

In the past I have successfully used the *nix uniq utility piped to wc, but it looks like I am going to have to use awk here.

Any advice would be greatly appreciated. (I previously asked a similar question about column checks using awk, but this one is a little different, and I wanted to keep it separate so that anyone who has this question in the future can find it here.)

Many many thanks!
Lilly

Answer 1

Answered by unwind

No need to use awk.

$ cut -f2 file.txt | sort | uniq | wc -l

should do it.

This relies on the fact that tab is cut's default field separator, so -f2 gives us just the content of column two. The pass through sort then works as a pre-stage to uniq, which removes the duplicates. Finally, wc -l counts the remaining lines, which is the number sought.
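To make the stages concrete, here is a minimal sketch run against a small made-up file. The file name sample.tsv and its contents are assumptions for illustration only, not data from the question.

```shell
# Build a hypothetical 5-column, tab-delimited sample file.
printf 'r1\tapple\tx\ty\tz\n'  >  sample.tsv
printf 'r2\tbanana\tx\ty\tz\n' >> sample.tsv
printf 'r3\tapple\tx\ty\tz\n'  >> sample.tsv
printf 'r4\tcherry\tx\ty\tz\n' >> sample.tsv
printf 'r5\tbanana\tx\ty\tz\n' >> sample.tsv

# cut -f2 : keep only column 2 (tab is the default delimiter)
# sort    : bring equal values next to each other
# uniq    : collapse each run of equal lines into one
# wc -l   : count what is left
# tr -d ' ' strips the padding some wc implementations add.
count=$(cut -f2 sample.tsv | sort | uniq | wc -l | tr -d ' ')

echo "$count"   # prints 3 (apple, banana, cherry)
```

Dropping the final wc -l from the pipeline prints the distinct values themselves, which is a handy way to sanity-check the count.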

Answer 2

Answered by martin clayton

I go for

$ cut -f2 file.txt | sort -u | wc -l

At least in some versions, uniq relies on the input data being sorted (it looks only at adjacent lines).

For example in the Solaris docs:

The uniq utility will read an input file comparing adjacent lines, and write one copy of each input line on the output. The second and succeeding copies of repeated adjacent input lines will not be written.
Repeated lines in the input will not be detected if they are not adjacent.
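The adjacency behaviour described in that excerpt is easy to see on a small input. The following is a sketch with made-up values (the file name col2.txt is hypothetical), comparing uniq alone with the two sorted variants:

```shell
# Hypothetical column values where the repeats are NOT adjacent.
printf 'apple\nbanana\napple\ncherry\nbanana\n' > col2.txt

# uniq alone only collapses adjacent repeats, so nothing is removed here.
unsorted=$(uniq col2.txt | wc -l | tr -d ' ')

# Sorting first makes equal lines adjacent, so uniq can collapse them.
sorted=$(sort col2.txt | uniq | wc -l | tr -d ' ')

# sort -u sorts and de-duplicates in a single step.
sorted_u=$(sort -u col2.txt | wc -l | tr -d ' ')

echo "uniq only: $unsorted, sort | uniq: $sorted, sort -u: $sorted_u"
# prints: uniq only: 5, sort | uniq: 3, sort -u: 3
```

Both sorted forms give the same count; sort -u simply saves one process in the pipeline.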

Answer 3

Answered by Vijay

awk '{if($0~/Not Running/)a++;else if($0~/Running/)b++}END{print a,b}' temp
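This one-liner tallies lines by pattern rather than counting unique values. As a rough sketch of what it does, here it is run on a made-up file named temp, assuming the patterns are matched against the whole line ($0). The second command is not part of this answer; it is a common awk idiom, shown under the assumption of a hypothetical tab-delimited sample.tsv, for the question's actual task of counting distinct values in column 2.

```shell
# Hypothetical status file for the pattern-counting one-liner.
printf 'svc1 Running\nsvc2 Not Running\nsvc3 Running\nsvc4 Stopped\n' > temp

# a counts lines containing "Not Running"; b counts the other "Running" lines.
status=$(awk '{if($0~/Not Running/)a++;else if($0~/Running/)b++}END{print a,b}' temp)
echo "$status"    # prints: 1 2

# Hypothetical 5-column, tab-delimited file for the unique-value count.
printf 'r1\tapple\tx\ty\tz\nr2\tbanana\tx\ty\tz\nr3\tapple\tx\ty\tz\n' > sample.tsv

# seen[$2]++ is 0 (false) the first time a value appears, so n is
# incremented once per distinct value; n + 0 prints 0 for empty input.
distinct=$(awk -F'\t' '!seen[$2]++ { n++ } END { print n + 0 }' sample.tsv)
echo "$distinct"  # prints: 2
```

Unlike the sort-based pipelines, the awk version needs no sorting, at the cost of holding every distinct value in memory.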
