Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/618378/

Time: 2020-09-09 18:00:12  Source: igfitidea

Select unique or distinct values from a list in UNIX shell script

Tags: bash, unique, distinct, ksh, sh

Question by brabster

I have a ksh script that returns a long list of values, newline separated, and I want to see only the unique/distinct values. Is it possible to do this?

For example, say my output is file suffixes in a directory:

tar
gz
java
gz
java
tar
class
class

I want to see a list like:

tar
gz
java
class

Answer by Matthew Scharley

You might want to look at the uniq and sort applications.

./yourscript.ksh | sort | uniq

(FYI, yes, the sort is necessary in this command line: uniq only strips duplicate lines that are immediately after each other.)
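
To see why the sort matters, here is a small sketch comparing uniq on its own with sort | uniq. The helper function sample is hypothetical; it just prints the question's example list, one value per line.

```shell
# Hypothetical helper: prints the question's sample list, one value per line.
sample() { printf '%s\n' tar gz java gz java tar class class; }

# uniq alone only collapses runs of identical adjacent lines,
# so here only the two consecutive "class" lines are merged.
echo "uniq only:"
sample | uniq            # prints tar, gz, java, gz, java, tar, class

# sort groups identical lines together, so uniq can then drop every duplicate.
echo "sort | uniq:"
sample | sort | uniq     # prints class, gz, java, tar
```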

EDIT:

Contrary to what has been posted by Aaron Digulla in relation to uniq's command-line options:

Given the following input:

class
jar
jar
jar
bin
bin
java

uniq will output all lines exactly once:

class
jar
bin
java

uniq -d will output all lines that appear more than once, and it will print them once:

jar
bin

uniq -u will output all lines that appear exactly once, and it will print them once:

class
java
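
A runnable sketch reproducing the three outputs above (the helper name sample_input is made up for illustration). The repeated lines in this input are already adjacent, so uniq works without sort here; on scattered data you would sort first.

```shell
# Hypothetical helper: prints the input used in this answer, one value per line.
sample_input() { printf '%s\n' class jar jar jar bin bin java; }

echo "uniq:"
sample_input | uniq      # class, jar, bin, java
echo "uniq -d:"
sample_input | uniq -d   # jar, bin (lines that were repeated, printed once)
echo "uniq -u:"
sample_input | uniq -u   # class, java (lines that were never repeated)
```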

Answer by gpojd

./script.sh | sort -u

This is the same as monoxide's answer, but a bit more concise.
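
A quick sketch to check the equivalence on the question's sample (sample is a hypothetical helper that prints the list). For plain lines like these, sort -u gives the same result as sort | uniq while running one process instead of two.

```shell
# Hypothetical helper: prints the question's sample list, one value per line.
sample() { printf '%s\n' tar gz java gz java tar class class; }

# One command: sort the lines and keep only the first of each run of equal lines.
sample | sort -u         # prints class, gz, java, tar

# Check that the two pipelines agree on this input.
if [ "$(sample | sort -u)" = "$(sample | sort | uniq)" ]; then
    echo "sort -u and sort | uniq agree"
fi
```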

Answer by paxdiablo

For larger data sets where sorting may not be desirable, you can also use the following perl script:

./yourscript.ksh | perl -ne 'if (!defined $x{$_}) { print $_; $x{$_} = 1; }'

This basically just remembers every line output so that it doesn't output it again.

It has the advantage over the "sort | uniq" solution that no sorting is required up front, and the lines come out in the order in which they first appear.

Answer by Dimitre Radoulov

With zsh you can do this:

% cat infile 
tar
more than one word
gz
java
gz
java
tar
class
class
zsh-5.0.0[t]% print -l "${(fu)$(<infile)}"
tar
more than one word
gz
java
class

Or you can use AWK:

% awk '!_[$0]++' infile
tar
more than one word
gz
java
class
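
The awk idiom relies on the same trick as the perl filter: an associative array counts how often each line has been seen. A sketch with the logic spelled out, assuming any POSIX awk; the array name is arbitrary (seen is used here instead of _), and sample is a hypothetical helper that prints the question's list.

```shell
# Hypothetical helper: prints the question's sample list, one value per line.
sample() { printf '%s\n' tar gz java gz java tar class class; }

# Long-hand version: seen[$0] counts how often the current line has appeared.
# An element that was never set compares equal to 0, so a line is printed
# only on its first appearance; the count is then incremented.
sample | awk '{ if (seen[$0] == 0) print; seen[$0]++ }'

# Compact idiom: seen[$0]++ yields the old count before incrementing,
# so !seen[$0]++ is true only the first time. A pattern with no action
# prints the current line.
sample | awk '!seen[$0]++'   # prints tar, gz, java, class
```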

Answer by Aaron Digulla

Pipe them through sort and uniq. This removes all duplicates.

uniq -d gives only the duplicates (one copy each); uniq -u gives only the lines that are never repeated (every line that has a duplicate is dropped entirely).
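
On the question's sample, where every value appears exactly twice, the difference is easy to see: uniq -d lists every value and uniq -u prints nothing. A sketch (sample is a hypothetical helper that prints the question's list):

```shell
# Hypothetical helper: prints the question's sample list, one value per line.
sample() { printf '%s\n' tar gz java gz java tar class class; }

sample | sort | uniq      # one copy of each value: class, gz, java, tar
sample | sort | uniq -d   # values that occur more than once: class, gz, java, tar
sample | sort | uniq -u   # values that occur exactly once: none, so no output
```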

Answer by Ajak6

With AWK you can do this; I find it faster than sort:

./yourscript.ksh | awk '!a[$0]++'
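
Whether awk is faster than sort depends on the input size and on how many distinct values there are, so it may be worth timing both on your own data. A rough sketch, assuming bash (for the time keyword) and the common seq and mktemp utilities; the generated data set is made up for illustration and no particular timings are implied.

```shell
#!/usr/bin/env bash
# Made-up test data: 200000 lines drawn from 1000 distinct values.
data_file=$(mktemp)
seq 1 200000 | awk '{ print "value" ($1 % 1000) }' > "$data_file"

# awk: one pass, one array entry per distinct line, no sorting.
time awk '!a[$0]++' "$data_file" > /dev/null

# sort -u: sorts every line before dropping duplicates.
time sort -u "$data_file" > /dev/null

# Both keep the same number of distinct lines; only the output order differs.
awk_count=$(awk '!a[$0]++' "$data_file" | wc -l)
sort_count=$(sort -u "$data_file" | wc -l)
echo "awk: $awk_count distinct lines, sort -u: $sort_count distinct lines"

rm -f "$data_file"
```

The awk version keeps lines in first-seen order, while sort -u returns them sorted; which one you want depends on whether sorted output matters.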

Answer by FGrose

Unique, as requested (but not sorted);
uses fewer system resources for fewer than ~70 elements (as tested with time);
written to take input from stdin
(or modify it and include it in another script).
(Bash)

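bag2set () {
    # Reduce a_bag to a_set.
    local -i i j n=${#a_bag[@]}
    for ((i=0; i < n; i++)); do
        if [[ -n ${a_bag[i]} ]]; then
            a_set[i]=${a_bag[i]}
            a_bag[i]=$'\0'
            for ((j=i+1; j < n; j++)); do
                [[ ${a_set[i]} == ${a_bag[j]} ]] && a_bag[j]=$'\0'
            done
        fi
    done
}
declare -a a_bag=() a_set=()
stdin="$(</dev/stdin)"
declare -i i=0
for e in $stdin; do
    a_bag[i]=$e
    i=$i+1
done
bag2set
echo "${a_set[@]}"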
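
For inputs beyond a few dozen elements, comparing elements pairwise does work roughly proportional to the square of the number of elements. A single-pass alternative, which is a different technique from the one in this answer, uses a bash 4+ associative array as a set of already-seen lines. A sketch (the function name uniq_unsorted is hypothetical):

```shell
#!/usr/bin/env bash
# Requires bash 4.0+ for associative arrays.
# Hypothetical function: print each distinct stdin line once, in first-seen order.
uniq_unsorted() {
    local line key
    local -A seen=()
    # "|| [[ -n $line ]]" also processes a last line that lacks a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        key="_$line"   # prefix so that empty lines and "@" / "*" are safe keys
        if [[ -z ${seen[$key]+set} ]]; then
            seen[$key]=1
            printf '%s\n' "$line"
        fi
    done
}

printf '%s\n' tar gz java gz java tar class class | uniq_unsorted
# prints tar, gz, java, class
```

Reading line by line with IFS= read -r keeps values that contain spaces (such as "more than one word") intact.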

Answer by Mary Marty

I got a better tip for getting non-duplicate entries in a file:

awk '$0 != x ":FOO" && NR>1 {print x} {x=$0} END {print}' file_name | uniq -f1 -u