bash: Removing duplicates on a variable without sorting

Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1872692/

Time: 2020-09-17 21:28:00  Source: igfitidea

Removing duplicates on a variable without sorting

bash unix shell sorting variables

Asked by user224178

I have a variable that contains the following space-separated entries.

variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"

How do I remove the duplicates without sorting?

#Something like this.
new_variable="apple lemon papaya avocado grapes mango banana"

I found a script somewhere that removes the duplicates from a variable, but it also sorts the contents.

#Not something like this.
new_variable=$(echo "$variable"|tr " " "\n"|sort|uniq|tr "\n" " ")
echo $new_variable
apple avocado banana grapes lemon mango papaya

Answered by SiegeX

new_variable=$( awk 'BEGIN{RS=ORS=" "}!a[$0]++' <<<$variable );

Here's how it works:

RS (Input Record Separator) is set to a white space so that it treats each fruit in $variable as a record instead of a field. The non-sorting unique magic happens with !a[$0]++. Since awk supports associative arrays, it uses the current record ($0) as the key to the array a[]. If that key has not been seen before, a[$0] evaluates to '0' (awk's default value for unset indices) which is then negated to return TRUE. I then exploit the fact that awk will default to 'print $0' if an expression returns TRUE and no '{ commands }' are given. Finally, a[$0] is then incremented such that this key can no longer return TRUE and thus repeat values are never printed. ORS (Output Record Separator) is set to a space as well to mimic the input format.
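
A minimal sketch of the idiom described above, with one change that is an assumption and not part of the original answer: the words are fed to awk with printf instead of a here-string. A here-string (<<<) appends a newline, so the last record would arrive as "banana" plus a newline and would not compare equal to an earlier "banana".

```shell
#!/bin/bash
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"

# printf '%s ' emits every word followed by a single space, so each record
# awk sees is exactly one word and there is no trailing newline.
# !a[$0]++ is true only the first time a given record is seen.
new_variable=$(printf '%s ' $variable | awk 'BEGIN { RS = ORS = " " } !a[$0]++')

# ORS leaves one trailing space after the last word; strip it.
new_variable=${new_variable% }

echo "$new_variable"
# prints: apple lemon papaya avocado grapes mango banana
```

The unquoted $variable in the printf call is deliberate: it relies on the shell's word splitting to hand printf one argument per word.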

A less terse version of this command which produces the same output would be the following:

awk 'BEGIN{RS=ORS=" "}{ if (a[$0] == 0){ a[$0] += 1; print $0}}' <<<$variable

Gotta love awk =)

EDIT

If you needed to do this in pure Bash 2.1+, I would suggest this:

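#!/bin/bash

variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
temp="$variable"

new_variable="${temp%% *}"

while [[ "$temp" != ${new_variable##* } ]]; do
   temp=${temp//${temp%% *} /}
   new_variable="$new_variable ${temp%% *}"
done

echo $new_variable;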
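
If Bash 4 or later is available, associative arrays offer another pure-Bash route. The following is only a sketch of that alternative, not the answer's method; the names seen and result are chosen here for illustration.

```shell
#!/bin/bash
# Requires Bash 4+ (associative arrays via declare -A).
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"

declare -A seen   # word -> 1 once the word has been encountered
result=()         # unique words in first-seen order

for word in $variable; do
  if [[ -z "${seen[$word]:-}" ]]; then
    seen[$word]=1
    result+=("$word")
  fi
done

# "${result[*]}" joins the elements with the first character of IFS (a space).
new_variable="${result[*]}"
echo "$new_variable"
# prints: apple lemon papaya avocado grapes mango banana
```

Because each word is looked up as a whole key, a word that happens to be a substring of another word is still treated as distinct.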

Answered by Mark Edgar

This pipeline version preserves the original order:

variable=$(echo "$variable" | tr ' ' '\n' | nl | sort -u -k2 | sort -n | cut -f2-)
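
The stages of a numbering pipeline of this kind can be sketched one per line as below. This assumes GNU sort, where -u keeps the first line of each group of equal keys; the final tr stage and the name deduped are additions made here so that the result is space-separated again.

```shell
#!/bin/bash
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"

deduped=$(
  printf '%s\n' $variable |  # one word per line
    nl |                     # prefix each line with its position and a tab
    sort -u -k2 |            # keep one line per word (the lowest position with GNU sort)
    sort -n |                # restore the original order using the position
    cut -f2- |               # drop the position column
    tr '\n' ' '              # join the lines with spaces
)
deduped=${deduped% }         # strip the trailing space left by tr

echo "$deduped"
```

The position column added by nl is what lets the second sort undo the reordering done by the first.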

Answered by Fritz G. Mehner

Pure Bash:

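variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"

declare new_value=''

for item in $variable; do
  if [[ ! $new_value =~ $item ]] ; then   # first time?
    new_value="$new_value $item"
  fi
done
new_value=${new_value:1}                  # remove leading blank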
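
One caveat with a test such as =~ $item: it is an unanchored regular-expression match, so it also succeeds on substrings, and "apple" would be skipped if "pineapple" had already been collected. A sketch of a whole-word comparison, with an input list made up here to show the difference:

```shell
#!/bin/bash
# Hypothetical input chosen so that one word is a substring of another.
variable="pineapple apple lemon apple"

new_value=''
for item in $variable; do
  # Pad both sides with spaces so only complete words can match.
  case " $new_value " in
    *" $item "*) ;;                        # already collected
    *) new_value="$new_value $item" ;;     # first time
  esac
done
new_value=${new_value# }                   # remove leading blank

echo "$new_value"
# prints: pineapple apple lemon
```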

Answered by Idelic

In pure, portable sh:

words="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
seen=
for word in $words; do
  case $seen in
    $word\ * | *\ $word | *\ $word\ * | $word)
      # already seen
      ;;
    *)
      seen="$seen $word"
      ;;
  esac
done
echo $seen

Answered by Dimitre Radoulov

Z Shell:

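% variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
% print ${(zu)variable}
apple lemon papaya avocado grapes mango banana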

Answered by ghostdog74

shell

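declare -a arr
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
set -- $variable
count=0
for c in $@
do
    flag=0
    for((i=0;i<=${#arr[@]}-1;i++))
    do
        if [ "${arr[$i]}" == "$c" ] ;then
            flag=1
            break
        fi
    done
    if  [ "$flag" -eq 0 ] ; then
        arr[$count]="$c"
        count=$((count+1))
    fi
done
for((i=0;i<=${#arr[@]}-1;i++))
do
   echo "result: ${arr[$i]}"
done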

Result when run:

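linux# ./myscript.sh
result: apple
result: lemon
result: papaya
result: avocado
result: grapes
result: mango
result: banana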

Or, if you want to use gawk:

awk 'BEGIN{RS=ORS=" "} (!($0 in a) ){a[$0];print}'

Answered by Jahid

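Another awk solution:

#!/bin/bash
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
variable=$(printf '%s\n' "$variable" | awk -v RS='[[:space:]]+' '!a[$0]++{printf "%s%s", $0, RT}')
variable="${variable%,*}"
echo "$variable"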

Output:

apple lemon papaya avocado grapes mango banana

Answered by Chris Koknat

Perl solution:

perl -le 'for (@ARGV){ $h{$_}++ }; for (keys %h){ print $_ }' $variable

@ARGV is the list of input parameters from $variable
Loop through the list, populating the h hash with the loop variable $_
Loop through the keys of the h hash, and print each one

grapes
avocado
apple
lemon
banana
mango
papaya

This variation prints the output sorted first by frequency ($h{$a} <=> $h{$b}) and then alphabetically ($a cmp $b):

perl -le 'for (@ARGV){ $h{$_}++ }; for (sort { $h{$a} <=> $h{$b} || $a cmp $b } keys %h){ print "$h{$_}\t$_" }' $variable

1       banana
1       grapes
1       mango
2       apple
2       avocado
2       lemon
2       papaya

This variation produces the same output as the last one. However, instead of an input shell variable, it uses an input file 'fruits', with one fruit per line:

perl -lne '$h{$_}++; END{ for (sort { $h{$a} <=> $h{$b} || $a cmp $b } keys %h){ print "$h{$_}\t$_" } }' fruits
