bash: remove duplicates from a variable without sorting

Question

Asked by user224178

I have a variable that contains the following space separated entries.

variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"

How do I remove the duplicates without sorting?

#Something like this.
new_variable="apple lemon papaya avocado grapes mango banana"

I found a script somewhere that removes the duplicates from a variable, but it also sorts the contents.

#Not something like this.
new_variable=$(echo "$variable"|tr " " "\n"|sort|uniq|tr "\n" " ")
echo $new_variable
apple avocado banana grapes lemon mango papaya

Answer 1

Answered by SiegeX

new_variable=$( awk 'BEGIN{RS=ORS=" "}!a[$0]++' <<<$variable );

Here's how it works:

RS (Input Record Separator) is set to a white space so that it treats each fruit in $variable as a record instead of a field. The non-sorting unique magic happens with !a[$0]++. Since awk supports associative arrays, it uses the current record ($0) as the key to the array a[]. If that key has not been seen before, a[$0] evaluates to '0' (awk's default value for unset indices) which is then negated to return TRUE. I then exploit the fact that awk will default to 'print $0' if an expression returns TRUE and no '{ commands }' are given. Finally, a[$0] is then incremented such that this key can no longer return TRUE and thus repeat values are never printed. ORS (Output Record Separator) is set to a space as well to mimic the input format.
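One detail the one-liner glosses over: a here-string appends a newline, so with RS=" " the last record carries that newline, and a duplicate in the final position would not match its earlier occurrence. A minimal sketch that sidesteps this by feeding awk one word per line instead; the function name `dedupe_words` and the array name `seen` are my own and not part of the answer, and it assumes the words are separated by plain spaces:

```shell
#!/bin/sh
# dedupe_words (hypothetical helper): print the space-separated words of $1
# with repeated words dropped, keeping the order of first appearance.
dedupe_words() {
  printf '%s' "$1" |
    tr -s ' ' '\n' |            # one word per line, so the default RS applies
    awk 'NF && !seen[$0]++' |   # same !a[$0]++ idiom; NF skips empty lines
    paste -sd ' ' -             # join the surviving lines with single spaces
}

variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
new_variable=$(dedupe_words "$variable")
echo "$new_variable"
```

Because every record is now a full line, a word that repeats at the very end of the list is compared on equal terms with its earlier occurrences.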

A less terse version of this command which produces the same output would be the following:

awk 'BEGIN{RS=ORS=" "}{ if (a[$0] == 0){ a[$0] += 1; print $0}}' <<<$variable

Gotta love awk =)

EDIT

If you needed to do this in pure Bash 2.1+, I would suggest this:

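#!/bin/bash

variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
temp="$variable"

new_variable="${temp%% *}"

while [[ "$temp" != ${new_variable##* } ]]; do
   temp=${temp//${temp%% *} /}
   new_variable="$new_variable ${temp%% *}"
done

echo $new_variable;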

Answer 2

Answered by Mark Edgar

This pipeline version works by preserving the original order:

variable=$(echo "$variable" | tr ' ' '\n' | nl | sort -u -k2 | sort -n | cut -f2-)
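A pipeline of this kind is a decorate-sort-undecorate pattern: number the words, sort by word to bring duplicates together, drop the repeats, then sort by number to restore the original order. As far as I know, POSIX does not say which line of an equal-key group `sort -u` keeps, so the sketch below is a variant that picks the first occurrence explicitly with awk instead of relying on `-u`. The helper name `dedupe_dsu` and the names `tab` and `prev` are mine, and it assumes words contain no tabs:

```shell
#!/bin/sh
tab=$(printf '\t')

# dedupe_dsu (hypothetical helper): read one word per line on stdin and print
# the first occurrence of each word, in the original order.
dedupe_dsu() {
  awk '{ print NR "\t" $0 }' |                         # decorate with position
    sort -t "$tab" -k2,2 -k1,1n |                      # group equal words, earliest first
    awk -F "$tab" '$2 != prev { print; prev = $2 }' |  # keep the first line of each group
    sort -t "$tab" -k1,1n |                            # back to the original order
    cut -f2-                                           # undecorate: drop the position
}

variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
new_variable=$(printf '%s' "$variable" | tr -s ' ' '\n' | dedupe_dsu | paste -sd ' ' -)
echo "$new_variable"
```

It costs two sorts, so for long lists the single-pass awk approach is likely the cheaper choice.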

Answer 3

Answered by Fritz G. Mehner

Pure Bash:

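variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"

declare new_value=''

for item in $variable; do
  if [[ ! $new_value =~ $item ]] ; then   # first time?
    new_value="$new_value $item"
  fi
done
new_value=${new_value:1}                  # remove leading blank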

Answer 4

Answered by Idelic

In pure, portable sh:

words="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
seen=
for word in $words; do
  case $seen in
    $word\ * | *\ $word | *\ $word\ * | $word)
      # already seen
      ;;
    *)
      seen="$seen $word"
      ;;
  esac
done
echo $seen
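A `case`-based membership test can be written with a single pattern if the list is padded with a space on both sides, which also keeps the match on whole words (a plain substring or regex test would treat `apple` as already seen once `pineapple` is in the list). A minimal sketch under those assumptions; the function name `dedupe_sh` is hypothetical, the words are assumed to contain no glob characters, and `seen` and `word` are global because POSIX sh has no local variables:

```shell
#!/bin/sh
# dedupe_sh (hypothetical helper): echo the words of $1 without repeats,
# in order of first appearance, using only POSIX shell features.
dedupe_sh() {
  seen=""
  for word in $1; do                  # unquoted on purpose: split on whitespace
    case " $seen " in
      *" $word "*) ;;                 # already present as a whole word
      *) seen="${seen:+$seen }$word" ;;
    esac
  done
  printf '%s\n' "$seen"
}

dedupe_sh "apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
dedupe_sh "apple pineapple apple"
```

The `${seen:+$seen }` expansion adds the separating space only when the list is non-empty, so the result has no leading blank to trim afterwards.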

Answer 5

Answered by Dimitre Radoulov

Z Shell:

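% variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
% print ${(zu)variable}
apple lemon papaya avocado grapes mango banana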

Answer 6

Answered by ghostdog74

shell

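declare -a arr
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
set -- $variable
count=0
for c in $@
do
    flag=0
    for((i=0;i<=${#arr[@]}-1;i++))
    do
        if [ "${arr[$i]}" == "$c" ] ;then
            flag=1
            break
        fi
    done
    if  [ "$flag" -eq 0 ] ; then
        arr[$count]="$c"
        count=$((count+1))
    fi
done
for((i=0;i<=${#arr[@]}-1;i++))
do
   echo "result: ${arr[$i]}"
done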

Result when run:

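linux# ./myscript.sh
result: apple
result: lemon
result: papaya
result: avocado
result: grapes
result: mango
result: banana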

Or, if you want to use gawk:

awk 'BEGIN{RS=ORS=" "} (!($0 in a) ){a[$0];print}'

Answer 7

Answered by Jahid

Another awk solution:

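#!/bin/bash
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
variable=$(printf '%s\n' "$variable" | awk -v RS='[[:space:]]+' '!a[$0]++{printf "%s%s", $0, RT}')
variable="${variable%,*}"
echo "$variable"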

Output:

apple lemon papaya avocado grapes mango banana

Answer 8

Answered by Chris Koknat

Perl solution:

perl -le 'for (@ARGV){ $h{$_}++ }; for (keys %h){ print $_ }' $variable

@ARGV is the list of input parameters from $variable
Loop through the list, populating the h hash with the loop variable $_
Loop through the keys of the h hash, and print each one

grapes
avocado
apple
lemon
banana
mango
papaya

This variation prints the output sorted first by frequency ($h{$a} <=> $h{$b}) and then alphabetically ($a cmp $b):

perl -le 'for (@ARGV){ $h{$_}++ }; for (sort { $h{$a} <=> $h{$b} || $a cmp $b } keys %h){ print "$h{$_}\t$_" }' $variable

1       banana
1       grapes
1       mango
2       apple
2       avocado
2       lemon
2       papaya

This variation produces the same output as the last one.
However, instead of a shell variable, it reads an input file 'fruits' with one fruit per line:

perl -lne '$h{$_}++; END{ for (sort { $h{$a} <=> $h{$b} || $a cmp $b } keys %h){ print "$h{$_}\t$_" } }' fruits
