Note: this page reproduces a Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors. Original question: http://stackoverflow.com/questions/2312762/

Compare/Difference of two arrays in bash

Tags: arrays, bash, diff, compare

Asked by Kiran

Is it possible to take the difference of two arrays in bash?
It would be really great if you could suggest a way to do it.

Code:

Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" ) 

Array3 =diff(Array1, Array2)

Array3 ideally should be :
Array3=( "key7" "key8" "key9" "key10" )

Appreciate your help.

Accepted answer by ephemient

If you strictly want Array1 - Array2, then

Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )

Array3=()
for i in "${Array1[@]}"; do
    skip=
    for j in "${Array2[@]}"; do
        [[ $i == "$j" ]] && { skip=1; break; }   # quote $j so it is compared literally, not as a glob pattern
    done
    [[ -n $skip ]] || Array3+=("$i")
done
declare -p Array3

Runtime might be improved with associative arrays, but I personally wouldn't bother. If you're manipulating enough data for that to matter, shell is the wrong tool.
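
For reference, the associative-array variant mentioned above could look roughly like the sketch below. It is only an illustration, not part of the original answer: it assumes bash 4 or later, assumes no element is an empty string (an empty key is not a valid associative-array subscript), and the name `seen` is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: Array1 - Array2 using an associative array as a lookup set (bash 4+).
Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )

declare -A seen=()            # hypothetical name: the set of Array2's elements
for j in "${Array2[@]}"; do
    seen["$j"]=1
done

Array3=()
for i in "${Array1[@]}"; do
    # Keep $i only if it was never recorded as a member of Array2.
    [[ -n ${seen["$i"]:-} ]] || Array3+=("$i")
done
declare -p Array3
```

This replaces the inner loop with a single lookup per element and keeps the order of Array1.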

For a symmetric difference like Dennis's answer, existing tools like comm work, as long as we massage the input and output a bit (since they work on line-based files, not shell variables).

Here, we tell the shell to use newlines to join the array into a single string, and discard tabs when reading lines from comm back into an array.

$ oldIFS=$IFS IFS=$'\n\t'
$ Array3=($(comm -3 <(echo "${Array1[*]}") <(echo "${Array2[*]}")))
comm: file 1 is not in sorted order
$ IFS=$oldIFS
$ declare -p Array3
declare -a Array3='([0]="key7" [1]="key8" [2]="key9" [3]="key10")'

It complains because, by lexicographical sorting, key1 < … < key9 > key10. But since both input arrays are sorted similarly, it's fine to ignore that warning. You can use --nocheck-order to get rid of the warning, or add a | sort -u inside the <(…) process substitution if you can't guarantee the order and uniqueness of the input arrays.
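
The `sort -u` variant described above might be sketched as follows. This is an illustration under assumptions rather than the answer's own code: it needs bash 4 for `mapfile`, and it assumes no element contains a tab or a newline, because `tr -d '\t'` strips every tab from comm's output.

```shell
#!/usr/bin/env bash
# Sketch: symmetric difference with comm, sorting inside the process substitutions.
Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )

# comm -3 suppresses the lines common to both inputs. Lines unique to the
# second input are indented with a tab, which tr removes.
mapfile -t Array3 < <(
    comm -3 <(printf '%s\n' "${Array1[@]}" | sort -u) \
            <(printf '%s\n' "${Array2[@]}" | sort -u) | tr -d '\t'
)
declare -p Array3
```

Because both inputs are sorted first, comm no longer warns about ordering, and the result comes back in sorted order rather than in the order of the source arrays.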

Answer by Ilya Bystrov

echo ${Array1[@]} ${Array2[@]} | tr ' ' '\n' | sort | uniq -u

Output

key10
key7
key8
key9

You can add further sorting if you need a different order.
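
If elements may contain spaces, a variant of the same pipeline that prints one element per line avoids the `tr ' ' '\n'` step. The sketch below is an assumption-laden illustration, not the answer's code: the sample arrays are made up, `mapfile` needs bash 4, and elements must not contain newlines.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; "big key" contains a space on purpose.
Array1=( "key1" "key2" "big key" )
Array2=( "key1" "key2" )

# printf emits one element per line, so spaces inside an element survive.
# uniq -u keeps only the lines that occur exactly once in the combined list.
mapfile -t Array3 < <(printf '%s\n' "${Array1[@]}" "${Array2[@]}" | sort | uniq -u)
declare -p Array3
```

Like the original one-liner, this yields a symmetric difference, and an element repeated within a single array is dropped because it no longer occurs exactly once.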

Answer by SiegeX

Anytime a question pops up dealing with unique values that may not be sorted, my mind immediately goes to awk. Here is my take on it.

Code

#!/bin/bash

diff(){
  awk 'BEGIN{RS=ORS=" "}
       {NR==FNR?a[$0]++:a[$0]--}
       END{for(k in a)if(a[k])print k}' <(echo -n "${!1}") <(echo -n "${!2}")
}

Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )

Array3=($(diff Array1[@] Array2[@]))
echo ${Array3[@]}

Output

$ ./diffArray.sh
key10 key7 key8 key9

Note: Like the other answers given, if there are duplicate keys in an array they will only be reported once; this may or may not be the behavior you are looking for. The awk code to handle that case is messier.

Answer by Alex Offshore

With ARR1 and ARR2 as the input arrays, use comm to do the job and mapfile to put the output back into the RESULT array:

ARR1=("key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10")
ARR2=("key1" "key2" "key3" "key4" "key5" "key6")

mapfile -t RESULT < \
    <(comm -23 \
        <(IFS=$'\n'; echo "${ARR1[*]}" | sort) \
        <(IFS=$'\n'; echo "${ARR2[*]}" | sort) \
    )

echo "${RESULT[@]}" # outputs "key10 key7 key8 key9"

Note that result may not meet source order.

Bonus aka "that's what you are here for":

function array_diff {
    eval local ARR1=\(\"\${$2[@]}\"\)
    eval local ARR2=\(\"\${$3[@]}\"\)
    local IFS=$'\n'
    mapfile -t $1 < <(comm -23 <(echo "${ARR1[*]}" | sort) <(echo "${ARR2[*]}" | sort))
}

# usage:
array_diff RESULT ARR1 ARR2
echo "${RESULT[@]}" # outputs "key10 key7 key8 key9"

Using those tricky evals is the least bad of the available options for passing array parameters in bash.

Also, take a look at the comm man page; based on this code it's very easy to implement, for example, array_intersect: just use -12 as the comm options.
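
As an illustration of the array_intersect idea, here is one possible sketch. It is not the answer's code: it swaps the evals for namerefs (`local -n`), which assumes bash 4.3 or later, and the internal names `_out`, `_a` and `_b` are hypothetical, so the caller's arrays must not use those names.

```shell
#!/usr/bin/env bash
# Sketch: array_intersect built on comm -12, using namerefs instead of eval.
array_intersect() {
    local -n _out=$1 _a=$2 _b=$3   # namerefs to the caller's arrays (bash 4.3+)
    # comm -12 suppresses columns 1 and 2, leaving only lines common to both inputs.
    mapfile -t _out < <(
        comm -12 <(printf '%s\n' "${_a[@]}" | sort) \
                 <(printf '%s\n' "${_b[@]}" | sort)
    )
}

ARR1=("key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10")
ARR2=("key1" "key2" "key3" "key4" "key5" "key6")
RESULT=()

# usage:
array_intersect RESULT ARR1 ARR2
echo "${RESULT[@]}"
```

As with the comm-based difference, the result follows sort order rather than source order, and elements are assumed not to contain newlines.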

Answer by Paused until further notice.

In Bash 4:

declare -A temp    # associative array
for element in "${Array1[@]}" "${Array2[@]}"
do
    ((temp[$element]++))
done
for element in "${!temp[@]}"
do
    if (( ${temp[$element]} > 1 ))
    then
        unset "temp[$element]"
    fi
done
Array3=("${!temp[@]}")    # retrieve the keys as values

Edit:

ephemient pointed out a potentially serious bug. If an element exists in one array with one or more duplicates and doesn't exist at all in the other array, it will be incorrectly removed from the list of unique values. The version below attempts to handle that situation.

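declare -A temp1 temp2    # associative arrays
for element in "${Array1[@]}"
do
    ((temp1[$element]++))
done

for element in "${Array2[@]}"
do
    ((temp2[$element]++))
done

for element in "${!temp1[@]}"
do
    if (( ${temp1[$element]} >= 1 && ${temp2[$element]-0} >= 1 ))
    then
        unset "temp1[$element]" "temp2[$element]"
    fi
done
Array3=("${!temp1[@]}" "${!temp2[@]}")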

Answer by Denis Gois

It is possible to use regex too (based on another answer: Array intersection in bash):

list1=( 1 2 3 4   6 7 8 9 10 11 12 )
list2=( 1 2 3   5 6   8 9    11 )

l2=" ${list2[*]} "                    # add framing blanks
for item in "${list1[@]}"; do
  if ! [[ $l2 =~ " $item " ]] ; then    # quoted, so $item is matched literally
    result+=("$item")
  fi
done
echo "${result[@]}"

Result:

$ bash diff-arrays.sh
4 7 10 12

Answer by ghostdog74

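Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )
Array3=( "key1" "key2" "key3" "key4" "key5" "key6" "key11" )
a1=${Array1[@]};a2=${Array2[@]}; a3=${Array3[@]}
diff(){
    local a1="$1"    # first list, as one space-separated string
    local a2="$2"    # second list, as one space-separated string
    awk -va1="$a1" -va2="$a2" '
     BEGIN{
       m= split(a1, A1," ")
       n= split(a2, t," ")
       for(i=1;i<=n;i++) { A2[t[i]] }
       for (i=1;i<=m;i++){
            if( ! (A1[i] in A2)  ){
                printf "%s ", A1[i]
            }
        }
    }'
}
Array4=( $(diff "$a1" "$a2") )  #compare a1 against a2
echo "Array4: ${Array4[@]}"
Array4=( $(diff "$a3" "$a1") )  #compare a3 against a1
echo "Array4: ${Array4[@]}"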

output

$ ./shell.sh
Array4: key7 key8 key9 key10
Array4: key11