Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/7870230/

Date: 2020-09-18 00:59:18  Source: igfitidea

Array intersection in bash

bash

Asked by dabest1

How do you compare two arrays in bash to find all intersecting values?

Let's say:
array1 contains values 1 and 2
array2 contains values 2 and 3

I should get back 2 as a result.

My own answer, which I can't post yet due to low reputation:

for item1 in $array1; do
    for item2 in $array2; do
        if [[ $item1 = $item2 ]]; then
            result=$result" "$item1
        fi
    done
done

I'm looking for alternate solutions as well.

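For reference, here is a runnable sketch of the loop above. It assumes, as a later answer points out, that array1 and array2 are whitespace-delimited strings rather than Bash arrays, so the unquoted `$array1` is word-split into items. The values come from the question. `$item2` is quoted so that it is compared literally.

```shell
#!/usr/bin/env bash
# Assumption: array1/array2 are plain whitespace-delimited strings.
array1="1 2"
array2="2 3"

result=""
for item1 in $array1; do            # unquoted on purpose: split on whitespace
    for item2 in $array2; do
        if [[ $item1 == "$item2" ]]; then
            result=$result" "$item1 # appends with a leading space
        fi
    done
done
echo "[$result]"                    # prints "[ 2]"
```

Note the leading space in `result`, which comes from the first concatenation.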

Answer by Fritz G. Mehner

Each element of list1 is searched for in list2, which is expressed as a single space-separated string (${list2[*]}):

list1=( 1 2 3 4   6 7 8 9 10 11 12)
list2=( 1 2 3   5 6   8 9    11 )

l2=" ${list2[*]} "                    # add framing blanks
result=()
for item in "${list1[@]}"; do
  if [[ $l2 =~ " $item " ]] ; then    # quoted, so " $item " is matched literally
    result+=("$item")
  fi
done
echo "${result[@]}"

The result is

1 2 3 6 8 9 11
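
One caveat is worth illustrating. `${list2[*]}` joins the elements with spaces, so an element of list1 that itself contains a space can match across two elements of list2. The following sketch uses hypothetical values to show this:

```shell
#!/usr/bin/env bash
# Hypothetical data: "a b" is NOT an element of list2.
list1=( "a b" "c" )
list2=( "a" "b" )

l2=" ${list2[*]} "                  # becomes " a b ": element boundaries are lost
result=()
for item in "${list1[@]}"; do
  if [[ $l2 =~ " $item " ]]; then   # quoted: literal substring match
    result+=("$item")
  fi
done
printf '%s\n' "${result[@]}"        # prints "a b", a false positive
```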

Answer by nhed

Taking @Raihan's answer and making it work with non-files (though file descriptors are created by the process substitutions). I know it's a bit of a cheat, but it seemed like a good alternative.

A side effect is that the output array will be lexicographically sorted; I hope that's okay. (I also don't know what type of data you have, so I only tested with numbers. Additional work may be needed if you have strings with spaces or other special characters.)

result=($(comm -12 <(for X in "${array1[@]}"; do echo "${X}"; done|sort)  <(for X in "${array2[@]}"; do echo "${X}"; done|sort)))

Testing:

$ array1=(1 17 33 99 109)
$ array2=(1 2 17 31 98 109)

$ result=($(comm -12 <(for X in "${array1[@]}"; do echo "${X}"; done|sort)  <(for X in "${array2[@]}"; do echo "${X}"; done|sort)))

$ echo ${result[@]}
1 109 17

P.S. I'm sure there is a way to output the array one value per line without the for loop; I just forget it (IFS?).
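
The usual answer to the P.S. is `printf`, which reuses its format string for every remaining argument. As a result, `printf '%s\n' "${array[@]}"` prints one element per line. Below is a sketch of the same `comm` pipeline using it. It also adds two things not in the answer. First, it sets `LC_ALL=C` (an assumption) so that `sort` and `comm` use the same byte ordering. Second, it reads the result with `mapfile` (Bash 4+) so that elements are not word-split or glob-expanded.

```shell
#!/usr/bin/env bash
array1=(1 17 33 99 109)
array2=(1 2 17 31 98 109)

# Assumption: C locale so sort and comm agree on ordering.
export LC_ALL=C
# printf '%s\n' emits one element per line; mapfile -t stores one line per element.
mapfile -t result < <(comm -12 <(printf '%s\n' "${array1[@]}" | sort) \
                               <(printf '%s\n' "${array2[@]}" | sort))
echo "${result[@]}"   # prints "1 109 17"
```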

Answer by Raihan

If you were looking for intersecting lines in two files (instead of arrays), you could use the comm command (both files must already be sorted):

$ comm -12 file1 file2
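
Here is a minimal sketch of that command on two small throwaway files. The file names and contents are hypothetical. `-1` suppresses lines found only in file1 and `-2` suppresses lines found only in file2, leaving only the lines common to both.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)                       # throwaway directory for the demo
printf '%s\n' 1 2 > "$dir/file1"       # already sorted
printf '%s\n' 2 3 > "$dir/file2"       # already sorted

common=$(comm -12 "$dir/file1" "$dir/file2")
echo "$common"                         # prints "2"
rm -r "$dir"
```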

Answer by ruakh

Your answer won't work, for two reasons:

  • $array1 just expands to the first element of array1. (This is documented behavior: referencing an array variable without a subscript is equivalent to referencing it with subscript 0.)
  • After the first element gets added to result, result will then contain a space, so the next run of result=$result" "$item1 will misbehave horribly. (Instead of appending to result, it will run the command consisting of the first two items, with the environment variable result set to the empty string.) Correction: it turns out I was wrong about this one. Word splitting doesn't take place in assignments, so that line works as written. (See the comments on the original answer.)
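
Both points can be checked directly. The following small sketch uses hypothetical values:

```shell
#!/usr/bin/env bash
array1=(1 2)
# An array name without a subscript refers to element 0.
first=$array1
echo "$first"          # prints "1"

# No word splitting on the right-hand side of an assignment,
# so appending keeps working after result contains spaces.
result=""
for item in a b c; do
    result=$result" "$item
done
echo "[$result]"       # prints "[ a b c]"
```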

What you want is this:

result=()
for item1 in "${array1[@]}"; do
    for item2 in "${array2[@]}"; do
        # quote $item2 so it is compared literally, not as a glob pattern
        if [[ $item1 == "$item2" ]]; then
            result+=("$item1")
        fi
    done
done
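
This sketch runs that loop on the values from the question. It also gives a quick check of why the right-hand side of `=` inside `[[ ]]` should be quoted: unquoted, it is treated as a glob pattern, so an element such as `*` would match anything.

```shell
#!/usr/bin/env bash
array1=(1 2)
array2=(2 3)

result=()
for item1 in "${array1[@]}"; do
    for item2 in "${array2[@]}"; do
        if [[ $item1 == "$item2" ]]; then
            result+=("$item1")
        fi
    done
done
echo "${result[@]}"                    # prints "2"

# Unquoted right-hand side = glob pattern; quoted = literal string.
a="abc"; b="*"
[[ $a = $b ]]   && unquoted=match || unquoted=nomatch
[[ $a = "$b" ]] && quoted=match   || quoted=nomatch
echo "unquoted: $unquoted, quoted: $quoted"   # prints "unquoted: match, quoted: nomatch"
```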

Answer by ruakh

Now that I understand what you mean by "array", I think, first of all, that you should consider using actual Bash arrays. They're much more flexible: for example, array elements can contain whitespace, and you avoid the risk that * and ? will trigger filename expansion.

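Here is a short sketch of the whitespace point, using a hypothetical value that contains a space:

```shell
#!/usr/bin/env bash
as_string="new york paris"        # whitespace-delimited "array"
as_array=("new york" "paris")     # real Bash array

count_string=0
for w in $as_string; do           # unquoted: word-split into 3 words
    count_string=$((count_string + 1))
done
count_array=${#as_array[@]}       # element count is preserved

echo "string: $count_string items, array: $count_array items"
# prints "string: 3 items, array: 2 items"
```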

But if you prefer to use your existing approach of whitespace-delimited strings, then I agree with RHT's suggestion to use Perl:

result=$(perl -e 'my %array2 = map +($_ => 1), split /\s+/, $ARGV[1];
                  print join " ", grep $array2{$_}, split /\s+/, $ARGV[0]
                 ' "$array1" "$array2")

(The line-breaks are just for readability; you can get rid of them if you want.)

In the above Bash command, the embedded Perl program creates a hash named %array2 containing the elements of the second array, and then it prints any elements of the first array that exist in %array2.

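If Perl is not an option, the same hash lookup can be sketched in pure Bash with an associative array. This is not part of the answer above. It requires Bash 4 or later, and it assumes no element is an empty string, since an empty key is not a valid subscript. It operates on real Bash arrays; here it uses the lists from the first answer.

```shell
#!/usr/bin/env bash
array1=(1 2 3 4 6 7 8 9 10 11 12)
array2=(1 2 3 5 6 8 9 11)

# Build a lookup table keyed by the elements of array2.
declare -A in_array2=()
for x in "${array2[@]}"; do
    in_array2[$x]=1                  # value is irrelevant; only the key matters
done

# Keep each element of array1 whose key exists in the table.
result=()
for x in "${array1[@]}"; do
    if [[ -n ${in_array2[$x]+set} ]]; then
        result+=("$x")
    fi
done
echo "${result[@]}"                  # prints "1 2 3 6 8 9 11"
```

Each lookup is a single table access, so the nested loop over array2 is avoided.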

This will behave slightly differently from your code in how it handles duplicate values in the second array: in your code, if array1 contains x twice and array2 contains x three times, then result will contain x six times, whereas in my code, result will contain x only twice. I don't know if that matters, since I don't know your exact requirements.

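The difference in duplicate handling can be seen with a small sketch. The data is hypothetical, and the lookup-table half uses a Bash 4 associative array in place of the Perl hash.

```shell
#!/usr/bin/env bash
array1=(x x)       # x twice
array2=(x x x)     # x three times

# Nested loops: one match per (item1, item2) pair.
loop_result=()
for a in "${array1[@]}"; do
    for b in "${array2[@]}"; do
        [[ $a == "$b" ]] && loop_result+=("$a")
    done
done

# Lookup table: at most one match per element of array1.
declare -A seen=()
for b in "${array2[@]}"; do seen[$b]=1; done
lookup_result=()
for a in "${array1[@]}"; do
    [[ -n ${seen[$a]+set} ]] && lookup_result+=("$a")
done

echo "loop: ${#loop_result[@]}, lookup: ${#lookup_result[@]}"
# prints "loop: 6, lookup: 2"
```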