Disclaimer: This page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. If you use or share it, you must also follow the CC BY-SA license: cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/7870230/
Array intersection in bash
Question by dabest1
How do you compare two arrays in bash to find all intersecting values?
Let's say:
array1 contains values 1 and 2
array2 contains values 2 and 3
I should get back 2 as a result.
My own answer, which I can't post yet due to small reputation:
for item1 in $array1; do
    for item2 in $array2; do
        if [[ $item1 = $item2 ]]; then
            result=$result" "$item1
        fi
    done
done
I'm looking for alternate solutions as well.
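For reference, here is the question's loop as a runnable script with the values from the example above. It assumes, as a later answer infers, that array1 and array2 are whitespace-delimited strings rather than real Bash arrays; the right-hand side of = is quoted here so it is compared literally.

```shell
#!/usr/bin/env bash
# Assumption: the "arrays" are whitespace-delimited strings
array1="1 2"
array2="2 3"

result=""
for item1 in $array1; do          # unquoted on purpose: split the string into words
  for item2 in $array2; do
    if [[ $item1 = "$item2" ]]; then
      result=$result" "$item1     # append, using a space as separator
    fi
  done
done
echo "[$result]"                  # prints "[ 2]"; note the leading space
```

The leading space comes from the separator being added before every item, including the first.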
Answer by Fritz G. Mehner
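Each element of list1 is looked up in list2, which is expanded into a single string (${list2[*]}) and tested with the =~ operator. Because the right-hand side is quoted, Bash 3.2 and later match it as a literal substring rather than as a regular expression: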
list1=( 1 2 3 4 6 7 8 9 10 11 12 )
list2=( 1 2 3 5 6 8 9 11 )
l2=" ${list2[*]} "                    # add framing blanks
result=()
for item in "${list1[@]}"; do
    if [[ $l2 =~ " $item " ]]; then   # quoted, so matched as a literal substring
        result+=("$item")
    fi
done
echo "${result[@]}"
The result is
1 2 3 6 8 9 11
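The framing blanks are what prevent false positives: without them, a substring match reports 1 as present whenever list2 contains, say, 11. A small sketch of the difference, using made-up values:

```shell
#!/usr/bin/env bash
list2=( 11 22 )
item=1

plain="${list2[*]}"            # "11 22"
framed=" ${list2[*]} "         # " 11 22 "

if [[ $plain =~ "$item" ]]; then
  unframed_hit=yes             # "1" is a substring of "11": false positive
else
  unframed_hit=no
fi

if [[ $framed =~ " $item " ]]; then
  framed_hit=yes
else
  framed_hit=no                # " 1 " does not occur in " 11 22 "
fi

echo "without blanks: $unframed_hit, with blanks: $framed_hit"
# without blanks: yes, with blanks: no
```

The approach still assumes that no element contains a space, since spaces are used as the delimiters.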
Answer by nhed
This takes @Raihan's answer and makes it work with non-files (though file descriptors are still created, via process substitution). I know it's a bit of a cheat, but it seemed like a good alternative.
A side effect is that the output array will be lexicographically sorted; I hope that's okay. (I also don't know what type of data you have, so I only tested with numbers; additional work may be needed if you have strings with special characters, etc.)
result=($(comm -12 <(for X in "${array1[@]}"; do echo "${X}"; done|sort) <(for X in "${array2[@]}"; do echo "${X}"; done|sort)))
Testing:
$ array1=(1 17 33 99 109)
$ array2=(1 2 17 31 98 109)
$ result=($(comm -12 <(for X in "${array1[@]}"; do echo "${X}"; done|sort) <(for X in "${array2[@]}"; do echo "${X}"; done|sort)))
$ echo ${result[@]}
1 109 17
P.S. I'm sure there is a way to output the array one value per line without the for loop; I just forget it (IFS?)
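On the P.S.: printf reuses its format string for each remaining argument, so printf '%s\n' "${array[@]}" prints one element per line without a loop. A sketch of the same comm pipeline written that way, reading the result back with mapfile (which assumes Bash 4 or later):

```shell
#!/usr/bin/env bash
array1=(1 17 33 99 109)
array2=(1 2 17 31 98 109)

# printf repeats '%s\n' for every element: one value per line, no loop.
# comm -12 keeps only the lines common to both sorted inputs.
mapfile -t result < <(
  comm -12 <(printf '%s\n' "${array1[@]}" | sort) \
           <(printf '%s\n' "${array2[@]}" | sort)
)

printf '%s\n' "${result[@]}"
# 1
# 109
# 17
```

Reading with mapfile -t instead of result=($(...)) also keeps elements that contain spaces intact and avoids filename expansion, as long as no element contains a newline.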
Answer by Raihan
If you were looking for intersecting lines in two files (instead of arrays), you could use the comm command (both files must be sorted):
$ comm -12 file1 file2
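A minimal sketch of that, using two throwaway files with hypothetical contents. comm expects sorted input, so the files are written in sorted order:

```shell
#!/usr/bin/env bash
file1=$(mktemp)
file2=$(mktemp)
printf '%s\n' apple banana cherry > "$file1"   # already sorted
printf '%s\n' banana cherry date  > "$file2"   # already sorted

# -1 hides lines unique to file1, -2 hides lines unique to file2,
# leaving only column 3: lines present in both files
common=$(comm -12 "$file1" "$file2")
echo "$common"
# banana
# cherry

rm -f "$file1" "$file2"
```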
Answer by ruakh
Your answer won't work, for two reasons:
- $array1 just expands to the first element of array1. (This is documented Bash behavior: referencing an array variable without a subscript is equivalent to referencing index 0.)
- After the first element gets added to result, result will then contain a space, so the next run of result=$result" "$item1 will misbehave horribly. (Instead of appending to result, it will run the command consisting of the first two items, with the environment variable result set to the empty string.) Correction: it turns out I was wrong about this one, because word splitting doesn't take place inside assignments. (See comments below.)
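Both points are easy to check. A short sketch with hypothetical variable names: referencing an array without a subscript yields element 0, and an unquoted expansion on the right-hand side of a plain assignment is not word-split.

```shell
#!/usr/bin/env bash
arr=(first second third)
echo "$arr"                  # same as "${arr[0]}": prints "first"
first_only=$arr

result="a b"                 # value already contains a space
result=$result" "c           # no word splitting in an assignment
echo "$result"               # prints "a b c"
```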
What you want is this:
result=()
for item1 in "${array1[@]}"; do
    for item2 in "${array2[@]}"; do
        if [[ $item1 = "$item2" ]]; then   # quoted RHS: literal comparison, not a glob pattern
            result+=("$item1")
        fi
    done
done
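A usage sketch with made-up values. Elements containing spaces survive because every expansion is quoted. The right-hand side of = inside [[ ]] should be quoted as well: unquoted, it is treated as a glob pattern, so an element * in array2 would "match" every element of array1.

```shell
#!/usr/bin/env bash
array1=("a b" "c" "x")
array2=("c" "a b" "*")

result=()
for item1 in "${array1[@]}"; do
  for item2 in "${array2[@]}"; do
    # quoted "$item2": "*" is compared as a literal string, not as a pattern
    if [[ $item1 = "$item2" ]]; then
      result+=("$item1")
    fi
  done
done

printf '<%s>\n' "${result[@]}"
# <a b>
# <c>
```

With an unquoted $item2, the "*" element would match everything, so "a b" and "c" would each be added twice and "x" once.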
Answer by ruakh
Now that I understand what you mean by "array", I think, first of all, that you should consider using actual Bash arrays. They're much more flexible: for example, array elements can contain whitespace, and you avoid the risk that * and ? will trigger filename expansion.
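The difference can be seen in a scratch directory. In this sketch (the directory, file names and values are all hypothetical), the unquoted string is split on whitespace and its * is expanded to the files present, while the quoted array expansion keeps each element intact:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)              # scratch directory holding two files
cd "$tmp" || exit 1
touch file_a file_b

str='a b *'                   # whitespace-delimited "array" as a plain string
words=()
for w in $str; do             # unquoted: split on whitespace, then * is globbed
  words+=("$w")
done
echo "string version: ${#words[@]} items: ${words[*]}"
# string version: 4 items: a b file_a file_b

arr=('a b' '*')               # real Bash array
items=()
for w in "${arr[@]}"; do      # quoted: each element kept as-is, no globbing
  items+=("$w")
done
echo "array version: ${#items[@]} items"
# array version: 2 items

cd "$OLDPWD" || exit 1
rm -rf "$tmp"
```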
But if you prefer to use your existing approach of whitespace-delimited strings, then I agree with RHT's suggestion to use Perl:
result=$(perl -e 'my %array2 = map +($_ => 1), split /\s+/, $ARGV[1];
print join " ", grep $array2{$_}, split /\s+/, $ARGV[0]
' "$array1" "$array2")
(The line-breaks are just for readability; you can get rid of them if you want.)
In the above Bash command, the embedded Perl program creates a hash named %array2 containing the elements of the second array, and then prints every element of the first array that exists in %array2.
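For comparison, the same hash-lookup idea can be sketched in pure Bash with an associative array. This swaps Perl for declare -A, which assumes Bash 4 or later, and works on real Bash arrays rather than whitespace-delimited strings; the values are made up.

```shell
#!/usr/bin/env bash
array1=(1 2 2 4)
array2=(2 3 4 4 4)

# Build a lookup table from the second array (the role %array2 plays in Perl)
declare -A in_array2=()
for x in "${array2[@]}"; do
  in_array2[$x]=1
done

# Keep every element of the first array that appears in the lookup table
result=()
for x in "${array1[@]}"; do
  if [[ -n ${in_array2[$x]+set} ]]; then
    result+=("$x")
  fi
done

echo "${result[@]}"   # prints "2 2 4"
```

One caveat of this sketch: Bash rejects an empty string as an associative-array key, so empty elements would need special handling.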
This behaves slightly differently from your code in how it handles duplicate values in the second array: in your code, if array1 contains x twice and array2 contains x three times, then result will contain x six times, whereas in my code, result will contain x only twice. I don't know if that matters, since I don't know your exact requirements.
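The duplicate-handling difference can be checked with a small sketch that uses the counts from the paragraph above (x twice in array1, three times in array2). It mimics the Perl hash with a Bash associative array, which assumes Bash 4 or later.

```shell
#!/usr/bin/env bash
array1=(x x)
array2=(x x x)

# Nested loops: one match per (item1, item2) pair, so 2 * 3 = 6 copies
loop_result=()
for item1 in "${array1[@]}"; do
  for item2 in "${array2[@]}"; do
    [[ $item1 = "$item2" ]] && loop_result+=("$item1")
  done
done

# Hash lookup: one membership test per element of array1, so 2 copies
declare -A seen=()
for item in "${array2[@]}"; do seen[$item]=1; done
hash_result=()
for item in "${array1[@]}"; do
  [[ -n ${seen[$item]+set} ]] && hash_result+=("$item")
done

echo "nested loops: ${#loop_result[@]}, hash lookup: ${#hash_result[@]}"
# nested loops: 6, hash lookup: 2
```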

