Nested associative arrays in bash

Question

Asked by user001

Can one construct an associative array whose elements contain arrays in bash? For instance, suppose one has the following arrays:

a=(a aa)
b=(b bb bbb)
c=(c cc ccc cccc)

Can one create an associative array to access these variables? For instance,

declare -A letters
letters[a]=$a
letters[b]=$b
letters[c]=$c

and then access individual elements by a command such as

letter=${letters[a]}
echo ${letter[1]}

This mock syntax for creating and accessing elements of the associative array does not work. Do valid expressions accomplishing the same goals exist?

Answer 1

Accepted answer by sirosen

I think the more straightforward answer is "No, bash arrays cannot be nested." Anything that simulates nested arrays is actually just creating fancy mapping functions for the keyspace of the (single layered) arrays.

Not that that's bad: it may be exactly what you want, but especially when you don't control the keys into your array, doing it properly becomes harder. Although I like the solution given by @konsolebox of using a delimiter, it ultimately falls over if your keyspace includes keys like "p|q". It does have a nice benefit in that you can operate transparently on your keys, as in array[abc|def] to look up the key def in array[abc], which is very clear and readable. Because it relies on the delimiter not appearing in the keys, this is only a good approach when you know what the keyspace looks like now and in all future uses of the code. That assumption is only safe when you have strict control over the data.
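
To make that failure concrete, here is a minimal sketch of the delimiter scheme. The helper name delim_key is hypothetical (it is not from konsolebox's answer); it simply joins path components with "|". Two different logical paths land on the same flat key, so the second write silently overwrites the first:

```shell
#!/usr/bin/env bash
# Sketch: the delimiter approach stores array[key][subkey] under "key|subkey".
# delim_key is a hypothetical helper used only for this illustration.
declare -A array

delim_key () {
    local IFS='|'      # "$*" joins its arguments with the first char of IFS
    printf '%s' "$*"
}

k1=$(delim_key 'p|q' 'r')   # meant as array[p|q][r]
k2=$(delim_key 'p' 'q|r')   # meant as array[p][q|r]

array[$k1]=first
array[$k2]=second           # same flat key "p|q|r": overwrites "first"

echo "$k1 $k2"              # p|q|r p|q|r
echo "${array[$k1]}"        # second
```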

If you need any kind of robustness, I would recommend concatenating hashes of your array keys. This simple technique is extremely likely to eliminate conflicts, although collisions remain possible if you are operating on extremely carefully crafted data.

To borrow a bit from how Git handles hashes, let's take the first 8 characters of the sha512sums of keys as our hashed keys. If you feel nervous about this, you can always use the whole sha512sum, since there are no known collisions for sha512. Using the whole checksum makes sure that you are safe, but it is a little bit more burdensome.
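
For scale: 8 hex characters are only 32 bits of the digest. A standard birthday-bound estimate (added here as a rough guide, not part of the original answer) for the probability p of at least one collision among n distinct keys is:

```latex
p \approx 1 - \exp\!\left(-\frac{n^2}{2 \cdot 2^{32}}\right),
\qquad
p = \tfrac{1}{2} \;\Longrightarrow\; n \approx \sqrt{2 \ln 2 \cdot 2^{32}} \approx 7.7 \times 10^{4}
```

So 8 characters are comfortable for a few thousand keys (p is about 1.2 x 10^-4 at n = 1000), but not for hundreds of thousands, which is where the full 128-character digest pays off.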

So, if I want the semantics of storing an element in array[abc][def], what I should do is store the value in array["$(keyhash "abc")$(keyhash "def")"], where keyhash looks like this:

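function keyhash () {
    # Print the first 8 hex characters of the SHA-512 digest of the key ($1).
    # printf '%s' avoids echo mangling keys such as "-n".
    printf '%s' "$1" | sha512sum | cut -c-8
}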
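
A minimal usage sketch, assuming GNU coreutils' sha512sum is on the PATH (on macOS, shasum -a 512 is the usual substitute). It stores two "nested" values under abc and reads one back; the flat key is computed into a variable first to keep the quoting simple:

```shell
#!/usr/bin/env bash
# Sketch: emulate array[abc][def] by concatenating per-component hashes.
keyhash () {
    printf '%s' "$1" | sha512sum | cut -c-8
}

declare -A array

key=$(keyhash "abc")$(keyhash "def")     # flat key standing in for [abc][def]
array[$key]="nested value"

key2=$(keyhash "abc")$(keyhash "ghi")    # flat key standing in for [abc][ghi]
array[$key2]="another value"

echo "${array[$key]}"                    # nested value
```

Each component contributes exactly 8 characters, so every flat key is a multiple of 8 characters long.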

You can then pull out the elements of the associative array using the same keyhash function. Funnily, there's a memoized version of keyhash you can write which uses an array to store the hashes, preventing extra calls to sha512sum, but it gets expensive in terms of memory if the script takes many keys:

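declare -A keyhash_array
function keyhash () {
    # Cache miss: compute the hash of $1 once and remember it.
    # Caveat: when called as "$(keyhash ...)" this runs in a subshell,
    # so the new cache entry does not survive back in the parent shell.
    if [[ -z "${keyhash_array[$1]}" ]]; then
        keyhash_array[$1]="$(printf '%s' "$1" | sha512sum | cut -c-8)"
    fi
    echo "${keyhash_array[$1]}"
}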
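
One catch with memoizing this way: a function called through $(...) runs in a subshell, so whatever it adds to the cache is discarded when that subshell exits. A sketch that keeps the cache returns the hash through a caller-named variable with printf -v instead, so the function runs in the current shell. keyhash_into and keyhash_calls are hypothetical names introduced for this illustration:

```shell
#!/usr/bin/env bash
# Sketch: memoized key hashing whose cache survives between calls.
declare -A keyhash_cache
keyhash_calls=0   # counts real sha512sum invocations, for illustration only

# keyhash_into VAR KEY: store the 8-char hash of KEY in the variable named VAR.
# Called without $(...), so the cache update happens in the current shell.
# VAR must not be named "key" (it would be shadowed by the local below).
keyhash_into () {
    local key=$2
    if [[ -z ${keyhash_cache[$key]+set} ]]; then
        keyhash_cache[$key]=$(printf '%s' "$key" | sha512sum | cut -c-8)
        keyhash_calls=$((keyhash_calls + 1))
    fi
    printf -v "$1" '%s' "${keyhash_cache[$key]}"
}

keyhash_into h1 "abc"
keyhash_into h2 "abc"   # served from the cache
keyhash_into h3 "def"

declare -A array
array["$h1$h3"]="value for array[abc][def]"
echo "sha512sum calls: $keyhash_calls"   # sha512sum calls: 2
```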

A length inspection on a given key tells me how many layers deep it looks into the array, since that's just len/8, and I can see the subkeys for a "nested array" by listing the keys and keeping those that have the correct prefix. So if I want all of the keys in array[abc], what I should really do is this:

for key in "${!array[@]}"
do
    if [[ "$key" == "$(keyhash "abc")"* ]];
    then
        # do stuff with "$key" since it's a key directly into the array
        :
    fi
done
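
A fuller sketch of the same loop, assuming the 8-character keyhash from above and GNU sha512sum. It hashes the parent key once, uses len/8 to keep only direct children of array[abc], and shows that the first-level key can hold a value of its own. Iteration order over an associative array is unspecified, so the children may print in either order:

```shell
#!/usr/bin/env bash
# Sketch: list the direct children of the "nested array" array[abc].
keyhash () { printf '%s' "$1" | sha512sum | cut -c-8; }

declare -A array
array["$(keyhash abc)"]="stored at array[abc] itself"
array["$(keyhash abc)$(keyhash def)"]="stored at array[abc][def]"
array["$(keyhash abc)$(keyhash ghi)"]="stored at array[abc][ghi]"
array["$(keyhash xyz)$(keyhash def)"]="stored at array[xyz][def]"

prefix=$(keyhash abc)   # hash the parent key once, outside the loop
count=0
for key in "${!array[@]}"; do
    depth=$(( ${#key} / 8 ))   # number of hash components in the flat key
    if [[ $key == "$prefix"* ]] && (( depth == 2 )); then
        echo "child of array[abc]: ${array[$key]}"
        count=$((count + 1))
    fi
done

# A first-level key holds its own value alongside its children.
echo "${array[$prefix]}"
```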

Interestingly, this also means that first-level keys are valid and can contain values. So array["$(keyhash "abc")"] is completely valid, which means this "nested array" construction can have some interesting semantics.

In one form or another, any solution for nested arrays in Bash is pulling this exact same trick: produce a (hopefully injective) mapping function f(key,subkey) which produces strings that can be used as array keys. This can always be applied further as f(f(key,subkey),subsubkey) or, in the case of the keyhash function above, I prefer to define f(key) and apply it to subkeys as concat(f(key),f(subkey)) and concat(f(key),f(subkey),f(subsubkey)). Combined with memoization of f, this is a lot more efficient. In the case of the delimiter solution, nested applications of f are necessary, of course.
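
The concat(f(key),f(subkey),...) form generalizes to any depth with a small helper. This is a sketch: nested_key is a hypothetical name, and keyhash is the truncated SHA-512 from above. Unlike the "|" delimiter, the two ways of splitting "p|q|r" into a path produce different flat keys (barring a 32-bit hash collision):

```shell
#!/usr/bin/env bash
# Sketch: concat(f(k1), f(k2), ..., f(kn)) for a key path of any depth.
keyhash () { printf '%s' "$1" | sha512sum | cut -c-8; }

nested_key () {
    local part out=""
    for part in "$@"; do
        out+=$(keyhash "$part")   # append f(part) for each path component
    done
    printf '%s' "$out"
}

declare -A array
k_deep=$(nested_key a b c)      # plays the role of array[a][b][c]
k_left=$(nested_key 'p|q' r)    # array[p|q][r]
k_right=$(nested_key p 'q|r')   # array[p][q|r]

array[$k_deep]="three levels down"
array[$k_left]="first"
array[$k_right]="second"        # no overwrite, unlike the delimiter scheme

echo "${#array[@]}"             # 3
```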

With that known, the best solution that I know of is to take a short hash of the key and subkey values.

I recognize that there's a general dislike for answers of the type "You're doing it wrong, use this other tool!" but associative arrays in bash are messy on numerous levels, and run you into trouble when you try to port code to a platform that (for some silly reason or another) doesn't have bash on it, or has an ancient (pre-4.x) version. If you are willing to look into another language for your scripting needs, I'd recommend picking up some awk.

It provides the simplicity of shell scripting with the flexibility that comes with more feature rich languages. There are a few reasons I think this is a good idea:

GNU awk (the most prevalent variant) has fully fledged associative arrays which can nest properly, with the intuitive syntax of array[key][subkey]
You can embed awk in shell scripts, so you still get the tools of the shell when you really need them
awk is stupidly simple at times, which puts it in stark contrast with other shell replacement languages like Perl and Python
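
A minimal sketch of the first two points, using the question's letters data. It assumes GNU awk 4.0 or later is installed as gawk; other awks reject the a[i][j] syntax. nested_demo is a hypothetical wrapper name:

```shell
#!/usr/bin/env bash
# Sketch: gawk arrays of arrays, embedded in a shell script.
nested_demo () {
    gawk '
    BEGIN {
        letters["a"][1] = "a"; letters["a"][2] = "aa"
        letters["b"][1] = "b"; letters["b"][2] = "bb"; letters["b"][3] = "bbb"

        print letters["a"][2]          # one element of a sub-array

        n = 0
        for (i in letters["b"]) n++    # iterate over a single sub-array
        print n
    }'
}

nested_demo   # prints "aa", then "3"
```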

That's not to say that awk is without its failings. It can be hard to understand when you're first learning it because it's heavily oriented towards stream processing (a lot like sed), but it's a great tool for a lot of tasks that are just barely outside of the scope of the shell.

Note that above I said that "GNU awk" (gawk) has multidimensional arrays. Other awks actually do the trick of separating keys with a well-defined separator, SUBSEP. You can do this yourself, as with the array[a|b] solution in bash, but nawk has this feature built in if you write array[key,subkey]. It's still a bit more fluid and clear than bash's array syntax.
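
A sketch of the SUBSEP form, which should work in any POSIX awk. The comma subscript is joined into one flat key, split() recovers the parts, and (key, subkey) in array tests membership. subsep_demo is a hypothetical wrapper name:

```shell
#!/usr/bin/env bash
# Sketch: POSIX awk emulates array[key,subkey] by joining the subscripts
# with SUBSEP (default "\034") into one flat key, like the bash "a|b" trick.
subsep_demo () {
    awk '
    BEGIN {
        grid["abc", "def"] = 1
        grid["abc", "ghi"] = 2
        grid["xyz", "def"] = 3

        if (("abc", "def") in grid) print "found abc,def"

        count = 0
        for (k in grid) {
            split(k, parts, SUBSEP)      # recover the original subscripts
            if (parts[1] == "abc") count++
        }
        print count
    }'
}

subsep_demo   # prints "found abc,def", then "2"
```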

Answer 2

Answered by konsolebox

This is the best non-hacky way to do it, but you're limited to accessing single elements. Using indirect variable expansion references is another option, but you'd still have to store every element set in an array. If you want some form of anonymous arrays, you'd need a random parameter name generator: if you don't use a random name for an array, there's no point referencing it from an associative array. And of course I wouldn't want to use external tools to generate random anonymous variable names; that would be rather silly.

#!/bin/bash

a=(a aa)
b=(b bb bbb)
c=(c cc ccc cccc)

declare -A letters

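function store_array {
    # Usage: store_array ARRAY_NAME BASE_KEY VALUE...
    # Stores each VALUE under the flat key "BASE_KEY|index" in ARRAY_NAME.
    local var=$1 base_key=$2 values=("${@:3}") i
    for i in "${!values[@]}"; do
        # Escaped \$ defers expansion to eval, so values containing spaces
        # or shell metacharacters are assigned literally.
        eval "$var[\$base_key|\$i]=\${values[i]}"
    done
}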

store_array letters a "${a[@]}"
store_array letters b "${b[@]}"
store_array letters c "${c[@]}"

echo "${letters[a|1]}"
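
The indirect-reference alternative mentioned above can be sketched as follows: keep the ordinary arrays and store their names in the associative array, which is close to the question's letter=${letters[a]} idea. This assumes bash 4.3 or later for namerefs (local -n); get_letter is a hypothetical helper. The last lines show the same lookup with plain ${!name} indirection, which does not need namerefs:

```shell
#!/usr/bin/env bash
# Sketch: store array *names* in the associative array and dereference them.
a=(a aa)
b=(b bb bbb)
c=(c cc ccc cccc)

declare -A letters=([a]=a [b]=b [c]=c)   # key -> name of an indexed array

# get_letter KEY INDEX: print element INDEX of the array registered under KEY.
get_letter () {
    local -n ref=${letters[$1]}   # ref now aliases a, b or c
    printf '%s\n' "${ref[$2]}"
}

get_letter a 1   # aa
get_letter c 3   # cccc

# The same lookup without namerefs, via indirect expansion of "name[index]":
name="${letters[b]}[2]"
echo "${!name}"  # bbb
```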

Answer 3

Answered by eel ghEEz

For those stumbling on this question when looking for ways to pass command line arguments within a command line argument, an encoding such as JSON could prove useful, as long as the consumer agrees to use that encoding.

nestenc='{"a": ["a", "aa "],
"b": ["b", "bb", "b bb"],
"c d": ["c", "cc ", " ccc", "cc cc"]
}'
index="c d"
letter=()
# IFS= keeps the leading/trailing spaces of each element printed by jq -r.
while IFS= read -r line; do
  letter+=("${line}")
done < <(jq -r ".\"${index}\"[]" <<< "${nestenc}")
for c in "${letter[@]}" ; do echo "<<${c}>>" ; done

The output follows.

<<c>>
<<cc >>
<< ccc>>
<<cc cc>>

The same decoding applies to a JSON list passed as a single command line argument:

# Usage: $0 --toolargs '["arg 1", "arg 2"]' --otheropt
# The JSON list is the value following --toolargs, i.e. "$2" in this usage.
toolargs="$2"
v=()
while IFS= read -r line; do v+=("${line}"); done < <(jq -r '.[]' <<< "${toolargs}")
sometool "${v[@]}"
