How to define hash tables in Bash?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1494178/

Time: 2020-09-09 18:33:51  Source: igfitidea

Tags: bash, dictionary, hashtable, associative-array

Asked by Sridhar Ratnakumar

What is the equivalent of Python dictionaries, but in Bash? (It should work across OS X and Linux.)

Answer by lhunath

Bash 4

Bash 4 natively supports this feature. Make sure your script's hashbang is #!/usr/bin/env bash or #!/bin/bash so you don't end up using sh. Make sure you're either executing your script directly, or executing it with bash script. (Not actually executing a Bash script with Bash does happen, and will be really confusing!)
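One way to catch the "wrong shell" problem early is a version guard at the top of the script. This is only a sketch: the function name require_bash4 and the probe array are made up for illustration, and the error message is arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only when running under Bash 4 or newer.
# BASH_VERSION is empty in other shells; BASH_VERSINFO[0] is the major version.
require_bash4() {
    if [ -z "${BASH_VERSION:-}" ] || [ "${BASH_VERSINFO[0]:-0}" -lt 4 ]; then
        echo "This script needs Bash 4 or newer for associative arrays." >&2
        return 1
    fi
}

require_bash4 || exit 1

# Safe to use associative arrays from here on.
declare -A probe=( [ok]=yes )
echo "Running Bash ${BASH_VERSION}; probe[ok]=${probe[ok]}"
```

On an old Bash 3 (for example the system bash on OS X) the guard should stop the script with a readable message instead of a confusing syntax error later on.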

You declare an associative array by doing:

declare -A animals

You can fill it up with elements using the normal array assignment operator. For example, if you want to have a map of animal[sound(key)] = animal(value):

animals=( ["moo"]="cow" ["woof"]="dog")

Or merge them:

declare -A animals=( ["moo"]="cow" ["woof"]="dog")

Then use them just like normal arrays. Use

  • animals['key']='value' to set a value

  • "${animals[@]}" to expand the values

  • "${!animals[@]}" (notice the !) to expand the keys

Don't forget to quote them:

echo "${animals[moo]}"
for sound in "${!animals[@]}"; do echo "$sound - ${animals[$sound]}"; done
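A few more everyday operations, as a sketch that assumes Bash 4 or newer. The extra "meow" entry is made up for illustration; everything else uses standard parameter expansion and the unset builtin.

```shell
#!/usr/bin/env bash
# Requires Bash 4+. The entries are illustrative only.
declare -A animals=( ["moo"]="cow" ["woof"]="dog" )

# Add or overwrite an entry.
animals["meow"]="cat"

# Test whether a key exists: ${var+word} expands to word only if the element is set.
if [[ -n "${animals[moo]+set}" ]]; then
    echo "moo is defined: ${animals[moo]}"
fi

# Remove an entry; the quotes keep the brackets from being treated as a glob.
unset 'animals[woof]'

# Count the remaining entries.
echo "entries: ${#animals[@]}"

# Iteration order of an associative array is unspecified, so sort the keys
# when stable output matters (this assumes keys contain no newlines).
while IFS= read -r sound; do
    echo "$sound - ${animals[$sound]}"
done < <(printf '%s\n' "${!animals[@]}" | sort)
```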

Bash 3

Before bash 4, you don't have associative arrays. Do not use eval to emulate them. Avoid eval like the plague, because it is the plague of shell scripting. The most important reason is that eval treats your data as executable code (there are many other reasons too).

First and foremost: Consider upgrading to bash 4. This will make the whole process much easier for you.

If there's a reason you can't upgrade, declare is a far safer option. It does not evaluate data as bash code like eval does, and as such does not allow arbitrary code injection quite so easily.

Let's prepare the answer by introducing the concepts:

First, indirection.

$ animals_moo=cow; sound=moo; i="animals_$sound"; echo "${!i}"
cow

Secondly, declare:

$ sound=moo; animal=cow; declare "animals_$sound=$animal"; echo "$animals_moo"
cow

Bring them together:

# Set a value:
declare "array_$index=$value"

# Get a value:
arrayGet() { 
    local array=$1 index=$2
    local i="${array}_$index"
    printf '%s' "${!i}"
}

Let's use it:

$ sound=moo
$ animal=cow
$ declare "animals_$sound=$animal"
$ arrayGet animals "$sound"
cow

Note: declare cannot be put in a function. Any use of declare inside a bash function turns the variable it creates local to the scope of that function, meaning we can't access or modify global arrays with it. (In bash 4 you can use declare -g to declare global variables - but in bash 4, you can use associative arrays in the first place, avoiding this workaround.)
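If the setter really has to live in a function, one alternative (not part of the answer above) is printf -v, which assigns to a variable by name without making it local. As far as I know it is available from Bash 3.1 on. The helper names mapSet and mapGet are hypothetical, and keys must be usable inside a variable name (letters, digits, underscores).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; mapSet/mapGet are illustrative names, not from the answer.

# mapSet NAME KEY VALUE: assigns the global variable NAME_KEY.
mapSet() {
    local name=$1 key=$2 value=$3
    # printf -v stores the formatted string in the named variable.
    # It rejects names that are not valid identifiers, so data is never run as code.
    printf -v "${name}_${key}" '%s' "$value"
}

# mapGet NAME KEY: prints the value stored in NAME_KEY via indirection.
mapGet() {
    local ref="${1}_${2}"
    printf '%s\n' "${!ref}"
}

mapSet animals moo cow
mapSet animals woof dog
mapGet animals moo
mapGet animals woof
```

The values end up in ordinary global variables (animals_moo, animals_woof), so the indirection-based getter shown earlier keeps working with them.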

Summary:

  • Upgrade to bash 4 and use declare -A for associative arrays.
  • Use the declare option if you can't upgrade.
  • Consider using awk instead and avoid the issue altogether.
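To illustrate the awk suggestion: awk has associative arrays built in, regardless of the Bash version. The sketch below feeds it made-up "sound animal" pairs; the variable name result is only there so the output can be reused.

```shell
#!/usr/bin/env bash
# The input lines are made-up sample data: "sound animal" pairs.
result=$(
    printf '%s\n' 'moo cow' 'woof dog' 'moo bull' |
    awk '
        { animals[$1] = $2 }    # a later line overwrites an earlier key
        END {
            print "moo ->", animals["moo"]
            print "woof ->", animals["woof"]
        }
    '
)
printf '%s\n' "$result"
```

Because the second "moo" line overwrites the first, the lookup for "moo" reports bull, which is the usual last-write-wins behaviour of a map.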

Answer by Bubnoff

There's parameter substitution, though it may be un-PC as well ...like indirection.

#!/bin/bash

# Array pretending to be a Pythonic dictionary
ARRAY=( "cow:moo"
        "dinosaur:roar"
        "bird:chirp"
        "bash:rock" )

for animal in "${ARRAY[@]}" ; do
    KEY="${animal%%:*}"
    VALUE="${animal##*:}"
    printf "%s likes to %s.\n" "$KEY" "$VALUE"
done

printf "%s is an extinct animal which likes to %s\n" "${ARRAY[1]%%:*}" "${ARRAY[1]##*:}"

The BASH 4 way is better of course, but if you need a hack ...only a hack will do. You could search the array/hash with similar techniques.
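Searching could look roughly like this. The function name lookup is hypothetical, and the scan is linear, so it only suits small tables. It works in Bash 3 as well.

```shell
#!/usr/bin/env bash
ARRAY=( "cow:moo" "dinosaur:roar" "bird:chirp" "bash:rock" )

# lookup KEY: print the value of the first entry whose key matches KEY.
lookup() {
    local key=$1 entry
    for entry in "${ARRAY[@]}"; do
        if [[ "${entry%%:*}" == "$key" ]]; then
            # ${entry#*:} strips only up to the first colon,
            # so values that contain colons survive intact.
            printf '%s\n' "${entry#*:}"
            return 0
        fi
    done
    return 1   # key not found
}

lookup bird
lookup unicorn || echo "unicorn: no such key"
```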

Answer by aktivb

This is what I was looking for here:

declare -A hashmap
hashmap["key"]="value"
hashmap["key2"]="value2"
echo "${hashmap[key]}"
for key in "${!hashmap[@]}"; do echo "$key"; done
for value in "${hashmap[@]}"; do echo "$value"; done
echo "hashmap has ${#hashmap[@]} elements"

This did not work for me with bash 4.1.5:

animals=( ["moo"]="cow" )

Answer by Al P.

You can further modify the hput()/hget() interface so that you have named hashes as follows:

hput() {
    eval "$1""$2"='$3'
}

hget() {
    eval echo '${'"$1$2"'#hash}'
}

and then

hput capitals France Paris
hput capitals Netherlands Amsterdam
hput capitals Spain Madrid
echo `hget capitals France` and `hget capitals Netherlands` and `hget capitals Spain`

This lets you define other maps that don't conflict (e.g., 'rcapitals' which does country lookup by capital city). But, either way, I think you'll find that this is all pretty terrible, performance-wise.

If you really want fast hash lookup, there's a terrible, terrible hack that actually works really well. It is this: write your key/values out to a temporary file, one-per line, then use 'grep "^$key"' to get them out, using pipes with cut or awk or sed or whatever to retrieve the values.

Like I said, it sounds terrible, and it sounds like it ought to be slow and do all sorts of unnecessary IO, but in practice it is very fast (disk cache is awesome, ain't it?), even for very large hash tables. You have to enforce key uniqueness yourself, etc. Even if you only have a few hundred entries, the output file/grep combo is going to be quite a bit faster - in my experience several times faster. It also eats less memory.

Here's one way to do it:

hinit() {
    rm -f "/tmp/hashmap.$1"
}

hput() {
    echo "$2 $3" >> "/tmp/hashmap.$1"
}

hget() {
    grep "^$2 " "/tmp/hashmap.$1" | awk '{ print $2 };'
}

hinit capitals
hput capitals France Paris
hput capitals Netherlands Amsterdam
hput capitals Spain Madrid

echo `hget capitals France` and `hget capitals Netherlands` and `hget capitals Spain`
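Since key uniqueness has to be enforced by hand, here is one possible sketch of a put that replaces an existing key instead of appending a duplicate. The names hashfile, hput_unique and hget_exact are made up for illustration, the file is created with mktemp rather than a fixed /tmp path, and keys are assumed to contain no whitespace or backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the temp-file map with unique keys.
hashfile=$(mktemp) || exit 1
trap 'rm -f "$hashfile"' EXIT    # clean up when the script exits

# hput_unique KEY VALUE: drop any existing line for KEY, then append the pair.
hput_unique() {
    local tmp
    tmp=$(mktemp) || return 1
    # Appending "" forces a string comparison, and comparing the first field
    # literally means regex metacharacters in the key are harmless.
    awk -v k="$1" '($1 "") != (k "")' "$hashfile" > "$tmp" && mv "$tmp" "$hashfile"
    printf '%s %s\n' "$1" "$2" >> "$hashfile"
}

# hget_exact KEY: print the value stored for KEY (first word only).
hget_exact() {
    awk -v k="$1" '($1 "") == (k "") { print $2 }' "$hashfile"
}

hput_unique France Paris
hput_unique Spain Madrid
hput_unique France Lyon    # replaces the earlier France entry
hget_exact France
```

Rewriting the file on every put is slower than a plain append, so this trades write speed for uniqueness.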

Answer by lovasoa

Just use the file system

The file system is a tree structure that can be used as a hash map. Your hash table will be a temporary directory, your keys will be filenames, and your values will be file contents. The advantage is that it can handle huge hashmaps, and doesn't require a specific shell.

Hashtable creation

hashtable=$(mktemp -d)

Add an element

echo "$value" > "$hashtable/$key"

Read an element

value=$(< "$hashtable/$key")

Performance

Of course, it's slow, but not that slow. I tested it on my machine, with an SSD and btrfs, and it does around 3000 element reads/writes per second.
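Put together, the whole life cycle might look like the sketch below. It assumes keys are valid file names (not empty, no "/"), uses cat so it does not depend on a specific shell, and adds a cleanup trap that is not part of the steps above. The sample key and value are made up.

```shell
#!/usr/bin/env bash
# Assumption: keys are valid file names (not empty, no "/").
hashtable=$(mktemp -d) || exit 1
trap 'rm -rf "$hashtable"' EXIT    # remove the table when the script exits

key=France
value=Paris

# Add an element; printf avoids echo's special handling of values such as "-n".
printf '%s\n' "$value" > "$hashtable/$key"

# Test for a key before reading it.
if [ -e "$hashtable/$key" ]; then
    result=$(cat "$hashtable/$key")
    echo "$key -> $result"
fi

# List the keys, then delete one.
ls "$hashtable"
rm -f "$hashtable/$key"
```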

Answer by DigitalRoss

hput () {
  eval hash"$1"='$2'
}

hget () {
  eval echo '${hash'"$1"'#hash}'
}
hput France Paris
hput Netherlands Amsterdam
hput Spain Madrid
echo `hget France` and `hget Netherlands` and `hget Spain`


$ sh hash.sh
Paris and Amsterdam and Madrid

Answer by AsymLabs

Consider a solution using the bash builtin read, as illustrated in the following code snippet from a ufw firewall script. This approach has the advantage of supporting as many delimited fields per record (not just 2) as desired. We have used the | delimiter because port range specifiers may require a colon, i.e. 6001:6010.

#!/usr/bin/env bash

readonly connections=(       
                            '192.168.1.4/24|tcp|22'
                            '192.168.1.4/24|tcp|53'
                            '192.168.1.4/24|tcp|80'
                            '192.168.1.4/24|tcp|139'
                            '192.168.1.4/24|tcp|443'
                            '192.168.1.4/24|tcp|445'
                            '192.168.1.4/24|tcp|631'
                            '192.168.1.4/24|tcp|5901'
                            '192.168.1.4/24|tcp|6566'
)

function set_connections(){
    local fields range proto port
    for fields in "${connections[@]}"
    do
            IFS=$'|' read -r range proto port <<< "$fields"
            ufw allow from "$range" proto "$proto" to any port "$port"
    done
}

set_connections
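To try the field splitting without touching a firewall, the same read pattern can be run with echo in place of ufw. The records below are sample data only, including a port range to show why the colon is not used as the delimiter.

```shell
#!/usr/bin/env bash
# Sample records only; each one holds range|protocol|port.
records=( '192.168.1.4/24|tcp|22' '10.0.0.0/8|udp|6001:6010' )

for record in "${records[@]}"; do
    # The IFS assignment applies to this one read command only.
    IFS='|' read -r range proto port <<< "$record"
    echo "range=$range proto=$proto port=$port"
done
```

Each record is split into three variables, and the colon inside 6001:6010 is left alone because only | is treated as a separator.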

Answer by marco

I agree with @lhunath and others that associative arrays are the way to go with Bash 4. If you are stuck with Bash 3 (OS X, old distros that you cannot update), you can also use expr, which should be available everywhere, together with a string and regular expressions. I like it especially when the dictionary is not too big.

  1. Choose 2 separators that you will not use in keys and values (e.g. ',' and ':')
  2. Write your map as a string (note the separator ',' also at the beginning and end)

    animals=",moo:cow,woof:dog,"

  3. Use a regex to extract the values

    get_animal() {
        expr "$animals" : ".*,$1:\([^,]*\),.*"
    }

  4. Split the string to list the items

    get_animal_items() {
        local arr i key value
        arr=$(echo "${animals:1:${#animals}-2}" | tr "," "\n")
        for i in $arr
        do
            value="${i##*:}"
            key="${i%%:*}"
            echo "${value} likes to $key"
        done
    }

Now you can use it:

$ get_animal "moo"
cow
$ get_animal_items
cow likes to moo
dog likes to woof

Answer by Cole Stanfield

I really liked Al P's answer but wanted uniqueness enforced cheaply so I took it one step further - use a directory. There are some obvious limitations (directory file limits, invalid file names) but it should work for most cases.

hinit() {
    rm -rf "/tmp/hashmap.$1"
    mkdir -p "/tmp/hashmap.$1"
}

hput() {
    printf '%s' "$3" > "/tmp/hashmap.$1/$2"
}

hget() {
    cat "/tmp/hashmap.$1/$2"
}

hkeys() {
    ls -1 "/tmp/hashmap.$1"
}

hdestroy() {
    rm -rf "/tmp/hashmap.$1"
}

hinit ids

for (( i = 0; i < 10000; i++ )); do
    hput ids "key$i" "value$i"
done

for (( i = 0; i < 10000; i++ )); do
    printf '%s\n' "$(hget ids "key$i")" > /dev/null
done

hdestroy ids

It also performs a tad bit better in my tests.

$ time bash hash.sh 
real    0m46.500s
user    0m16.767s
sys     0m51.473s

$ time bash dirhash.sh 
real    0m35.875s
user    0m8.002s
sys     0m24.666s

Just thought I'd pitch in. Cheers!

Edit: Adding hdestroy()

Answer by jrichard

Two things. First, on any 2.6 kernel you can use memory instead of /tmp by using /dev/shm (Red Hat; other distros may vary). Second, hget can be reimplemented using read as follows:

function hget {
  local key idx
  while read -r key idx
  do
    if [ "$key" = "$2" ]
    then
      echo "$idx"
      return
    fi
  done < "/dev/shm/hashmap.$1"
}

In addition, assuming that all keys are unique, the return short-circuits the read loop and avoids having to read through all entries. If your implementation can have duplicate keys, simply leave out the return. This saves the expense of reading and forking both grep and awk. Using /dev/shm for both implementations, running time hget on a 3-entry hash and searching for the last entry yielded the following:

Grep/Awk:

hget() {
    grep "^$2 " "/dev/shm/hashmap.$1" | awk '{ print $2 };'
}

$ time echo $(hget FD oracle)
3

real    0m0.011s
user    0m0.002s
sys     0m0.013s

Read/echo:

$ time echo $(hget FD oracle)
3

real    0m0.004s
user    0m0.000s
sys     0m0.004s

On multiple invocations I never saw less than a 50% improvement. This can all be attributed to fork overhead, since both implementations use /dev/shm.