Linux: Split string into array in bash

Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/14630940/

Date: 2020-08-06 18:53:01, source: igfitidea

Split string into array in bash

Tags: linux, bash, shell, unix

Asked by user000001

I am looking for a way to split a string in bash over a delimiter string, and place the parts in an array.

Simple case:

#!/bin/bash
b="aaaaa/bbbbb/ddd/ffffff"
echo "simple string: $b"

IFS='/' b_split=($b)
echo ;
echo "split"
for i in ${b_split[@]}
do
    echo "------ new part ------"
    echo "$i"
done

Gives output

simple string: aaaaa/bbbbb/ddd/ffffff

split
------ new part ------
aaaaa
------ new part ------
bbbbb
------ new part ------
ddd
------ new part ------
ffffff

More complex case:

#!/bin/bash
c=$(echo "AA=A"; echo "B=BB"; echo "======="; echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";)
echo "more complex string"
echo "$c";
echo ;
echo "split";

IFS='=======' c_split=($c) ;#    <----    LINE TO BE CHANGED 

for i in ${c_split[@]}
do
    echo "------ new part ------"
    echo "$i"
done

Gives output:

more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF

split
------ new part ------
AA
------ new part ------
A
B
------ new part ------
BB

------ new part ------

------ new part ------

------ new part ------

------ new part ------

------ new part ------

------ new part ------

------ new part ------

C
------ new part ------

------ new part ------
CC
DD
------ new part ------
D

------ new part ------

------ new part ------

------ new part ------

------ new part ------

------ new part ------

------ new part ------

------ new part ------

EEE
FF

I would like the second output to be like

------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF

That is, I want to split the string on a sequence of characters instead of on a single character. How can I do this?

I am looking for an answer that would only modify this line in the second script:

IFS='=======' c_split=($c) ;#    <----    LINE TO BE CHANGED 

Accepted answer by F. Hauri

IFS disambiguation

IFS stands for Internal Field Separator: a list of characters, any of which can be used as a separator.

By default, IFS is set to $' \t\n' (space, tab, newline). This means that any run of one or more spaces, tabs and/or newlines counts as one separator.

So the string:

 "    blah  foo=bar 
 baz  "

Leading and trailing separators are ignored, so this string contains only 3 parts: blah, foo=bar and baz.
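The following is a small sketch of that default behaviour. It assumes a bash script in which IFS still has its default value; the value is set explicitly below to be sure. Globbing is switched off with set -f so that only word splitting is at work:

```shell
#!/bin/bash
# Default IFS: runs of spaces, tabs and newlines form a single separator,
# and leading/trailing whitespace does not produce empty fields.
IFS=$' \t\n'                  # the default value, set explicitly
set -f                        # no pathname expansion, only word splitting
s=$'    blah  foo=bar \n baz  '
parts=($s)                    # unquoted on purpose: this triggers the splitting
set +f
printf '<%s>\n' "${parts[@]}" # one line per part
```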

You can split a string with IFS if you know a valid field separator that is not used in your string.

OIFS="$IFS"
IFS='§'
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
c_split=(${c//=======/§})
IFS="$OIFS"
printf -- "------ new part ------\n%s\n" "${c_split[@]}"

------ new part ------
AA=A
B=BB

------ new part ------

C==CC
DD=D

------ new part ------

EEE
FF

But this only works as long as the string does not contain §.

You could use another character, like IFS=$'\026';c_split=(${c//=======/$'\026'}), but this may still lead to other bugs.
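You can also write the same placeholder idea so that IFS is changed only for a single read command, instead of being saved and restored. This is a sketch. It assumes that the placeholder $'\037' (the ASCII unit separator, chosen here for illustration) never occurs in the data. Unlike the unquoted array assignment, read does not perform pathname expansion on the parts:

```shell
#!/bin/bash
# Replace the multi-character separator with one placeholder character,
# then split on that character with read (IFS is changed for read only).
# Assumption: the placeholder $'\037' never occurs in c.
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
sep='======='
ph=$'\037'
# -d '' reads up to a NUL byte, i.e. the whole input, so embedded newlines
# are kept; read returns non-zero at end of input, hence the "|| true".
IFS=$ph read -r -d '' -a c_split < <(printf '%s' "${c//"$sep"/$ph}") || true
printf -- '------ new part ------\n%s\n' "${c_split[@]}"
```

As in the answer's own output, the newlines around each separator stay in the parts.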

You could scan the character map to find one that is not in your string:

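myIfs=""
for i in {1..255};do
    # build the character whose code is $i (printf with a \NNN octal escape)
    printf -v char "\\$(printf '%03o' "$i")"
    # quoted "$char" is matched literally; c is unchanged only if char is absent
    [ "$c" == "${c#*"$char"}" ] && myIfs="$char" && break
  done
if ! [ "$myIfs" ] ;then
    echo no split char found, could not do the job, sorry.
    exit 1
  fi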

but I find this solution a little overkill.

Splitting on spaces (or without modifying IFS)

Under bash, we could use this bashism:

bash 下,我们可以使用这种 bashism:

b="aaaaa/bbbbb/ddd/ffffff"
b_split=(${b//// })

In fact, the syntax ${varname//pattern/string} replaces every occurrence of pattern with string. Here the pattern is / and the replacement is a space, so all slashes are turned into spaces before the result is assigned to the array b_split.

Of course, this still uses IFS: the array is split on spaces.

This is not the best way, but could work with specific cases.
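The sketch below makes the two steps of that bashism visible: substitution first, then ordinary word splitting on the default IFS. It assumes IFS has its default value. Note that the unquoted result is also subject to pathname expansion, so parts containing *, ? or [ could be expanded against file names:

```shell
#!/bin/bash
b="aaaaa/bbbbb/ddd/ffffff"
echo "${b//\// }"            # step 1, substitution: aaaaa bbbbb ddd ffffff
b_split=(${b//\// })         # step 2, the unquoted result is word-split on IFS
echo "${#b_split[@]}"        # 4
```

Here \/ is simply an escaped slash used as the pattern. It is equivalent to the ${b//// } form above.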

You could even drop unwanted spaces before splitting:

b='12 34 / 1 3 5 7 / ab'
b1=${b// }
b_split=(${b1//// })
printf "<%s>, " "${b_split[@]}" ;echo
<1234>, <1357>, <ab>, 

or swap them temporarily for § and restore them after splitting:

b1=${b// /§}
b_split=(${b1//// })
printf "<%s>, " "${b_split[@]//§/ }" ;echo
<12 34 >, < 1 3 5 7 >, < ab>, 

Splitting on a string:

So for your purpose you should not use IFS at all, but bash does have nice features:

#!/bin/bash

c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
echo "more complex string"
echo "$c";
echo ;
echo "split";

mySep='======='
while [ "$c" != "${c#*$mySep}" ];do
    echo "------ new part ------"
    echo "${c%%$mySep*}"
    c="${c#*$mySep}"
  done
echo "------ last part ------"
echo "$c"

Let's see:

more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF

split
------ new part ------
AA=A
B=BB

------ new part ------

C==CC
DD=D

------ last part ------

EEE
FF

Note: leading and trailing newlines are not removed. If you need that, you could use:

mySep=$'\n=======\n'

instead of simply =======.

Or you could rewrite the split loop to strip them explicitly:

mySep=$'======='
while [ "$c" != "${c#*$mySep}" ];do
    echo "------ new part ------"
    part="${c%%$mySep*}"
    part="${part##$'\n'}"
    echo "${part%%$'\n'}"
    c="${c#*$mySep}"
  done
echo "------ last part ------"
c=${c##$'\n'}
echo "${c%%$'\n'}"

In any case, this matches what the SO question asked for (and its sample :)

------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ last part ------
EEE
FF
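The loops above have one caveat: $mySep is unquoted inside ${c#*$mySep}, so it is treated as a glob pattern. A separator such as ** would therefore not be matched literally. Quoting the separator inside the expansion makes it literal. The following is a minimal sketch, using a made-up sample string for illustration:

```shell
#!/bin/bash
# Split on a literal separator even if it contains glob characters (* ? [).
c='one**two**three'
mySep='**'
parts=()
while [ "$c" != "${c#*"$mySep"}" ]; do   # quoted: "$mySep" is matched literally
    parts+=("${c%%"$mySep"*}")           # text before the first separator
    c=${c#*"$mySep"}                     # drop that text and the separator
done
parts+=("$c")                            # whatever follows the last separator
printf '<%s>\n' "${parts[@]}"
```

With the unquoted form the pattern becomes *** here. That matches the empty string, so the loop would never run and the whole string would end up as a single part.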

Finally, creating an array

#!/bin/bash
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
echo "more complex string"
echo "$c";
echo ;
echo "split";

mySep=$'======='
export -a c_split
while [ "$c" != "${c#*$mySep}" ];do
    part="${c%%$mySep*}"
    part="${part##$'\n'}"
    c_split+=("${part%%$'\n'}")
    c="${c#*$mySep}"
  done
c=${c##$'\n'}
c_split+=("${c%%$'\n'}")

for i in "${c_split[@]}"
do
    echo "------ new part ------"
    echo "$i"
done

This does the job nicely:

more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF

split
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF

Some explanations:

  • export -a var declares var as an array. Bash cannot actually pass arrays to child processes, so here it has the same effect as declare -a var.
  • ${variablename%string*}, ${variablename%%string*} give the left part of variablename, up to but not including string. A single % cuts at the last occurrence of string, and %% cuts at the first occurrence. The full variablename is returned if string is not found.
  • ${variablename#*string} does the same the other way round: it returns the right part of variablename, from but not including string. A single # cuts at the first occurrence, and ## cuts at the last occurrence.

Note: in these patterns, the character * is a wildcard that matches any number of any characters.

The command echo "${c%%$'\n'}" would echo variable c without its trailing newline. Because the pattern is a single literal newline, only one newline is removed (% and %% behave the same here).
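To remove every leading and trailing newline, not just one, you can use bash's extglob option, which adds a +(pattern) operator meaning "one or more matches". The following is a minimal sketch, assuming a bash where shopt -s extglob is available:

```shell
#!/bin/bash
shopt -s extglob                # enables +(pattern) in the expansions below
s=$'\n\nAA=A\nB=BB\n\n\n'
s=${s##+($'\n')}                # remove all leading newlines
s=${s%%+($'\n')}                # remove all trailing newlines
printf '[%s]\n' "$s"
```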

So if variable contains Hello WorldZorGluBHello youZorGluBI'm happy:

variable="Hello WorldZorGluBHello youZorGluBI'm happy"

$ echo ${variable#*ZorGluB}
Hello youZorGluBI'm happy

$ echo ${variable##*ZorGluB}
I'm happy

$ echo ${variable%ZorGluB*}
Hello WorldZorGluBHello you

$ echo ${variable%%ZorGluB*}
Hello World

$ echo ${variable%%ZorGluB}
Hello WorldZorGluBHello youZorGluBI'm happy

$ echo ${variable%happy}
Hello WorldZorGluBHello youZorGluBI'm

$ echo ${variable##* }
happy

All this is explained in the manpage:

$ man -Len -Pless\ +/##word bash

$ man -Len -Pless\ +/%%word bash

$ man -Len -Pless\ +/^\\ *export\\ .*word bash

Step by step, the splitting loop:

The separator:

mySep=$'======='

Declare c_split as an array (export -a also marks it for export, but bash cannot actually pass arrays to child processes):

export -a c_split

While variable c contains at least one occurrence of mySep:

while [ "$c" != "${c#*$mySep}" ];do

Truncate c from the first mySep to the end of the string, and assign the result to part.

    part="${c%%$mySep*}"

Remove one leading newline, if any

    part="${part##$'\n'}"

Remove one trailing newline, if any, and add the result as a new array element to c_split.

    c_split+=("${part%%$'\n'}")

Reassign c to the rest of the string, once everything up to and including the first mySep has been removed

    c="${c#*$mySep}"

Done ;-)

done

Remove one leading newline, if any

c=${c##$'\n'}

Remove one trailing newline, if any, and add the result as a new array element to c_split.

c_split+=("${c%%$'\n'}")

As a function:

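ssplit() {
    # $1: string to split, $2: name of the array to fill (default ssplited_array),
    # $3: delimiter string (default: a single space)
    local string="$1" array=${2:-ssplited_array} delim="${3:- }" pos=0
    # quoted "$delim" is matched literally, even if it contains * ? [
    while [ "$string" != "${string#*"$delim"}" ];do
        printf -v "$array[pos++]" "%s" "${string%%"$delim"*}"
        string="${string#*"$delim"}"
      done
    printf -v "$array[pos]" "%s" "$string"
}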

Usage:

ssplit "<quoted string>" [array name] [delimiter string]

where array name is ssplited_array by default and delimiter is a single space.

You could use:

c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
ssplit "$c" c_split $'\n=======\n'
printf -- "--- part ----\n%s\n" "${c_split[@]}"
--- part ----
AA=A
B=BB
--- part ----
C==CC
DD=D
--- part ----
EEE
FF
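On bash 4.3 or later, a nameref (local -n) is an alternative to printf -v for writing into the caller's array. The sketch below works under that assumption; ssplit_nameref is a hypothetical name used only here. Unlike ssplit, it empties the target array first, so leftover elements from an earlier call cannot survive. The caller must not pass "out" as the array name, since that would create a circular name reference:

```shell
#!/bin/bash
# Hypothetical variant of ssplit using a nameref (requires bash >= 4.3).
ssplit_nameref() {
    local string="$1" delim="${3:- }"
    local -n out="${2:-ssplited_array}"   # out is an alias for the caller's array
    out=()                                # start from an empty array
    while [ "$string" != "${string#*"$delim"}" ]; do
        out+=("${string%%"$delim"*}")     # part before the first delimiter
        string=${string#*"$delim"}        # rest after that delimiter
    done
    out+=("$string")                      # last part
}

c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
ssplit_nameref "$c" c_split $'\n=======\n'
printf -- "--- part ----\n%s\n" "${c_split[@]}"
```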

Answer by Kent

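Do it with awk. This relies on a multi-character RS being treated as a regular expression, which GNU awk supports; POSIX leaves a multi-character RS unspecified: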

 awk -vRS='\n=*\n'  '{print "----- new part -----";print}' <<< "$c"

output:

kent$  awk -vRS='\n=*\n'  '{print "----- new part -----";print}' <<< $c
----- new part -----
AA=A
B=BB
----- new part -----
C==CC
DD=D
----- new part -----
EEE
FF

Answer by Kent

The following script was tested in bash:

kent@7pLaptop:/tmp/test$ bash --version
GNU bash, version 4.2.42(2)-release (i686-pc-linux-gnu)

The script (named t.sh):

#!/bin/bash

c=$(echo "AA=A"; echo "B=BB"; echo "======="; echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";)
echo "more complex string"
echo "$c"
echo "split now"

# awk (GNU awk: RS is a regex) turns the newlines inside each record into a literal \n,
# so the unquoted $( ) result is split only on the spaces printed between records
c_split=($(echo "$c"|awk -vRS="\n=*\n"  '{gsub(/\n/,"\\n");printf "%s ", $0}'))

for i in ${c_split[@]}
do
    echo "---- new part ----"
    echo -e "$i"
done

kent@7pLaptop:/tmp/test$ ./t.sh 
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split now
---- new part ----
AA=A
B=BB
---- new part ----
C==CC
DD=D
---- new part ----
EEE
FF

Note the echo statement in that for loop: if you remove the option -e you will see:

---- new part ----
AA=A\nB=BB
---- new part ----
C==CC\nDD=D
---- new part ----
EEE\nFF\n

Whether to use -e or not depends on your requirements.

Answer by that other guy

Here's an approach that doesn't fumble when the data contains literal backslash sequences, spaces and other special characters:

c=$(echo "AA=A"; echo "B=BB"; echo "======="; echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";)
echo "more complex string"
echo "$c";
echo ;
echo "split";

c_split=()
# GNU sed turns each separator into a NUL byte; read -d '' then reads NUL-terminated chunks
while IFS= read -r -d '' part
do
  c_split+=( "$part" )
done < <(printf "%s" "$c" | sed -e 's/=======/\x00/g')
# the last chunk has no terminating NUL: read fails on it but still sets part
c_split+=( "$part" )

for i in "${c_split[@]}"
do
    echo "------ new part ------"
    echo "$i"
done

Note that the string is actually split on "=======" as requested, so the line feeds become part of the data (causing extra blank lines when "echo" adds its own).

Answer by 244an

I added some lines to the example text because of this comment:

This breaks if you replace AA=A with AA =A or with AA=\nA – that other guy

EDIT: I added a suggestion that isn't sensitive to any delimiter characters in the text. However, it isn't the "one line split" the OP asked for, but it is how I would do it if I did it in bash and wanted the result in an array.

script.sh (NEW):

#!/bin/bash

text=$(
  echo "AA=A"; echo "AA =A"; echo "AA=\nA"; echo "B=BB"; echo "=======";
  echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";
)
echo "more complex string"
echo "$text"
echo "split now"

c_split[0]=""
current=""
del=""
ind=0

# newline
newl=$'\n'

# Save IFS (not necessary when run as sub shell)
saveIFS="$IFS"
IFS="$newl"
for row in $text; do

  if [[ $row =~ ^=+$ ]]; then
    c_split[$ind]="$current"
    ((ind++))
    current=""
    # Avoid preceding newline
    del=""
    continue
  fi

  current+="$del$row"
  del="$newl"
done

# Restore IFS
IFS="$saveIFS"

# If there is a remaining last part of the text
if [[ -n $current ]]; then
  c_split[$ind]="$current"
fi

# The result is an array
for i in "${c_split[@]}"
do
    echo "---- new part ----"
    echo "$i"
done

script.sh (OLD, with "one line split"):
(I stole the idea with awk from @Kent and adjusted it a bit)

#!/bin/bash

c=$(
  echo "AA=A"; echo "AA =A"; echo "AA=\nA"; echo "B=BB"; echo "=======";
  echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";
)
echo "more complex string"
echo "$c"
echo "split now"

# Now, this will be almost absolutely secure,
# perhaps except a direct hit by lightning.
# Candidate delimiters (assumed, the original characters were lost from this copy):
# ASCII separator/control characters; \001 is avoided since bash uses it internally.
del=""
for ch in $'\037' $'\036' $'\035' $'\034' $'\002' $'\003' $'\004'; do
  if [ -z "`echo "$c" | grep "$ch"`" ]; then
    del="$ch"
    break
  fi
done

if [ -z "$del" ]; then
  echo "Sorry, all this testing but no delimiter to use..."
  exit 1
fi

IFS="$del" c_split=($(echo "$c" | awk -vRS="\n=+\n" -vORS="$del" '1'))

for i in ${c_split[@]}
do
  echo "---- new part ----"
  echo "$i"
done

Output:

[244an]$ bash --version
GNU bash, version 4.2.24(1)-release (x86_64-pc-linux-gnu)

[244an]$ ./script.sh
more complex string
AA=A
AA =A
AA=\nA
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split now
---- new part ----
AA=A
AA =A
AA=\nA
B=BB
---- new part ----
C==CC
DD=D
---- new part ----
EEE
FF

I'm not using -e for echo, so that AA=\nA is printed as is instead of producing a newline.