Original question: http://stackoverflow.com/questions/14630940/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Split string into array in bash
Asked by user000001
I am looking for a way to split a string in bash over a delimiter string, and place the parts in an array.
Simple case:
#!/bin/bash
b="aaaaa/bbbbb/ddd/ffffff"
echo "simple string: $b"
IFS='/' b_split=($b)
echo ;
echo "split"
for i in ${b_split[@]}
do
echo "------ new part ------"
echo "$i"
done
Gives output:
simple string: aaaaa/bbbbb/ddd/ffffff
split
------ new part ------
aaaaa
------ new part ------
bbbbb
------ new part ------
ddd
------ new part ------
ffffff
More complex case:
#!/bin/bash
c=$(echo "AA=A"; echo "B=BB"; echo "======="; echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";)
echo "more complex string"
echo "$c";
echo ;
echo "split";
IFS='=======' c_split=($c) ;# <---- LINE TO BE CHANGED
for i in ${c_split[@]}
do
echo "------ new part ------"
echo "$i"
done
Gives output:
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split
------ new part ------
AA
------ new part ------
A
B
------ new part ------
BB
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
C
------ new part ------
------ new part ------
CC
DD
------ new part ------
D
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
EEE
FF
I would like the second output to look like:
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF
I.e. to split the string on a sequence of characters, instead of a single one. How can I do this?
I am looking for an answer that would only modify this line in the second script:
IFS='=======' c_split=($c) ;# <---- LINE TO BE CHANGED
Accepted answer by F. Hauri
IFS disambiguation
IFS means Input Field Separators: a list of characters that can be used as separators.
By default, this is set to space, tab and newline ($' \t\n'), meaning that any number (greater than zero) of spaces, tabs and/or newlines counts as one separator.
So for the string:
" blah foo=bar
baz "
leading and trailing separators are ignored, and the string contains only 3 parts: blah, foo=bar and baz.
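The default splitting described above can be checked with a small sketch. It assumes IFS still has its default value; the variable names s and parts are only illustrative, and set -f is added so that the unquoted expansion is not subject to pathname expansion.

```shell
#!/bin/bash
# Illustrative string with leading/trailing blanks and an embedded newline.
s=$' blah foo=bar\nbaz '

# Unquoted expansion: with the default IFS, bash splits on runs of
# space, tab and newline, ignoring leading and trailing separators.
set -f            # disable globbing while the unquoted expansion happens
parts=($s)
set +f

echo "${#parts[@]} parts"
printf '<%s>\n' "${parts[@]}"
```

This should report 3 parts: blah, foo=bar and baz.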
Splitting a string using IFS is possible if you know a valid field separator that is not used in your string.
OIFS="$IFS"
IFS='§'
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
c_split=(${c//=======/§})
IFS="$OIFS"
printf -- "------ new part ------\n%s\n" "${c_split[@]}"
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF
But this works only as long as the string does not contain §.
You could use another character, like IFS=$'\026';c_split=(${c//=======/$'\026'}), but this may still lead to further bugs.
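One kind of bug that this style of splitting can run into, assuming it is among those meant here, is pathname expansion: the expansion is unquoted, so a part containing * or ? may be replaced by matching file names. A minimal sketch that guards against this with set -f (the sample data is made up for illustration):

```shell
#!/bin/bash
# Made-up sample whose first part contains a glob character.
c=$'AA=*\n=======\nBB'

OIFS="$IFS"
IFS=$'\026'
set -f                                  # no pathname expansion while splitting
c_split=(${c//=======/$'\026'})
set +f
IFS="$OIFS"

echo "${#c_split[@]} parts"
printf '[%s]\n' "${c_split[@]}"
```

The newlines around the separator line stay attached to the parts, as in the original one-liner.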
You could walk through the character map to find a character that is not in your string:
myIfs=""
for i in {1..255};do
printf -v char "$(printf "\\%03o" $i)"
# the quoted "$char" is matched literally, even if it is * or ?
[ "$c" == "${c#*"$char"}" ] && myIfs="$char" && break
done
if ! [ "$myIfs" ] ;then
echo no split char found, could not do the job, sorry.
exit 1
fi
but I find this solution a little overkill.
Splitting on spaces (or without modifying IFS)
Under bash, we could use this bashism:
b="aaaaa/bbbbb/ddd/ffffff"
b_split=(${b//// })
In fact, the syntax ${varname//pattern/replacement} performs a substitution (delimited by /) that replaces all occurrences of / with a space before the result is assigned to the array b_split.
Of course, this still uses IFS and splits the array on spaces.
This is not the best way, but it can work in specific cases.
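To make the substitution syntax easier to read, here is a small sketch that escapes the slash in the pattern and contrasts the single-slash form (first match only) with the double-slash form (all matches). The variable names are only illustrative, and the final split assumes the default IFS.

```shell
#!/bin/bash
path="aaaaa/bbbbb/ddd/ffffff"

one=${path/\// }      # ${var/pattern/repl}: replace only the first /
all=${path//\// }     # ${var//pattern/repl}: replace every /

echo "$one"           # aaaaa bbbbb/ddd/ffffff
echo "$all"           # aaaaa bbbbb ddd ffffff

fields=($all)         # word splitting on the default IFS
echo "${#fields[@]}"  # 4
```

Writing the pattern as \/ is equivalent to the bare / used in ${b//// } and only helps the eye.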
You could even drop unwanted spaces before splitting:
b='12 34 / 1 3 5 7 / ab'
b1=${b// }
b_split=(${b1//// })
printf "<%s>, " "${b_split[@]}" ;echo
<1234>, <1357>, <ab>,
or exchange them...
b1=${b// /§}
b_split=(${b1//// })
printf "<%s>, " "${b_split[@]//§/ }" ;echo
<12 34 >, < 1 3 5 7 >, < ab>,
Splitting a line on strings:
So you cannot use IFS for what you want, but bash does have nice features:
#!/bin/bash
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
echo "more complex string"
echo "$c";
echo ;
echo "split";
mySep='======='
while [ "$c" != "${c#*$mySep}" ];do
echo "------ new part ------"
echo "${c%%$mySep*}"
c="${c#*$mySep}"
done
echo "------ last part ------"
echo "$c"
Let's see:
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ last part ------
EEE
FF
Note: Leading and trailing newlines are not deleted. If this is needed, you could use:
mySep=$'\n=======\n'
instead of simply =======.
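As a sketch of how the longer separator behaves, the loop below collects the parts into an array; because the surrounding newlines belong to the separator, they are consumed with it. The array name parts is illustrative, and the separator is quoted inside the expansions so that it is matched literally.

```shell
#!/bin/bash
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
mySep=$'\n=======\n'

parts=()
# Loop while the string still contains the separator.
while [ "$c" != "${c#*"$mySep"}" ]; do
    parts+=("${c%%"$mySep"*}")   # text before the first separator
    c=${c#*"$mySep"}             # drop that text and the separator
done
parts+=("$c")                    # whatever remains is the last part

printf -- '------ new part ------\n%s\n' "${parts[@]}"
```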
Or you could rewrite the split loop to strip them explicitly:
mySep=$'======='
while [ "$c" != "${c#*$mySep}" ];do
echo "------ new part ------"
part="${c%%$mySep*}"
part="${part##$'\n'}"
echo "${part%%$'\n'}"
c="${c#*$mySep}"
done
echo "------ last part ------"
c=${c##$'\n'}
echo "${c%%$'\n'}"
In any case, this matches what the SO question asked for (and its sample):
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ last part ------
EEE
FF
Finally, creating an array
#!/bin/bash
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
echo "more complex string"
echo "$c";
echo ;
echo "split";
mySep=$'======='
export -a c_split
while [ "$c" != "${c#*$mySep}" ];do
part="${c%%$mySep*}"
part="${part##$'\n'}"
c_split+=("${part%%$'\n'}")
c="${c#*$mySep}"
done
c=${c##$'\n'}
c_split+=("${c%%$'\n'}")
for i in "${c_split[@]}"
do
echo "------ new part ------"
echo "$i"
done
This works fine:
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF
Some explanations:
export -a var defines var as an array. It also marks it for export, although bash does not actually pass arrays to child processes, so declare -a var would serve equally well here.
${variablename%string*} and ${variablename%%string*} return the left part of variablename, up to but not including string. A single % cuts at the last occurrence of string, and %% cuts at the first occurrence. The full variablename is returned if string is not found.
${variablename#*string} does the same in reverse: it returns the right part of variablename, after but not including string. A single # cuts at the first occurrence of string, and ## cuts at the last occurrence.
Note: in the pattern, the character * is a wildcard meaning any number of any characters.
The command echo "${c%%$'\n'}" echoes variable c without its trailing newline, if there is one. The pattern is a single newline, so at most one is removed.
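Since the pattern $'\n' contains no wildcard, % and %% both strip at most one newline. If every trailing newline has to go, one option (not used in the answer itself) is bash's extglob pattern +(...), as in this sketch with illustrative variable names:

```shell
#!/bin/bash
shopt -s extglob            # enables +(pattern): one or more occurrences

s=$'text\n\n\n'             # three trailing newlines
one=${s%%$'\n'}             # removes a single trailing newline
all=${s%%+($'\n')}          # removes the whole run of trailing newlines

echo "${#s} ${#one} ${#all}"
```

The three lengths printed are those of the original string, the string minus one newline, and the bare text.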
So if variable contains Hello WorldZorGluBHello youZorGluBI'm happy:
variable="Hello WorldZorGluBHello youZorGluBI'm happy"
$ echo ${variable#*ZorGluB}
Hello youZorGluBI'm happy
$ echo ${variable##*ZorGluB}
I'm happy
$ echo ${variable%ZorGluB*}
Hello WorldZorGluBHello you
$ echo ${variable%%ZorGluB*}
Hello World
$ echo ${variable%%ZorGluB}
Hello WorldZorGluBHello youZorGluBI'm happy
$ echo ${variable%happy}
Hello WorldZorGluBHello youZorGluBI'm
$ echo ${variable##* }
happy
All this is explained in the man page:
$ man -Len -Pless\ +/##word bash
$ man -Len -Pless\ +/%%word bash
$ man -Len -Pless\ +/^\\ *export\\ .*word bash
Step by step, the splitting loop:
The separator:
mySep=$'======='
Declare c_split as an array (bash does not pass arrays to child processes, so declare -a c_split would do the same job here):
export -a c_split
While variable c contains at least one occurrence of mySep:
while [ "$c" != "${c#*$mySep}" ];do
Truncate c from the first mySep to the end of the string and assign the result to part:
part="${c%%$mySep*}"
Remove the leading newline:
part="${part##$'\n'}"
Remove the trailing newline and add the result as a new array element to c_split:
c_split+=("${part%%$'\n'}")
Reassign c to the rest of the string, with everything up to and including the first mySep removed:
c="${c#*$mySep}"
Done ;-)
done
Remove the leading newline:
c=${c##$'\n'}
Remove the trailing newline and add the result as a new array element to c_split:
c_split+=("${c%%$'\n'}")
As a function:
ssplit() {
# $1: string to split, $2: name of the target array, $3: delimiter string
local string="$1" array=${2:-ssplited_array} delim="${3:- }" pos=0
while [ "$string" != "${string#*$delim}" ];do
printf -v $array[pos++] "%s" "${string%%$delim*}"
string="${string#*$delim}"
done
printf -v $array[pos] "%s" "$string"
}
Usage:
ssplit "<quoted string>" [array name] [delimiter string]
where array name is ssplited_array by default and delimiter is a single space.
You could use:
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
ssplit "$c" c_split $'\n=======\n'
printf -- "--- part ----\n%s\n" "${c_split[@]}"
--- part ----
AA=A
B=BB
--- part ----
C==CC
DD=D
--- part ----
EEE
FF
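For experimenting with the defaults, here is a lightly adapted copy of the function as a self-contained sketch. The adaptations are my own assumptions rather than part of the answer: the string is taken from the first argument, the delimiter is quoted so that glob characters in it match literally, and the index is incremented in a separate step.

```shell
#!/bin/bash
# Adapted sketch of ssplit: $1 string, $2 array name, $3 delimiter.
ssplit() {
    local string="$1" array=${2:-ssplited_array} delim="${3:- }" pos=0
    while [ "$string" != "${string#*"$delim"}" ]; do
        # store the text before the first delimiter in array[pos]
        printf -v "$array[$pos]" "%s" "${string%%"$delim"*}"
        pos=$((pos + 1))
        string="${string#*"$delim"}"
    done
    printf -v "$array[$pos]" "%s" "$string"
}

# Defaults: result goes to ssplited_array, delimiter is one space.
ssplit "one two three"
printf '<%s>\n' "${ssplited_array[@]}"

# Explicit array name and a multi-character delimiter.
ssplit "a::b::c" parts "::"
echo "${#parts[@]}: ${parts[*]}"
```

Note that the function only assigns elements; it does not clear an array that already holds more elements from an earlier call.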
Answer by Kent
Do it with awk:
awk -vRS='\n=*\n' '{print "----- new part -----";print}' <<< "$c"
Output:
kent$ awk -vRS='\n=*\n' '{print "----- new part -----";print}' <<< "$c"
----- new part -----
AA=A
B=BB
----- new part -----
C==CC
DD=D
----- new part -----
EEE
FF
Answer by Kent
The following script was tested in bash:
kent@7pLaptop:/tmp/test$ bash --version
GNU bash, version 4.2.42(2)-release (i686-pc-linux-gnu)
The script (named t.sh):
#!/bin/bash
c=$(echo "AA=A"; echo "B=BB"; echo "======="; echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";)
echo "more complex string"
echo "$c"
echo "split now"
c_split=($(echo "$c"|awk -vRS="\n=*\n" '{gsub(/\n/,"\\n");printf $0" "}'))
for i in ${c_split[@]}
do
echo "---- new part ----"
echo -e "$i"
done
Output:
kent@7pLaptop:/tmp/test$ ./t.sh
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split now
---- new part ----
AA=A
B=BB
---- new part ----
C==CC
DD=D
---- new part ----
EEE
FF
Note the echo statement in that for loop: if you remove the -e option, you will see:
---- new part ----
AA=A\nB=BB
---- new part ----
C==CC\nDD=D
---- new part ----
EEE\nFF\n
Whether to use -e or not depends on your requirement.
Answer by that other guy
Here's an approach that doesn't fumble when the data contains literal backslash sequences, spaces and the like:
c=$(echo "AA=A"; echo "B=BB"; echo "======="; echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";)
echo "more complex string"
echo "$c";
echo ;
echo "split";
c_split=()
while IFS= read -r -d '' part
do
c_split+=( "$part" )
done < <(printf "%s" "$c" | sed -e 's/=======/\x00/g')
c_split+=( "$part" )
for i in "${c_split[@]}"
do
echo "------ new part ------"
echo "$i"
done
Note that the string is actually split on "=======" as requested, so the line feeds become part of the data (causing extra blank lines when "echo" adds its own).
Answer by 244an
I added some lines to the example text because of this comment:
This breaks if you replace AA=A with AA =A or with AA=\nA – that other guy
EDIT: I added a suggestion that isn't sensitive to a particular delimiter in the text. However, this isn't the "one line split" that the OP was asking for, but this is how I would do it if I were doing it in bash and wanted the result in an array.
script.sh (NEW):
#!/bin/bash
text=$(
echo "AA=A"; echo "AA =A"; echo "AA=\nA"; echo "B=BB"; echo "=======";
echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";
)
echo "more complex string"
echo "$text"
echo "split now"
c_split[0]=""
current=""
del=""
ind=0
# newline
newl=$'\n'
# Save IFS (not necessary when run as sub shell)
saveIFS="$IFS"
IFS="$newl"
for row in $text; do
if [[ $row =~ ^=+$ ]]; then
c_split[$ind]="$current"
((ind++))
current=""
# Avoid preceding newline
del=""
continue
fi
current+="$del$row"
del="$newl"
done
# Restore IFS
IFS="$saveIFS"
# If there is a last poor part of the text
if [[ -n $current ]]; then
c_split[$ind]="$current"
fi
# The result is an array
for i in "${c_split[@]}"
do
echo "---- new part ----"
echo "$i"
done
script.sh (OLD, with "one line split"):
(I stole the awk idea from @Kent and adjusted it a bit)
#!/bin/bash
c=$(
echo "AA=A"; echo "AA =A"; echo "AA=\nA"; echo "B=BB"; echo "=======";
echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";
)
echo "more complex string"
echo "$c"
echo "split now"
# Now, this will be almost absolute secure,
# perhaps except a direct hit by lightning.
del=""
# Candidate delimiters: control characters unlikely to occur in the text
for ch in $'\1' $'\2' $'\3' $'\4' $'\5' $'\6' $'\7'; do
if [ -z "`echo "$c" | grep "$ch"`" ]; then
del="$ch"
break
fi
done
if [ -z "$del" ]; then
echo "Sorry, all this testing but no delimiter to use..."
exit 1
fi
IFS="$del" c_split=($(echo "$c" | awk -vRS="\n=+\n" -vORS="$del" '1'))
for i in ${c_split[@]}
do
echo "---- new part ----"
echo "$i"
done
Output:
[244an]$ bash --version
GNU bash, version 4.2.24(1)-release (x86_64-pc-linux-gnu)
[244an]$ ./script.sh
more complex string
AA=A
AA =A
AA=\nA
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split now
---- new part ----
AA=A
AA =A
AA=\nA
B=BB
---- new part ----
C==CC
DD=D
---- new part ----
EEE
FF
I'm not using -e for echo, so that AA=\\nA does not produce a newline.