How do I split a string on a delimiter in Bash?

Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/918886/

Time: 2020-09-09 18:13:32  Source: igfitidea

Tags: bash, shell, split, scripting

Asked by stefanB

I have this string stored in a variable:

IN="[email protected];[email protected]"

Now I would like to split the string on the ; delimiter so that I have:

ADDR1="[email protected]"
ADDR2="[email protected]"

I don't necessarily need the ADDR1 and ADDR2 variables. If they are elements of an array, that's even better.



After suggestions from the answers below, I ended up with the following, which is what I was after:

#!/usr/bin/env bash

IN="[email protected];[email protected]"

mails=$(echo $IN | tr ";" "\n")

for addr in $mails
do
    echo "> [$addr]"
done

Output:

> [[email protected]]
> [[email protected]]

There was a solution involving setting the internal field separator (IFS) to ;. I am not sure what happened to that answer. How do you reset IFS back to its default?

Re: the IFS solution, I tried this and it works. I keep the old IFS and then restore it:

IN="[email protected];[email protected]"

OIFS=$IFS
IFS=';'
mails2=$IN
for x in $mails2
do
    echo "> [$x]"
done

IFS=$OIFS

BTW, when I tried

mails2=($IN)

I only got the first string when printing it in a loop; without the parentheses around $IN it works.
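
A possible explanation, sketched under the assumption that the loop iterated over $mails2 rather than over the whole array: in bash, an array name without a subscript expands to element 0 only, so "${mails2[@]}" is needed to reach every element. The addresses below are placeholders, not the ones from the question.

```shell
#!/usr/bin/env bash
# Placeholder input (hypothetical addresses, not from the question).
IN="alice@example.com;bob@example.com"

OIFS=$IFS
IFS=';'
mails2=($IN)                    # unquoted expansion is split on ';' into an array
IFS=$OIFS

first_only="$mails2"            # same as ${mails2[0]}: only the first element
count=${#mails2[@]}             # number of elements in the array

echo "first_only=$first_only count=$count"
for x in "${mails2[@]}"; do     # [@] expands to every element
    echo "> [$x]"
done
```

Looping over "${mails2[@]}" prints both addresses, while $mails2 alone yields just the first one.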

Accepted answer by Johannes Schaub - litb

You can set the internal field separator (IFS) variable, and then let read parse the input into an array. When the assignment prefixes a command, the change to IFS applies only to that single command's environment (here, read). read then parses the input according to the value of IFS into an array, which we can iterate over.

IFS=';' read -ra ADDR <<< "$IN"
for i in "${ADDR[@]}"; do
    echo "> [$i]"   # process "$i" here
done

This parses one line of items separated by ; and pushes them into an array. To process the whole of $IN, one line of ;-separated input at a time:

while IFS=';' read -ra ADDR; do
    for i in "${ADDR[@]}"; do
        echo "> [$i]"   # process "$i" here
    done
done <<< "$IN"
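
To check that the IFS assignment really stays local to read, a small sketch like the following can help. The sample addresses and the variable names before and after are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; the last field contains spaces on purpose.
IN="alice@example.com;bob@example.com;Full Name <carol@example.com>"

before=$IFS
IFS=';' read -r -a ADDR <<< "$IN"   # IFS=';' applies to this read only
after=$IFS

[ "$before" = "$after" ] && echo "IFS unchanged"
echo "fields: ${#ADDR[@]}"
printf '> [%s]\n' "${ADDR[@]}"
```

Because IFS is ; only while read runs, the spaces inside the third field are kept and the shell's IFS does not need to be restored afterwards.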

Answer by palindrom

Taken from Bash shell script split array:

IN="[email protected];[email protected]"
arrIN=(${IN//;/ })

Explanation:

This construction replaces all occurrences of ';' (the initial // means global replace) in the string IN with ' ' (a single space), then interprets the space-delimited string as an array (that's what the surrounding parentheses do).

The syntax used inside the curly braces to replace each ';' character with a ' ' character is called parameter expansion.

There are some common gotchas:

  1. If the original string has spaces, you will need to use IFS:
    • IFS=';'; arrIN=($IN); unset IFS;
  2. If the original string has spaces and the delimiter is a new line, you can set IFS with:
    • IFS=$'\n'; arrIN=($IN); unset IFS;
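
One further gotcha worth keeping in mind with arrIN=($IN): the unquoted expansion is also subject to pathname expansion, so a field such as * could be replaced by file names. A minimal sketch of one way around it, assuming it is acceptable to toggle globbing with set -f (the sample string is invented):

```shell
#!/usr/bin/env bash
# Hypothetical input: fields with spaces and one field that is a glob character.
IN="first item;*;third item"

set -f                            # disable pathname expansion so '*' stays literal
IFS=';'; arrIN=($IN); unset IFS
set +f                            # re-enable pathname expansion

printf '> [%s]\n' "${arrIN[@]}"
```

With set -f in effect the second element stays a literal *, and the spaces inside the other fields survive because only ; is used for splitting.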

Answer by Chris Lutz

If you don't mind processing them immediately, I like to do this:

for i in $(echo "$IN" | tr ";" "\n")
do
  echo "> [$i]"   # process "$i" here
done

You could use this kind of loop to initialize an array, but there's probably an easier way to do it. Hope this helps, though.
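
As a rough sketch of initializing an array with this kind of loop (the array name addrs and the addresses are placeholders), each item can be appended with +=. Note that this relies on default word splitting, so it assumes the fields contain no whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical input without spaces in the fields.
IN="alice@example.com;bob@example.com"

addrs=()
for i in $(echo "$IN" | tr ";" "\n"); do
    addrs+=("$i")                 # append the current item to the array
done

echo "collected ${#addrs[@]} addresses"
printf '> [%s]\n' "${addrs[@]}"
```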

Answer by F. Hauri

Compatible answer

There are a lot of different ways to do this in bash.

However, it's important to first note that bash has many special features (so-called bashisms) that won't work in any other shell.

In particular, arrays, associative arrays, and pattern substitution, which are used in the solutions in this post as well as others in the thread, are bashisms and may not work under other shells that many people use.

For instance: on my Debian GNU/Linux, there is a standard shell called dash; I know many people who like to use another shell called ksh; and there is also a special tool called busybox with its own shell interpreter (ash).

Requested string

The string to be split in the above question is:

IN="[email protected];[email protected]"

I will use a modified version of this string to ensure that my solution is robust to strings containing whitespace, which could break other solutions:

IN="[email protected];[email protected];Full Name <[email protected]>"

Split string based on delimiter in bash (version >=4.2)

In pure bash, we can create an array with elements split by a temporary value for IFS (the input field separator). The IFS, among other things, tells bash which character(s) it should treat as a delimiter between elements when defining an array:

IN="[email protected];[email protected];Full Name <[email protected]>"

# save original IFS value so we can restore it later
oIFS="$IFS"
IFS=";"
declare -a fields=($IN)
IFS="$oIFS"
unset oIFS

In newer versions of bash, prefixing a command with an IFS definition changes the IFS for that command only and resets it to the previous value immediately afterwards. This means we can do the above in just one line:

IFS=\; read -a fields <<<"$IN"
# after this command, the IFS resets back to its previous value (here, the default):
set | grep ^IFS=
# IFS=$' \t\n'

We can see that the string IN has been stored into an array named fields, split on the semicolons:

set | grep ^fields=\\|^IN=
# fields=([0]="[email protected]" [1]="[email protected]" [2]="Full Name <[email protected]>")
# IN='[email protected];[email protected];Full Name <[email protected]>'

(We can also display the contents of these variables using declare -p:)

declare -p IN fields
# declare -- IN="[email protected];[email protected];Full Name <[email protected]>"
# declare -a fields=([0]="[email protected]" [1]="[email protected]" [2]="Full Name <[email protected]>")

Note that read is the quickest way to do the split because there are no forks or external resources called.

Once the array is defined, you can use a simple loop to process each field (or, rather, each element in the array you've now defined):

# `"${fields[@]}"` expands to return every element of `fields` array as a separate argument
for x in "${fields[@]}"; do
    echo "> [$x]"
done
# > [[email protected]]
# > [[email protected]]
# > [Full Name <[email protected]>]

Or you could drop each field from the array after processing using a shifting approach, which I like:

while [ "$fields" ]; do
    echo "> [$fields]"
    # slice the array
    fields=("${fields[@]:1}")
done
# > [[email protected]]
# > [[email protected]]
# > [Full Name <[email protected]>]
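
One caveat with the shifting loop: [ "$fields" ] tests the first element, so the loop would stop early if that element were an empty string. A hedged variant, assuming empty fields may occur, tests the element count instead (the sample array and the counter seen are illustrative only):

```shell
#!/usr/bin/env bash
# Hypothetical array with an empty element in the middle.
fields=("alice@example.com" "" "Full Name <carol@example.com>")

seen=0
while [ "${#fields[@]}" -gt 0 ]; do    # loop until the array is empty
    echo "> [${fields[0]}]"
    seen=$((seen + 1))
    fields=("${fields[@]:1}")          # drop the first element
done
```

This visits all three elements, including the empty one, and leaves fields empty at the end.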

And if you just want a simple printout of the array, you don't even need to loop over it:

printf "> [%s]\n" "${fields[@]}"
# > [[email protected]]
# > [[email protected]]
# > [Full Name <[email protected]>]

Update: recent bash >= 4.4

In newer versions of bash, you can also play with the command mapfile:

mapfile -td \; fields < <(printf "%s\0" "$IN")

This syntax preserves special chars, newlines and empty fields!
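
A small check of that claim, as a sketch only: the input below is invented and contains an empty field and an embedded newline. It feeds mapfile with printf '%s' (no trailing \0), which should be enough here because the sample has no empty field at the end. mapfile -d needs bash 4.4 or later.

```shell
#!/usr/bin/env bash
# Requires bash >= 4.4 (mapfile -d). Hypothetical input with an empty
# second field and a newline inside the third field.
IN="alice@example.com;;line one
line two;last"

mapfile -td \; fields < <(printf '%s' "$IN")

echo "fields: ${#fields[@]}"
declare -p fields
```

The array ends up with four elements: the empty field is kept as an empty string and the newline stays inside the third element.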

If you don't want to include empty fields, you could do the following:

mapfile -td \; fields <<<"$IN"
fields=("${fields[@]%$'\n'}")   # drop '\n' added by '<<<'

With mapfile, you can also skip declaring an array and implicitly "loop" over the delimited elements, calling a function on each:

myPubliMail() {
    printf "Seq: %6d: Sending mail to '%s'..." "$1" "$2"
    # mail -s "This is not a spam..." "$2" </path/to/body
    printf "\e[3D, done.\n"
}

mapfile < <(printf "%s\0" "$IN") -td \; -c 1 -C myPubliMail

(Note: the \0 at the end of the format string is useless if you don't care about empty fields at the end of the string or they're not present.)

mapfile < <(echo -n "$IN") -td \; -c 1 -C myPubliMail

# Seq:      0: Sending mail to '[email protected]', done.
# Seq:      1: Sending mail to '[email protected]', done.
# Seq:      2: Sending mail to 'Full Name <[email protected]>', done.

Or you could use <<<, and in the function body include some processing to drop the newline it adds:

myPubliMail() {
    local seq=$1 dest="${2%$'\n'}"
    printf "Seq: %6d: Sending mail to '%s'..." "$seq" "$dest"
    # mail -s "This is not a spam..." "$dest" </path/to/body
    printf "\e[3D, done.\n"
}

mapfile <<<"$IN" -td \; -c 1 -C myPubliMail

# Renders the same output:
# Seq:      0: Sending mail to '[email protected]', done.
# Seq:      1: Sending mail to '[email protected]', done.
# Seq:      2: Sending mail to 'Full Name <[email protected]>', done.

Split string based on delimiter in shell

If you can't use bash, or if you want to write something that can be used in many different shells, you often can't use bashisms, and this includes the arrays we've been using in the solutions above.

However, we don't need to use arrays to loop over "elements" of a string. There is a syntax used in many shells for deleting substrings of a string from the first or last occurrence of a pattern. Note that * is a wildcard that stands for zero or more characters:

(The lack of this approach in any solution posted so far is the main reason I'm writing this answer ;)

${var#*SubStr}  # drops substring from start of string up to first occurrence of `SubStr`
${var##*SubStr} # drops substring from start of string up to last occurrence of `SubStr`
${var%SubStr*}  # drops substring from last occurrence of `SubStr` to end of string
${var%%SubStr*} # drops substring from first occurrence of `SubStr` to end of string

As explained by Score_Under:

# and % delete the shortest possible matching substring from the start and end of the string respectively, and

## and %% delete the longest possible matching substring.

Using the above syntax, we can create an approach where we extract substring "elements" from the string by deleting the substrings up to or after the delimiter.
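
To make the four forms concrete, here is a minimal sketch on a made-up three-field string. The variable names first, rest, last and init are only illustrative.

```shell
# Hypothetical sample string with ';' as the delimiter.
var="a;b;c"

first=${var%%;*}     # delete longest match of ';*' from the end    -> a
rest=${var#*;}       # delete shortest match of '*;' from the start -> b;c
last=${var##*;}      # delete longest match of '*;' from the start  -> c
init=${var%;*}       # delete shortest match of ';*' from the end   -> a;b

echo "$first | $rest | $last | $init"
```

Taking first and then continuing with rest is the basic step that a loop over all elements repeats.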

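The code block below works well in bash (including Mac OS's bash), dash, ksh, and busybox's ash:

IN="[email protected];[email protected];Full Name <[email protected]>"
while [ "$IN" ] ;do
    # extract the substring from start of string up to delimiter.
    # this is the first "element" of the string.
    iter=${IN%%;*}
    echo "> [$iter]"
    # if there's only one element left, set `IN` to an empty string.
    # this causes us to exit this `while` loop.
    # else, we delete the first "element" of the string from IN, and move onto the next.
    [ "$IN" = "$iter" ] && \
        IN='' || \
        IN="${IN#*;}"
done
# > [[email protected]]
# > [[email protected]]
# > [Full Name <[email protected]>]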
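
The parameter-expansion loop from this answer consumes IN as it goes. If the original string is still needed afterwards, one possible variation (a sketch, not the author's code; rest and count are invented names and the input is placeholder data) works on a copy:

```shell
# Hypothetical input, including an empty middle field.
IN="alice@example.com;;Full Name <carol@example.com>"

rest=$IN            # work on a copy so IN is left intact
count=0
while :; do
    iter=${rest%%;*}                    # text before the first ';'
    echo "> [$iter]"
    count=$((count + 1))
    # stop when no delimiter is left, otherwise drop the first element
    [ "$rest" = "$iter" ] && break
    rest=${rest#*;}
done
```

This uses only POSIX parameter expansion and arithmetic, so it should behave the same in dash, ksh and busybox ash as in bash.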

Have fun!

Answer by DougW

I've seen a couple of answers referencing the cut command, but they've all been deleted. It's a little odd that nobody has elaborated on that, because I think it's one of the more useful commands for doing this type of thing, especially for parsing delimited log files.

In the case of splitting this specific example into a bash script array, tr is probably more efficient, but cut can be used, and is more effective if you want to pull specific fields from the middle.

Example:

$ echo "[email protected];[email protected]" | cut -d ";" -f 1
[email protected]
$ echo "[email protected];[email protected]" | cut -d ";" -f 2
[email protected]

You can obviously put that into a loop, and iterate the -f parameter to pull each field independently.
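
A rough sketch of such a loop, assuming the number of fields is not known in advance and is derived by counting the delimiters (the addresses and the names only_delims, total and field are placeholders):

```shell
# Hypothetical input with three ';'-separated fields.
IN="alice@example.com;bob@example.com;carol@example.com"

# Count fields: number of ';' characters plus one.
only_delims=$(printf '%s' "$IN" | tr -cd ';')
total=$(( ${#only_delims} + 1 ))

n=1
while [ "$n" -le "$total" ]; do
    field=$(printf '%s\n' "$IN" | cut -d ';' -f "$n")   # pull field number n
    echo "> [$field]"
    n=$((n + 1))
done
```

Each iteration starts a new cut process, so this is convenient rather than fast for long inputs.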

This gets more useful when you have a delimited log file with rows like this:

2015-04-27|12345|some action|an attribute|meta data

With cut it is very handy to cat this file and select a particular field for further processing.
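
For instance, pulling the third field out of every row of a pipe-delimited log could look like the sketch below. The temporary file and its second row are made up for the demonstration; only the first row matches the sample row above.

```shell
# Hypothetical log file; the second row is invented sample data.
logfile=$(mktemp)
printf '%s\n' \
    '2015-04-27|12345|some action|an attribute|meta data' \
    '2015-04-28|12346|other action|an attribute|meta data' > "$logfile"

actions=$(cut -d '|' -f 3 "$logfile")     # third field of every row
echo "$actions"

rm -f "$logfile"
```

cut can read the file directly, so a separate cat is optional.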

Answer by Steven Lizarazo

This worked for me:

string="1;2"
echo $string | cut -d';' -f1 # output is 1
echo $string | cut -d';' -f2 # output is 2

Answer by Steven Lizarazo

How about this approach:

IN="[email protected];[email protected]"
set -- "$IN"
IFS=";"; declare -a Array=($*)
echo "${Array[@]}"
echo "${Array[0]}"
echo "${Array[1]}"

Source

Answer by Tong

I think AWK is the best and most efficient command to resolve your problem. AWK is included by default in almost every Linux distribution.

echo "[email protected];[email protected]" | awk -F';' '{print $1,$2}'

will give

[email protected] [email protected]

Of course you can store each email address by redefining the awk print field.
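
One way to read "redefining the awk print field" is to print a single field per call and capture it with command substitution. A minimal sketch with placeholder addresses:

```shell
# Hypothetical input (placeholder addresses).
IN="alice@example.com;bob@example.com"

ADDR1=$(echo "$IN" | awk -F';' '{print $1}')   # first field
ADDR2=$(echo "$IN" | awk -F';' '{print $2}')   # second field

echo "ADDR1=$ADDR1 ADDR2=$ADDR2"
```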

Answer by lothar

echo "[email protected];[email protected]" | sed -e 's/;/\n/g'
[email protected]
[email protected]

Answer by Ashok

This also works:

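IN="[email protected];[email protected]"
echo ADD1=`echo $IN | cut -d \; -f 1`
echo ADD2=`echo $IN | cut -d \; -f 2`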

Be careful, this solution is not always correct. If you pass only "[email protected]", it will assign it to both ADD1 and ADD2.
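
The reason is that cut prints lines containing no delimiter unchanged, whichever field is requested. If that matters, the -s option (suppress lines without the delimiter) is one possible guard. A small sketch with a placeholder address; plain and strict are invented names:

```shell
# Hypothetical input: a single address, no ';' present.
IN="alice@example.com"

plain=$(echo "$IN" | cut -d ';' -f 2)        # no delimiter: whole line is printed
strict=$(echo "$IN" | cut -s -d ';' -f 2)    # -s suppresses lines without the delimiter

echo "plain=[$plain] strict=[$strict]"
```

Note that with -s the first field would also come back empty for such input, so a single address has to be handled separately.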