Warning: this page is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow, original question: http://stackoverflow.com/questions/10586153/

Time: 2020-09-09 22:06:28  Source: igfitidea

Split string into an array in Bash

Tags: arrays, bash, split

Asked by Lgn

In a Bash script I would like to split a line into pieces and store them in an array.

The line:

Paris, France, Europe

I would like to have them in an array like this:

array[0] = Paris
array[1] = France
array[2] = Europe

I would like to use simple code; the command's speed doesn't matter. How can I do it?

Answered by Paused until further notice.

IFS=', ' read -r -a array <<< "$string"

Note that the characters in $IFS are treated individually as separators, so in this case fields may be separated by either a comma or a space rather than by the sequence of the two characters. Interestingly, though, empty fields aren't created when comma-space appears in the input, because the space is treated specially.
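
A minimal sketch of that behavior; the names array and doubled are illustrative only:

```bash
# Comma-space input: each ", " acts as a single delimiter, so no empty fields appear.
string='Paris, France, Europe'
IFS=', ' read -r -a array <<< "$string"
declare -p array      # Paris France Europe

# Two commas in a row: the second comma ends a new, empty field.
IFS=', ' read -r -a doubled <<< 'a,,b'
declare -p doubled    # a, an empty string, b
```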

To access an individual element:

echo "${array[0]}"

To iterate over the elements:

for element in "${array[@]}"
do
    echo "$element"
done

To get both the index and the value:

for index in "${!array[@]}"
do
    echo "$index ${array[index]}"
done

The last example is useful because Bash arrays are sparse. In other words, you can delete an element or add an element and then the indices are not contiguous.

unset "array[1]"
array[42]=Earth
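
To see the gaps, here is a short sketch (starting from the three-element array in the question) that prints the indices that are actually set:

```bash
array=(Paris France Europe)
unset "array[1]"        # remove France; index 1 becomes a hole
array[42]=Earth         # add an element far past the old end
echo "${!array[@]}"     # the set indices: 0 2 42
for index in "${!array[@]}"; do
    echo "$index ${array[index]}"
done
```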

To get the number of elements in an array:

echo "${#array[@]}"

As mentioned above, arrays can be sparse, so you shouldn't use the length to get the last element. Here's how you can do it in Bash 4.2 and later:

echo "${array[-1]}"

In any version of Bash (from somewhere after 2.05b):

echo "${array[@]: -1:1}"

Larger negative offsets select farther from the end of the array. Note the space before the minus sign in the older form. It is required.
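
A quick check of why the length is the wrong tool, using a sparse array built the same way as in the unset example above (a sketch; the negative subscript needs Bash 4.2 or later):

```bash
array=(Paris France Europe)
unset "array[1]"
array[42]=Earth
echo "${#array[@]}"              # 3 elements are set...
echo "${array[${#array[@]}-1]}"  # ...so this reads index 2 (Europe), not the last element
echo "${array[-1]}"              # Earth
echo "${array[@]: -1:1}"         # Earth (note the space before -1)
```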

Answered by bgoldst

All of the answers to this question are wrong in one way or another.



Wrong answer #1

IFS=', ' read -r -a array <<< "$string"

1: This is a misuse of $IFS. The value of the $IFS variable is not taken as a single variable-length string separator; rather, it is taken as a set of single-character string separators, where each field that read splits off from the input line can be terminated by any character in the set (comma or space, in this example).

Actually, for the real sticklers out there, the full meaning of $IFS is slightly more involved. From the bash manual:

The shell treats each character of IFS as a delimiter, and splits the results of the other expansions into words using these characters as field terminators. If IFS is unset, or its value is exactly <space><tab><newline>, the default, then sequences of <space>, <tab>, and <newline> at the beginning and end of the results of the previous expansions are ignored, and any sequence of IFS characters not at the beginning or end serves to delimit words. If IFS has a value other than the default, then sequences of the whitespace characters <space>, <tab>, and <newline> are ignored at the beginning and end of the word, as long as the whitespace character is in the value of IFS (an IFS whitespace character). Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter. If the value of IFS is null, no word splitting occurs.

Basically, for non-default non-null values of $IFS, fields can be separated with either (1) a sequence of one or more characters that are all from the set of "IFS whitespace characters" (that is, whichever of <space>, <tab>, and <newline> ("newline" meaning line feed (LF)) are present anywhere in $IFS), or (2) any non-"IFS whitespace character" that's present in $IFS along with whatever "IFS whitespace characters" surround it in the input line.
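
Both modes can show up in a single input line. A minimal sketch with IFS set to comma-space (the input string is made up for illustration):

```bash
# 'a  b'  : a run of spaces (IFS whitespace only) separates a from b   -> mode (1)
# 'b,c'   : a bare comma separates b from c                            -> mode (2)
# 'c , d' : a comma plus the spaces around it counts as ONE delimiter   -> mode (2)
IFS=', ' read -r -a fields <<< 'a  b,c , d'
declare -p fields   # four fields: a b c d
```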

For the OP, it's possible that the second separation mode I described in the previous paragraph is exactly what he wants for his input string, but we can be pretty confident that the first separation mode I described is not correct at all. For example, what if his input string was 'Los Angeles, United States, North America'?

IFS=', ' read -ra a <<<'Los Angeles, United States, North America'; declare -p a;
## declare -a a=([0]="Los" [1]="Angeles" [2]="United" [3]="States" [4]="North" [5]="America")

2: Even if you were to use this solution with a single-character separator (such as a comma by itself, that is, with no following space or other baggage), if the value of the $string variable happens to contain any LFs, then read will stop processing once it encounters the first LF. The read builtin only processes one line per invocation. This is true even if you are piping or redirecting input only to the read statement, as we are doing in this example with the here-string mechanism, and thus unprocessed input is guaranteed to be lost. The code that powers the read builtin has no knowledge of the data flow within its containing command structure.
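
A minimal demonstration of the lost input, assuming a value with one embedded LF (the variable names are illustrative):

```bash
string=$'a,b\nc,d'                    # two lines in one variable
IFS=',' read -r -a arr <<< "$string"  # read consumes only the first line
declare -p arr                        # a b; the second line "c,d" never reaches arr
```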

You could argue that this is unlikely to cause a problem, but still, it's a subtle hazard that should be avoided if possible. It is caused by the fact that the read builtin actually does two levels of input splitting: first into lines, then into fields. Since the OP only wants one level of splitting, this usage of the read builtin is not appropriate, and we should avoid it.

3: A non-obvious potential issue with this solution is that read always drops the trailing field if it is empty, although it preserves empty fields otherwise. Here's a demo:

string=', , a, , b, c, , , '; IFS=', ' read -ra a <<<"$string"; declare -p a;
## declare -a a=([0]="" [1]="" [2]="a" [3]="" [4]="b" [5]="c" [6]="" [7]="")

Maybe the OP wouldn't care about this, but it's still a limitation worth knowing about. It reduces the robustness and generality of the solution.

This problem can be solved by appending a dummy trailing delimiter to the input string just prior to feeding it to read, as I will demonstrate later.
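
As a preview of that fix, here is a minimal sketch using a plain comma as the only IFS character (a simplification of the OP's comma-space case; the names are illustrative):

```bash
s='a,b,'                                # the last field is empty
IFS=',' read -r -a dropped <<< "$s"     # empty trailing field is dropped: 2 elements
IFS=',' read -r -a kept <<< "$s,"       # dummy ',' makes it an interior field: 3 elements
declare -p dropped kept
```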



Wrong answer #2

string="1:2:3:4:5"
set -f                     # avoid globbing (expansion of *).
array=(${string//:/ })

Similar idea:

t="one,two,three"
a=($(echo $t | tr ',' "\n"))

(Note: I added the missing parentheses around the command substitution which the answerer seems to have omitted.)

Similar idea:

string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)

These solutions leverage word splitting in an array assignment to split the string into fields. Funnily enough, just like read, general word splitting also uses the $IFS special variable, although in this case it is implied that it is set to its default value of <space><tab><newline>, and therefore any sequence of one or more IFS characters (which are all whitespace characters now) is considered to be a field delimiter.

This solves the problem of two levels of splitting committed by read, since word splitting by itself constitutes only one level of splitting. But just as before, the problem here is that the individual fields in the input string can already contain $IFS characters, and thus they would be improperly split during the word splitting operation. This happens to not be the case for any of the sample input strings provided by these answerers (how convenient...), but of course that doesn't change the fact that any code base that used this idiom would then run the risk of blowing up if this assumption were ever violated at some point down the line. Once again, consider my counterexample of 'Los Angeles, United States, North America' (or 'Los Angeles:United States:North America').
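
A minimal sketch of the failure, applying the pattern-substitution approach from above to the colon-separated counterexample (set -f is used here only to rule out globbing, so the splitting problem is shown in isolation):

```bash
string='Los Angeles:United States:North America'
set -f                     # rule out globbing for this demo
array=(${string//:/ })     # ':' becomes ' ', then default-IFS word splitting runs
set +f
declare -p array           # 6 elements: Los Angeles United States North America
```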

Also, word splitting is normally followed by filename expansion (aka pathname expansion aka globbing), which, if done, would potentially corrupt words containing the characters *, ?, or [ followed by ] (and, if extglob is set, parenthesized fragments preceded by ?, *, +, @, or !) by matching them against file system objects and expanding the words ("globs") accordingly. The first of these three answerers has cleverly undercut this problem by running set -f beforehand to disable globbing. Technically this works (although you should probably add set +f afterward to reenable globbing for subsequent code which may depend on it), but it's undesirable to have to mess with global shell settings in order to hack a basic string-to-array parsing operation in local code.
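
A small sketch of the globbing hazard. It works in a throwaway directory created with mktemp -d, so that the file names x.txt and y.txt (chosen for illustration) are the only ones a * can match:

```bash
dir=$(mktemp -d)
cd "$dir" || exit 1
touch x.txt y.txt
string='a:*'
globbed=(${string//:/ })   # '*' is expanded against the directory: a x.txt y.txt
set -f
literal=(${string//:/ })   # with globbing disabled: a *
set +f
cd - >/dev/null || exit 1
rm -r -- "$dir"
declare -p globbed literal
```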

Another issue with this answer is that all empty fields will be lost. This may or may not be a problem, depending on the application.
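
For example (a sketch, with an input invented to contain empty fields):

```bash
string='a::b:'             # one empty field in the middle, one at the end
array=(${string//:/ })     # becomes 'a  b ', and runs of spaces collapse
declare -p array           # only 2 elements: a b
```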

Note: If you're going to use this solution, it's better to use the ${string//:/ } "pattern substitution" form of parameter expansion, rather than going to the trouble of invoking a command substitution (which forks the shell), starting up a pipeline, and running an external executable (tr or sed), since parameter expansion is purely a shell-internal operation. (Also, for the tr and sed solutions, the input variable should be double-quoted inside the command substitution; otherwise word splitting would take effect in the echo command and potentially mess with the field values. Also, the $(...) form of command substitution is preferable to the old `...` form since it simplifies nesting of command substitutions and allows for better syntax highlighting by text editors.)



Wrong answer #3

str="a, b, c, d"  # assuming there is a space after ',' as in Q
arr=(${str//,/})  # delete all occurrences of ','

This answer is almost the same as #2. The difference is that the answerer has made the assumption that the fields are delimited by two characters, one of which is represented in the default $IFS, and the other not. He has solved this rather specific case by removing the non-IFS-represented character using a pattern substitution expansion and then using word splitting to split the fields on the surviving IFS-represented delimiter character.

This is not a very generic solution. Furthermore, it can be argued that the comma is really the "primary" delimiter character here, and that stripping it and then depending on the space character for field splitting is simply wrong. Once again, consider my counterexample: 'Los Angeles, United States, North America'.

Also, again, filename expansion could corrupt the expanded words, but this can be prevented by temporarily disabling globbing for the assignment with set -f and then re-enabling it with set +f.

Also, again, all empty fields will be lost, which may or may not be a problem depending on the application.



Wrong answer #4

string='first line
second line
third line'

oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"

This is similar to #2 and #3 in that it uses word splitting to get the job done, only now the code explicitly sets $IFS to contain only the single-character field delimiter present in the input string. It should be repeated that this cannot work for multicharacter field delimiters such as the OP's comma-space delimiter. But for a single-character delimiter like the LF used in this example, it actually comes close to being perfect. The fields cannot be unintentionally split in the middle as we saw with previous wrong answers, and there is only one level of splitting, as required.

One problem is that filename expansion will corrupt affected words as described earlier, although once again this can be solved by wrapping the critical statement in set -f and set +f.

Another potential problem is that, since LF qualifies as an "IFS whitespace character" as defined earlier, all empty fields will be lost, just as in #2 and #3. This would of course not be a problem if the delimiter happens to be a non-"IFS whitespace character", and depending on the application it may not matter anyway, but it does vitiate the generality of the solution.

So, to sum up, assuming you have a one-character delimiter, and it is either a non-"IFS whitespace character" or you don't care about empty fields, and you wrap the critical statement in set -f and set +f, then this solution works, but otherwise not.

(Also, for information's sake, assigning a LF to a variable in bash can be done more easily with the $'...' syntax, e.g. IFS=$'\n';.)
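
Putting those pieces together, here is a minimal sketch of #4 written with $'\n', an explicit save and restore of $IFS, and the set -f / set +f wrapper; the input is invented and deliberately contains an empty line:

```bash
string=$'first line\nsecond line\n\nthird line'
oldIFS=$IFS
IFS=$'\n'             # LF as the only field separator
set -f                # keep *, ? and [ in the lines from being globbed
lines=( $string )
set +f
IFS=$oldIFS
declare -p lines      # 3 elements: the empty line was dropped (LF is IFS whitespace)
```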



Wrong answer #5

countries='Paris, France, Europe'
OIFS="$IFS"
IFS=', ' array=($countries)
IFS="$OIFS"

Similar idea:

IFS=', ' eval 'array=($string)'

This solution is effectively a cross between #1 (in that it sets $IFS to comma-space) and #2-4 (in that it uses word splitting to split the string into fields). Because of this, it suffers from most of the problems that afflict all of the above wrong answers, sort of like the worst of all worlds.

Also, regarding the second variant, it may seem like the eval call is completely unnecessary, since its argument is a single-quoted string literal, and therefore is statically known. But there's actually a very non-obvious benefit to using eval in this way. Normally, when you run a simple command which consists of a variable assignment only, meaning without an actual command word following it, the assignment takes effect in the shell environment:

IFS=', '; ## changes $IFS in the shell environment

This is true even if the simple command involves multiple variable assignments; again, as long as there's no command word, all variable assignments affect the shell environment:

IFS=', ' array=($countries); ## changes both $IFS and $array in the shell environment

But, if the variable assignment is attached to a command name (I like to call this a "prefix assignment"), then it does not affect the shell environment, and instead only affects the environment of the executed command, regardless of whether it is a builtin or external:

IFS=', ' :; ## : is a builtin command, the $IFS assignment does not outlive it
IFS=', ' env; ## env is an external command, the $IFS assignment does not outlive it
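
A small sketch that records $IFS right after each kind of assignment. It assumes Bash is running in its default mode: in POSIX mode (set -o posix), assignments preceding special builtins such as : and eval persist after the builtin completes, so the first check would come out differently.

```bash
IFS=$' \t\n'        # start from the default value
IFS=', ' :          # prefix assignment attached to the ':' builtin
after_prefix=$IFS   # back to the default
IFS=', '            # bare assignment, no command word
after_bare=$IFS     # now ', '
IFS=$' \t\n'        # restore the default
printf '%q\n' "$after_prefix" "$after_bare"
```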

Relevant quote from the bash manual:

If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are added to the environment of the executed command and do not affect the current shell environment.

It is possible to exploit this feature of variable assignment to change $IFS only temporarily, which allows us to avoid the whole save-and-restore gambit like that which is being done with the $OIFS variable in the first variant. But the challenge we face here is that the command we need to run is itself a mere variable assignment, and hence it would not involve a command word to make the $IFS assignment temporary. You might think to yourself, well why not just add a no-op command word to the statement like the : builtin to make the $IFS assignment temporary? This does not work because it would then make the $array assignment temporary as well:

IFS=', ' array=($countries) :; ## fails; new $array value never escapes the : command

So, we're effectively at an impasse, a bit of a catch-22. But, when eval runs its code, it runs it in the shell environment, as if it was normal, static source code, and therefore we can run the $array assignment inside the eval argument to have it take effect in the shell environment, while the $IFS prefix assignment that is prefixed to the eval command will not outlive the eval command. This is exactly the trick that is being used in the second variant of this solution:

IFS=', ' eval 'array=($string)'; ## $IFS does not outlive the eval command, but $array does

So, as you can see, it's actually quite a clever trick, and accomplishes exactly what is required (at least with respect to where each assignment takes effect) in a rather non-obvious way. I'm actually not against this trick in general, despite the involvement of eval; just be careful to single-quote the argument string to guard against security threats.
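
A minimal check of the trick. It assumes Bash's default (non-POSIX) mode: eval is a special builtin, and in POSIX mode a prefix assignment to it persists after the builtin completes.

```bash
string='Paris, France, Europe'
IFS=$' \t\n'
IFS=', ' eval 'array=($string)'   # $string is expanded (and split) inside eval
declare -p array                  # Paris France Europe
printf '%q\n' "$IFS"              # still the default value
```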

But again, because of the "worst of all worlds" agglomeration of problems, this is still a wrong answer to the OP's requirement.



Wrong answer #6

IFS=', '; array=(Paris, France, Europe)

IFS=' ';declare -a array=(Paris France Europe)

Um... what? The OP has a string variable that needs to be parsed into an array. This "answer" starts with the verbatim contents of the input string pasted into an array literal. I guess that's one way to do it.

It looks like the answerer may have assumed that the $IFS variable affects all bash parsing in all contexts, which is not true. From the bash manual:

IFS    The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the read builtin command. The default value is <space><tab><newline>.

So the $IFS special variable is actually only used in two contexts: (1) word splitting that is performed after expansion (meaning not when parsing bash source code) and (2) for splitting input lines into words by the read builtin.

Let me try to make this clearer. I think it might be good to draw a distinction between parsing and execution. Bash must first parse the source code, which obviously is a parsing event, and then later it executes the code, which is when expansion comes into the picture. Expansion is really an execution event. Furthermore, I take issue with the description of the $IFS variable that I just quoted above; rather than saying that word splitting is performed after expansion, I would say that word splitting is performed during expansion, or, perhaps even more precisely, word splitting is part of the expansion process. The phrase "word splitting" refers only to this step of expansion; it should never be used to refer to the parsing of bash source code, although unfortunately the docs do seem to throw around the words "split" and "words" a lot. Here's a relevant excerpt from the linux.die.net version of the bash manual:

Expansion is performed on the command line after it has been split into words. There are seven kinds of expansion performed: brace expansion, tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, word splitting, and pathname expansion.

The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and pathname expansion.

You could argue the GNU version of the manual does slightly better, since it opts for the word "tokens" instead of "words" in the first sentence of the Expansion section:

Expansion is performed on the command line after it has been split into tokens.

The important point is, $IFS does not change the way bash parses source code. Parsing of bash source code is actually a very complex process that involves recognition of the various elements of shell grammar, such as command sequences, command lists, pipelines, parameter expansions, arithmetic substitutions, and command substitutions. For the most part, the bash parsing process cannot be altered by user-level actions like variable assignments (actually, there are some minor exceptions to this rule; for example, see the various compatxx shell settings, which can change certain aspects of parsing behavior on-the-fly). The upstream "words"/"tokens" that result from this complex parsing process are then expanded according to the general process of "expansion" as broken down in the above documentation excerpts, where word splitting of the expanded (expanding?) text into downstream words is simply one step of that process. Word splitting only touches text that has been spit out of a preceding expansion step; it does not affect literal text that was parsed right off the source bytestream.
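
A short sketch of the difference between literal source text and the result of an expansion (the variable names are illustrative):

```bash
IFS=','                       # splitting of expansion results now happens only on ','
literal=(a,b c)               # source text: the parser makes the words "a,b" and "c"
s='a,b c'
expanded=($s)                 # expansion result: word splitting yields "a" and "b c"
IFS=$' \t\n'                  # restore the default
declare -p literal expanded
```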



Wrong answer #7

string='first line
        second line
        third line'

while read -r line; do lines+=("$line"); done <<<"$string"

This is one of the best solutions. Notice that we're back to using read. Didn't I say earlier that read is inappropriate because it performs two levels of splitting, when we only need one? The trick here is that you can call read in such a way that it effectively only does one level of splitting, specifically by splitting off only one field per invocation, which necessitates the cost of having to call it repeatedly in a loop. It's a bit of a sleight of hand, but it works.

But there are problems. First: When you provide at least one NAME argument to read, it automatically ignores leading and trailing whitespace in each field that is split off from the input string. This occurs whether $IFS is set to its default value or not, as described earlier in this post. Now, the OP may not care about this for his specific use-case, and in fact, it may be a desirable feature of the parsing behavior. But not everyone who wants to parse a string into fields will want this. There is a solution, however: A somewhat non-obvious usage of read is to pass zero NAME arguments. In this case, read will store the entire input line that it gets from the input stream in a variable named $REPLY, and, as a bonus, it does not strip leading and trailing whitespace from the value. This is a very robust usage of read which I've exploited frequently in my shell programming career. Here's a demonstration of the difference in behavior:

string=$'  a  b  \n  c  d  \n  e  f  '; ## input string

a=(); while read -r line; do a+=("$line"); done <<<"$string"; declare -p a;
## declare -a a=([0]="a  b" [1]="c  d" [2]="e  f") ## read trimmed surrounding whitespace

a=(); while read -r; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="  a  b  " [1]="  c  d  " [2]="  e  f  ") ## no trimming

The second issue with this solution is that it does not actually address the case of a custom field separator, such as the OP's comma-space. As before, multicharacter separators are not supported, which is an unfortunate limitation of this solution. We could try to at least split on comma by specifying the separator to the -d option, but look what happens:

string='Paris, France, Europe';
a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France")

Predictably, the unaccounted surrounding whitespace got pulled into the field values, and hence this would have to be corrected subsequently through trimming operations (this could also be done directly in the while-loop). But there's another obvious error: Europe is missing! What happened to it? The answer is that read returns a failing return code if it hits end-of-file (in this case we can call it end-of-string) without encountering a final field terminator on the final field. This causes the while-loop to break prematurely and we lose the final field.

Technically this same error afflicted the previous examples as well; the difference there is that the field separator was taken to be LF, which is the default when you don't specify the -d option, and the <<< ("here-string") mechanism automatically appends a LF to the string just before it feeds it as input to the command. Hence, in those cases, we sort of accidentally solved the problem of a dropped final field by unwittingly appending an additional dummy terminator to the input. Let's call this solution the "dummy-terminator" solution. We can apply the dummy-terminator solution manually for any custom delimiter by concatenating it against the input string ourselves when instantiating it in the here-string:

a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string,"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")

There, problem solved. Another solution is to only break the while-loop if both (1) read returned failure and (2) $REPLY is empty, meaning read was not able to read any characters prior to hitting end-of-file. Demo:

a=(); while read -rd, || [[ -n "$REPLY" ]]; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')

This approach also reveals the secretive LF that automatically gets appended to the here-string by the <<< redirection operator. It could of course be stripped off separately through an explicit trimming operation as described a moment ago, but obviously the manual dummy-terminator approach solves it directly, so we could just go with that. The manual dummy-terminator solution is actually quite convenient in that it solves both of these two problems (the dropped-final-field problem and the appended-LF problem) in one go.

So, overall, this is quite a powerful solution. Its only remaining weakness is a lack of support for multicharacter delimiters, which I will address later.
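
Combining the two fixes discussed above (the manual dummy terminator and trimming inside the loop) gives a sketch like the following. The trimming step assumes each comma is followed by exactly one space, as in the OP's input; it removes only a single leading space from each field.

```bash
string='Paris, France, Europe'
a=()
# The dummy ',' makes read succeed on the last field; the here-string's
# appended LF is then read as a final fragment without a ',' terminator,
# so read fails and the loop ends without storing it.
while read -rd,; do
    a+=("${REPLY# }")     # strip one leading space from each field
done <<<"$string,"
declare -p a              # Paris France Europe
```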



Wrong answer #8

string='first line
        second line
        third line'

readarray -t lines <<<"$string"

(This is actually from the same post as #7; the answerer provided two solutions in the same post.)

The readarray builtin, which is a synonym for mapfile, is ideal. It's a builtin command which parses a bytestream into an array variable in one shot; no messing with loops, conditionals, substitutions, or anything else. And it doesn't surreptitiously strip any whitespace from the input string. And (if -O is not given) it conveniently clears the target array before assigning to it. But it's still not perfect, hence my criticism of it as a "wrong answer".
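
A minimal sketch of the clearing behavior (the stale contents are invented for illustration):

```bash
lines=(stale1 stale2 stale3 stale4)   # leftovers from earlier code
string=$'first line\nsecond line\nthird line'
readarray -t lines <<<"$string"       # -t strips each trailing LF; lines is cleared first
declare -p lines                      # 3 elements; the stale entries are gone
```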

First, just to get this out of the way, note that, just like the behavior of read when doing field-parsing, readarray drops the trailing field if it is empty. Again, this is probably not a concern for the OP, but it could be for some use-cases. I'll come back to this in a moment.
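A small demonstration of the dropped-trailing-field behaviour, assuming bash 4.4+ (the -d option of readarray). The input is fed through printf rather than <<< so that no LF is appended and the last field really is empty.

```shell
#!/usr/bin/env bash
# Assumes bash 4.4+ for readarray -d.
# printf is used instead of <<< so no LF is appended to the input.
readarray -td, a < <(printf '%s' 'a,b,')
declare -p a    # 2 elements: the empty field after the last comma is dropped
readarray -td, b < <(printf '%s' 'a,b,,')
declare -p b    # 3 elements: the empty field between the last two commas is kept
```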

Second, as before, it does not support multicharacter delimiters. I'll give a fix for this in a moment as well.

Third, the solution as written does not parse the OP's input string, and in fact, it cannot be used as-is to parse it. I'll expand on this momentarily as well.

For the above reasons, I still consider this to be a "wrong answer" to the OP's question. Below I'll give what I consider to be the right answer.



Right answer

Here's a naïve attempt to make #8 work by just specifying the -d option:

string='Paris, France, Europe';
readarray -td, a <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')

We see the result is identical to the result we got from the double-conditional approach of the looping read solution discussed in #7. We can almost solve this with the manual dummy-terminator trick:

readarray -td, a <<<"$string,"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe" [3]=$'\n')

The problem here is that readarray preserved the trailing field, since the <<< redirection operator appended the LF to the input string, and therefore the trailing field was not empty (otherwise it would've been dropped). We can take care of this by explicitly unsetting the final array element after the fact:

readarray -td, a <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")
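An alternative sketch (a variation, not the answer's own code): feeding the string through printf with the dummy terminator appended avoids the here-string LF entirely, so no element has to be unset afterwards. It assumes bash 4.4+.

```shell
#!/usr/bin/env bash
# Assumes bash 4.4+. printf '%s,' appends the dummy terminator and, unlike
# <<<, adds no LF, so there is no extra element to unset afterwards.
string='Paris, France, Europe'
readarray -td, a < <(printf '%s,' "$string")
declare -p a    # fields keep their leading spaces: "Paris", " France", " Europe"
```

A genuinely empty last field still survives, because the appended comma terminates it: 'x,' becomes two elements, "x" and "".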

The only two problems that remain, which are actually related, are (1) the extraneous whitespace that needs to be trimmed, and (2) the lack of support for multicharacter delimiters.

The whitespace could of course be trimmed afterward (for example, see How to trim whitespace from a Bash variable?). But if we can hack a multicharacter delimiter, then that would solve both problems in one shot.
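One possible way to do that after-the-fact trimming, sketched under the assumption of bash 4.4+ with the extglob option enabled. It strips leading and trailing whitespace from each element using parameter expansion. The split step here uses the printf-fed readarray variant rather than the <<< form above.

```shell
#!/usr/bin/env bash
# Sketch: trim leading/trailing whitespace from every element after splitting.
# Assumes bash 4.4+ for readarray -d; extglob enables the +( ) pattern.
shopt -s extglob
string='Paris, France, Europe'
readarray -td, a < <(printf '%s,' "$string")
for i in "${!a[@]}"; do
    a[i]=${a[i]##+([[:space:]])}   # strip leading whitespace
    a[i]=${a[i]%%+([[:space:]])}   # strip trailing whitespace
done
declare -p a
```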

Unfortunately, there's no direct way to get a multicharacter delimiter to work. The best solution I've thought of is to preprocess the input string to replace the multicharacter delimiter with a single-character delimiter that will be guaranteed not to collide with the contents of the input string. The only character that has this guarantee is the NUL byte. This is because, in bash (though not in zsh, incidentally), variables cannot contain the NUL byte. This preprocessing step can be done inline in a process substitution. Here's how to do it using awk:

readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; }' <<<"$string, "); unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")

There, finally! This solution will not erroneously split fields in the middle, will not cut out prematurely, will not drop empty fields, will not corrupt itself on filename expansions, will not automatically strip leading and trailing whitespace, will not leave a stowaway LF on the end, does not require loops, and does not settle for a single-character delimiter.



Trimming solution

Lastly, I wanted to demonstrate my own fairly intricate trimming solution using the obscure -C callback option of readarray. Unfortunately, I've run out of room against Stack Overflow's draconian 30,000 character post limit, so I won't be able to explain it. I'll leave that as an exercise for the reader.

function mfcb { local val="$4"; "$1"; eval "$2[$3]=\$val;"; };
function val_ltrim { if [[ "$val" =~ ^[[:space:]]+ ]]; then val="${val:${#BASH_REMATCH[0]}}"; fi; };
function val_rtrim { if [[ "$val" =~ [[:space:]]+$ ]]; then val="${val:0:${#val}-${#BASH_REMATCH[0]}}"; fi; };
function val_trim { val_ltrim; val_rtrim; };
readarray -c1 -C 'mfcb val_trim a' -td, <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")
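Since the -C option is left unexplained above, here is a small demonstration of what a callback receives. According to the bash manual, the callback is evaluated after each line is read but before the corresponding array element is assigned. It is passed the index of that element and the line itself as extra arguments. The function name show_cb is made up for this sketch.

```shell
#!/usr/bin/env bash
# Shows what a readarray -C callback receives (bash 4+).
# show_cb is a hypothetical name chosen for this demo.
show_cb() {
    printf 'callback: index=%s line=[%s]\n' "$1" "$2"
}
# -c1 invokes the callback for every line instead of the default every 5000.
readarray -t -c1 -C show_cb lines < <(printf '%s\n' alpha beta)
declare -p lines
```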

Answer by Jim Ho

Here is a way without setting IFS:

string="1:2:3:4:5"
set -f                      # avoid globbing (expansion of *).
array=(${string//:/ })
for i in "${!array[@]}"
do
    echo "$i=>${array[i]}"
done

The idea is using string replacement:

${string//substring/replacement}

to replace all matches of $substring with white space and then using the substituted string to initialize an array:

(element1 element2 ... elementN)

Note: this answer makes use of the split+glob operator. Thus, to prevent expansion of some characters (such as *) it is a good idea to pause globbing for this script.

Answer by Jmoney38

t="one,two,three"
a=($(echo "$t" | tr ',' '\n'))
echo "${a[2]}"

Prints three

Answer by Jmoney38

The accepted answer works for values in one line.
If the variable has several lines:

string='first line
        second line
        third line'

We need a very different command to get all lines:

while read -r line; do lines+=("$line"); done <<<"$string"

Or the much simpler bash readarray:

readarray -t lines <<<"$string"

Printing all lines is very easy taking advantage of a printf feature:

printf ">[%s]\n" "${lines[@]}"

>[first line]
>[        second line]
>[        third line]
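Note that, for this particular input, the while-read loop and readarray -t do not produce identical arrays. With the default IFS, read trims leading and trailing whitespace from each line, so the indentation shown in the output is kept only by readarray. Setting IFS= for read alone is one way to get the same result. A minimal sketch comparing the three forms (variable names are illustrative):

```shell
#!/usr/bin/env bash
string='first line
        second line
        third line'

# With the default IFS, read trims leading/trailing whitespace from each line.
trimmed=()
while read -r line; do trimmed+=("$line"); done <<<"$string"

# With IFS= set for read only, each line is kept exactly as written.
verbatim=()
while IFS= read -r line; do verbatim+=("$line"); done <<<"$string"

readarray -t via_readarray <<<"$string"

printf '>[%s]\n' "${trimmed[@]}"
printf '>[%s]\n' "${verbatim[@]}"
```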

Answer by Luca Borrione

Sometimes it happened to me that the method described in the accepted answer didn't work, especially if the separator is a newline.
In those cases I solved it this way:

string='first line
second line
third line'

oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"

for line in "${lines[@]}"
do
    echo "--> $line"
done
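One caveat with splitting on an IFS of newline: newline counts as IFS whitespace, so consecutive newlines collapse and empty lines disappear. The unquoted expansion is also subject to globbing. The sketch below compares that approach (with globbing disabled) against readarray -t. The helper name split_lines_ifs is hypothetical, and `local -` assumes bash 4.4+.

```shell
#!/usr/bin/env bash
# split_lines_ifs is a hypothetical helper; `local -` needs bash 4.4+.
split_lines_ifs() {
    local -            # restore shell options (set -f below) on return
    local IFS=$'\n'    # split only on newlines, like the answer above
    set -f             # keep a line such as '*' from being glob-expanded
    ifs_lines=( $1 )   # unquoted on purpose: this is the split step
}

string=$'first line\n\nthird line'   # note the empty second line
split_lines_ifs "$string"
readarray -t ra_lines <<<"$string"

declare -p ifs_lines ra_lines        # 2 elements vs. 3 elements
```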

Answer by ssanch

This is similar to the approach by Jmoney38, but using sed:

string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)
echo ${array[0]}

Prints 1

Answer by dawg

The key to splitting your string into an array is the multi-character delimiter of ", ". Any solution using IFS for multi-character delimiters is inherently wrong, since IFS is a set of those characters, not a string.

If you assign IFS=", " then the string will break on EITHER "," OR " " or any combination of them, which is not an accurate representation of the two-character delimiter ", ".

You can use awk or sed to split the string, with process substitution:

#!/bin/bash

str="Paris, France, Europe"
array=()
while read -r -d $'\0' each; do   # use a NUL terminated field separator
    array+=("$each")
done < <(printf "%s" "$str" | awk '{ gsub(/,[ ]+|$/,"\0"); print }')
declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output

It is more efficient to use a regex directly in Bash:

#!/bin/bash

str="Paris, France, Europe"

array=()
while [[ $str =~ ([^,]+)(,[ ]+|$) ]]; do
    array+=("${BASH_REMATCH[1]}")   # capture the field
    i=${#BASH_REMATCH}              # length of field + delimiter
    str=${str:i}                    # advance the string by that length
done                                # the loop deletes $str, so make a copy if needed

declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output...

With the second form, there is no subshell and it will be inherently faster.



Edit by bgoldst: Here are some benchmarks comparing my readarray solution to dawg's regex solution, and I also included the read solution for the heck of it (note: I slightly modified the regex solution for greater harmony with my solution) (also see my comments below the post):

## competitors
function c_readarray { readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); unset 'a[-1]'; };
function c_read { a=(); local REPLY=''; while read -r -d ''; do a+=("$REPLY"); done < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); };
function c_regex { a=(); local s="$1, "; while [[ $s =~ ([^,]+),\  ]]; do a+=("${BASH_REMATCH[1]}"); s=${s:${#BASH_REMATCH}}; done; };

## helper functions
function rep {
    local -i i=-1;
    for ((i = 0; i<$1; ++i)); do
        printf %s "$2";
    done;
}; ## end rep()

function testAll {
    local funcs=();
    local args=();
    local func='';
    local -i rc=-1;
    while [[ "$1" != ':' ]]; do
        func="$1";
        if [[ ! "$func" =~ ^[_a-zA-Z][_a-zA-Z0-9]*$ ]]; then
            echo "bad function name: $func" >&2;
            return 2;
        fi;
        funcs+=("$func");
        shift;
    done;
    shift;
    args=("$@");
    for func in "${funcs[@]}"; do
        echo -n "$func ";
        { time $func "${args[@]}" >/dev/null 2>&1; } 2>&1| tr '\n' '/';
        rc=${PIPESTATUS[0]}; if [[ $rc -ne 0 ]]; then echo "[$rc]"; else echo; fi;
    done| column -ts/;
}; ## end testAll()

function makeStringToSplit {
    local -i n=$1; ## number of fields
    if [[ $n -lt 0 ]]; then echo "bad field count: $n" >&2; return 2; fi;
    if [[ $n -eq 0 ]]; then
        echo;
    elif [[ $n -eq 1 ]]; then
        echo 'first field';
    elif [[ "$n" -eq 2 ]]; then
        echo 'first field, last field';
    else
        echo "first field, $(rep $[$1-2] 'mid field, ')last field";
    fi;
}; ## end makeStringToSplit()

function testAll_splitIntoArray {
    local -i n=$1; ## number of fields in input string
    local s='';
    echo "===== $n field$(if [[ $n -ne 1 ]]; then echo 's'; fi;) =====";
    s="$(makeStringToSplit "$n")";
    testAll c_readarray c_read c_regex : "$s";
}; ## end testAll_splitIntoArray()

## results
testAll_splitIntoArray 1;
## ===== 1 field =====
## c_readarray   real  0m0.067s   user 0m0.000s   sys  0m0.000s
## c_read        real  0m0.064s   user 0m0.000s   sys  0m0.000s
## c_regex       real  0m0.000s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 10;
## ===== 10 fields =====
## c_readarray   real  0m0.067s   user 0m0.000s   sys  0m0.000s
## c_read        real  0m0.064s   user 0m0.000s   sys  0m0.000s
## c_regex       real  0m0.001s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 100;
## ===== 100 fields =====
## c_readarray   real  0m0.069s   user 0m0.000s   sys  0m0.062s
## c_read        real  0m0.065s   user 0m0.000s   sys  0m0.046s
## c_regex       real  0m0.005s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 1000;
## ===== 1000 fields =====
## c_readarray   real  0m0.084s   user 0m0.031s   sys  0m0.077s
## c_read        real  0m0.092s   user 0m0.031s   sys  0m0.046s
## c_regex       real  0m0.125s   user 0m0.125s   sys  0m0.000s
##
testAll_splitIntoArray 10000;
## ===== 10000 fields =====
## c_readarray   real  0m0.209s   user 0m0.093s   sys  0m0.108s
## c_read        real  0m0.333s   user 0m0.234s   sys  0m0.109s
## c_regex       real  0m9.095s   user 0m9.078s   sys  0m0.000s
##
testAll_splitIntoArray 100000;
## ===== 100000 fields =====
## c_readarray   real  0m1.460s   user 0m0.326s   sys  0m1.124s
## c_read        real  0m2.780s   user 0m1.686s   sys  0m1.092s
## c_regex       real  17m38.208s user 15m16.359s sys  2m19.375s
##

Answer by MrPotatoHead

Pure bash multi-character delimiter solution.

As others have pointed out in this thread, the OP's question gave an example of a comma delimited string to be parsed into an array, but did not indicate if he/she was only interested in comma delimiters, single character delimiters, or multi-character delimiters.

Since Google tends to rank this answer at or near the top of search results, I wanted to provide readers with a strong answer to the question of multiple character delimiters, since that is also mentioned in at least one response.

If you're in search of a solution to a multi-character delimiter problem, I suggest reviewing Mallikarjun M's post, in particular the response from gniourf_gniourf who provides this elegant pure BASH solution using parameter expansion:

#!/bin/bash
str="LearnABCtoABCSplitABCaABCString"
delimiter=ABC
s=$str$delimiter
array=();
while [[ $s ]]; do
    array+=( "${s%%"$delimiter"*}" );
    s=${s#*"$delimiter"};
done;
declare -p array

Link to cited comment/referenced post

Link to cited question: Howto split a string on a multi-character delimiter in bash?

Answer by To Kra

This works for me on OSX:

string="1 2 3 4 5"
declare -a array=($string)

If your string has a different delimiter, just replace it with spaces first:

string="1,2,3,4,5"
delimiter=","
declare -a array=($(echo $string | tr "$delimiter" " "))

Simple :-)
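To make the point about IFS being a set of characters concrete, here is a small sketch on a made-up input, 'New York, USA', where a field itself contains a space. With IFS=', ', read also splits inside "New York". The parameter-expansion loop, in the style of gniourf_gniourf's solution quoted above, treats ', ' as a single two-character delimiter.

```shell
#!/usr/bin/env bash
# 'New York, USA' is an illustrative input chosen for this sketch.
string='New York, USA'

# IFS=', ' means "comma or space", so the space inside "New York" splits too.
IFS=', ' read -r -a by_ifs <<<"$string"

# Parameter-expansion loop: ', ' is matched as one two-character delimiter.
delimiter=', '
s=$string$delimiter
by_delim=()
while [[ $s ]]; do
    by_delim+=( "${s%%"$delimiter"*}" )   # text before the first delimiter
    s=${s#*"$delimiter"}                  # drop everything through that delimiter
done

declare -p by_ifs by_delim    # 3 elements vs. 2 elements
```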