Why does bash ignore newlines when doing a for loop over the contents of a C-style string?

Source: Stack Overflow, http://stackoverflow.com/questions/1650573/. This content is provided under the CC BY-SA 4.0 license: you are free to use and share it, but you must attribute it to the original authors.

Tags: bash, scripting, for-loop, escaping

Asked by EMPraptor

Why does the following...

c=0; for i in $'1\n2\n3\n4'; do echo iteration $c :$i:; c=$[c+1]; done

print out...

iteration 0 :1 2 3 4:

and not

iteration 0 :1:
iteration 1 :2:
iteration 2 :3:
iteration 3 :4:

From what I understand, the $'STRING' syntax should allow me to specify a string with escape characters. Shouldn't "\n" be interpreted as a newline so that the for loop echoes four times, once for each line? Instead, it seems as if the newline is interpreted as a space character.

I took unwind's suggestion and tried setting $IFS. The results were the same.

IFS=$'\n'; c=0; for i in $'1\n2\n3\n4'; do echo iteration $c :$i:; c=$[c+1]; done; unset IFS;

iteration 0 :1 2 3 4:

William Purssel says in a comment that this did not work because IFS was being set to newline... but the following did not work either.

IFS=' '; c=0; for i in '1 2 3 4'; do echo iteration $c :$i:; c=$[c+1]; done; unset IFS;

iteration 0 :1 2 3 4:

Using IFS=' ' on a newline-separated string resulted in even more mess...

IFS=' '; c=0; for i in $'1\n2\n3\n4'; do echo iteration $c :$i:; c=$[c+1]; done; unset IFS;

iteration 0 :1
2
3
4:

Setting IFS to '\n' rather than $'\n' had the same effect as IFS=' ' ...

IFS='\n'; c=0; for i in $'1\n2\n3\n4'; do echo iteration $c :$i:; c=$[c+1]; done; unset IFS;

iteration 0 :1
2
3
4:

There's only one iteration, but the newline is visible in the echo for some reason.

What did work is first storing the string in a variable then looping over the contents of the variable (without having to set IFS):

c=0; v=$'1\n2\n3\n4'; for i in $v; do echo iteration $c :$i:; c=$[c+1]; done

iteration 0 :1:
iteration 1 :2:
iteration 2 :3:
iteration 3 :4:

Which still does not explain why there is this problem.

Is there a pattern here? Is this the expected behavior of IFS as defined in unwind's link?

unwind's link states... "The shell scans the results of parameter expansion, command substitution, and arithmetic expansion that did not occur within double quotes for word splitting."

I guess that explains why string literals don't get split for for-loop iteration no matter what escape characters are used. It works only when the literal is assigned to a variable and that variable is then expanded unquoted, so that the result is split for the for loop. I guess the same holds for command substitution.

Examples:

Result of command substitution is split

c=0; for i in `echo $'1\n2\n3\n4'`; do echo iteration $c :$i:; c=$[c+1]; done

iteration 0 :1:
iteration 1 :2:
iteration 2 :3:
iteration 3 :4:

The portion of the string that was expanded is split; the rest is not.

c=0; v=$'1 \n\t2\t3 4'; for i in $v$'\n5\n6'; do echo iteration $c :$i:; c=$[c+1]; done

iteration 0 :1:
iteration 1 :2:
iteration 2 :3:
iteration 3 :4 5 6:

When expansion happens in double quotes, no splitting occurs.

c=0; v=$'1\n2\n3 4'; for i in "$v"; do echo iteration $c :$i:; c=$[c+1]; done

iteration 0 :1 2 3 4:

Any sequence of SPACE, TAB, NEWLINE is used as a delimiter for splitting.

c=0; v=$'1 2\t3 \t\n4'; for i in $v; do echo iteration $c :$i:; c=$[c+1]; done

iteration 0 :1:
iteration 1 :2:
iteration 2 :3:
iteration 3 :4:

I will accept unwind's answer as his link yields the answer to my question.

No clue as to why the behavior of echo within the for loop changes with the value of IFS.

EDIT: extended to clarify.

Answered by Paused until further notice.

Bash doesn't do word splitting on quoted strings in this context. For example:

$ for i in "a b c d"; do echo $i; done
a b c d

$ for i in a b c d; do echo $i; done
a
b
c
d

$ var="a b c d"; for i in "$var"; do echo $i; done
a b c d

$ var="a b c d"; for i in $var; do echo $i; done
a
b
c
d

In a comment, you stated "IFS='\n' also works. What doesn't work is IFS=$'\n'. I'm very very confused right now."

In IFS='\n', you're setting the separators (plural) to the two characters backslash and "n". So if you do this (inserting an "X" in the middle of a "\n") you see what happens. The unquoted $i is split at the literal backslash and at the literal "n", while the "\n" sequences that $'' turned into real newlines are not treated as separators and pass through to the output:

$ IFS='\n'; for i in $'a\Xnb\nc\n'; do echo $i; done; rrifs
a X b
c

Edit 2 (in response to the comment):

It sees '\n' as two characters (not a newline). In $'a\Xnb\nc\n', "\X" is not a recognized escape, so it stays a literal backslash followed by "X", while each "\n" becomes a real newline; the result is a string of 8 characters. Since it's quoted, the for loop sees it as one string rather than words delimited by $IFS. The unquoted $i passed to echo is then split at the backslash and at the "n", and echo prints the embedded newlines as they are.

Try these for further comparison:

$ c=0; for i in "a\nb\nc\n"; do echo -e "iteration $c :$i:"; c=$[c+1]; done
iteration 0 :a
b
c
:

$ c=0; for i in "a\nb\nc\n"; do echo "iteration $c :$i:"; c=$[c+1]; done
iteration 0 :a\nb\nc\n:

$ c=0; for i in a\\nb\\nc\\n; do echo -e "iteration $c :$i:"; c=$[c+1]; done
iteration 0 :a
b
c
:

$ c=0; for i in a\\nb\\nc\\n; do echo "iteration $c :$i:"; c=$[c+1]; done
iteration 0 :a\nb\nc\n:

Setting IFS has no effect on the above.

This works (note that $var is unquoted in the for statement):

$ var=$'a\nb\nc\n'
$ saveIFS="$IFS"   # it's important to save and restore $IFS
$ IFS=$'\n'        # set $IFS to a newline using $'\n' (not '\n')
$ c=0; for i in $var; do echo -e "iteration $c :$i:"; c=$[c+1]; done
iteration 0 :a:
iteration 1 :b:
iteration 2 :c:
$ IFS="$saveIFS"
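
A variation on the save-and-restore pattern above, offered as a sketch rather than as part of the original answer: inside a function, declaring IFS with local confines the change to that call, so nothing has to be restored afterwards. The function name split_lines is made up for this illustration, and set -f is added on the assumption that you don't want globbing applied to the unquoted expansion.

```shell
#!/usr/bin/env bash
# split_lines is a hypothetical helper, not part of the original answer.
split_lines() {
  local IFS=$'\n'   # the newline-only IFS is visible in this function call only
  local c=0 i
  set -f            # assumption: suppress globbing of the unquoted $1
  for i in $1; do   # unquoted on purpose, so word-splitting applies
    echo "iteration $c :$i:"
    c=$((c+1))
  done
  set +f            # note: this re-enables globbing even if it was off before
}

var=$'a\nb\nc\n'
split_lines "$var"
```

After the call, $IFS in the calling shell still has whatever value it had before.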

Answered by unwind

Change your $IFS setting to change how bash splits text into words.

Editor's note:
This answer was accepted, because it provides a link to information that ultimately explains the underlying issues.
Note, however, that the OP's problem cannot be solved simply by changing $IFS, because $IFS doesn't apply to quoted strings.

Answered by mouviciel

Two reasons:

  1. Your for loop loops only once: there is only one element to loop on, which is the $'1\n2\n3\n4' string. If you want to loop four times, you have to change $IFS, as suggested by unwind.

  2. echo takes this string, and interprets it as four arguments separated by newlines. It then displays all arguments separated by whitespaces. If you don't want echo to interpret the input string, put it in double quotes, as in echo "$i".

Edit, after question edit:

  • I tried changing $IFS: it worked, but I used export $IFS='\n'

  • In your second case, $v gets interpreted by bash in the for command, which interprets it as four arguments separated by newlines. If you want to get your first problem again, just use for f in "$v" instead of for f in $v.

Answered by Matt Willtrout

Try:

c=0; for i in $'1\\n2\\n3\\n4'; do echo -e iteration $c :$i:; c=$[c+1]; done

The extra backslashes preserve the escapes for the newlines, and echo -e tells echo to expand the escapes.

Answered by mklement0

Dennis Williamson's helpful answer fully explains the symptoms, and even the question itself now mostly does; mouviciel's answer boils the issues down well, but (as of this writing) contains incorrect information about $IFS.
Therefore, let me attempt a summary of the rules that apply, followed by a detailed analysis:

  • With quoted strings, irrespective of the quoting style, IFS, the Internal Field Separator, never comes into play.

    • A quoted string as the sole driver of a for loop always results in a single iteration, with the (potentially expanded) string getting assigned as a whole to the loop variable.
  • Splitting strings into words by the separator characters specified in $IFS (word-splitting) only applies to the results of unquoted expansions, namely:

    • unquoted variable references ($var), called parameter expansions (which includes transformations such as prefix and suffix removal, substitutions, ...)
    • unquoted command substitutions ($(...) or old-style `...`)
    • unquoted arithmetic expansions ($(( ... )); note that the syntax $[...] is obsolete and should be avoided).
  • In order to assign control characters such as <newline> and <tab> to $IFS, use an ANSI C-quoted string ($'...'), which understands escape sequences such as \n and \t; e.g., IFS=$'\n'. By contrast, IFS='\n' would assign 2 literal characters: literal \ and literal n (single-quoted strings always use their content literally).
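
The difference between the two quoting styles in the last rule can be checked directly. The following is a minimal sketch, assuming bash; the variable names and the count_fields helper are invented for this illustration and are not from the answer.

```shell
#!/usr/bin/env bash
# Compare what each quoting style actually stores.
literal='\n'      # single-quoted: two characters, backslash and n
ansi=$'\n'        # ANSI C-quoted: one character, a real newline

echo "single-quoted length: ${#literal}"
echo "ANSI C-quoted length: ${#ansi}"

# count_fields is a hypothetical helper: it splits its second argument
# using its first argument as IFS and prints the number of fields.
count_fields() {
  local IFS=$1
  set -f          # keep globbing out of the picture
  set -- $2       # unquoted expansion, so word-splitting applies
  set +f
  echo "$#"
}

data=$'1\n2\n3\n4'
echo "fields with newline as IFS:         $(count_fields $'\n' "$data")"
echo "fields with backslash and n as IFS: $(count_fields '\n' "$data")"
```

With a real newline in IFS the data splits into four fields; with backslash and n it stays one field, because neither character occurs in the data.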

Note that if the echo command in the original code had used a single, double-quoted argument (echo "iteration $c :$i:"), then $IFS would not have applied at all, which would have avoided the confusion.



Analysis of the command from the question:

c=0; for i in $'1\n2\n3\n4'; do echo iteration $c :$i:; c=$[c+1]; done
  • $IFS and word-splitting only apply to the echo command, not the for loop.

  • ANSI C-quoted string $'1\n2\n3\n4' as the loop driver results in the following 4-line string assigned to $i:

    1
    2
    3
    4

  • echo iteration $c :$i:, due to having only unquoted arguments, makes the shell subject them to word-splitting as well as globbing (filename expansion; although that has no effect in this particular case):

    • $c, due to containing just 0 (in the one and only iteration), is not modified in the process.

    • :$i:, by contrast, based on $IFS containing <space><tab><newline> by default, is split into 4 separate words: :1, 2, 3, and 4: (note how the enclosing : became part of the first and last word).

    • Note: To use a variable's value as-is, always double-quote the variable reference.
      Word splitting and globbing are instances of shell expansions, which is the umbrella term for the up-front interpretation of arguments by the shell.

  • echo is therefore handed 6 individual arguments: iteration, 0, :1, 2, 3, and 4:. On output, echo concatenates its arguments with a single space (unrelated to $IFS), yielding iteration 0 :1 2 3 4:
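
To see the argument list that echo receives, you can substitute a small function that prints each argument in brackets. This is only a sketch under the assumption that $IFS has its default value; show_args is a made-up name for illustration.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical helper: it prints the argument count,
# then every argument wrapped in <...> so word boundaries are visible.
show_args() {
  printf '%d args:' "$#"
  printf ' <%s>' "$@"
  printf '\n'
}

i=$'1\n2\n3\n4'   # the value the loop variable holds in the single iteration
c=0

show_args iteration $c :$i:      # unquoted: split into six words
show_args "iteration $c :$i:"    # double-quoted: one word, newlines intact
```

The first call reports six arguments, matching the analysis above; the second reports one argument that still contains the three newlines.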



How to write the loop robustly

Note the double-quoting of the string passed to echo, and the embedded arithmetic expansion that combines reporting the current value of $c with incrementing it afterwards ($((c++))).

If the iteration values are known in advance:

# Simply use an unquoted, space-separated list (the indiv. elements may be quoted, however).
c=0; for i in 1 2 3 4; do echo "iteration: $((c++)) :$i:"; done

# Alternative, with an array:
vals=( 1 2 3 4 )
c=0; for i in "${vals[@]}"; do echo "iteration: $((c++)) :$i:"; done

# If the iteration values form a range of numbers, you can also use
# brace expansion (`for i in {1..4}...`) or, better for larger ranges
# and required for variable-based endpoints, a C-style loop (`for ((i=0;i<4;++i))...`)
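
The remark about variable-based endpoints can be illustrated with a short sketch (assuming bash; the variable n and its value are chosen only for this example). Brace expansion is performed before parameter expansion, so a range written with a variable is never seen as a range.

```shell
#!/usr/bin/env bash
n=4   # endpoint held in a variable (illustrative value)

# C-style loop: the endpoint is evaluated arithmetically, so a variable works.
for ((i = 1; i <= n; i++)); do
  echo "c-style: $i"
done

# Brace expansion runs before parameter expansion, so {1..$n} is not a range:
# the loop body runs once, with the literal text {1..4}.
for i in {1..$n}; do
  echo "brace: $i"
done
```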

If the iteration values are NOT known in advance:

Using for to loop over lines of input is ill-advised, because the use of an unquoted expansion would require you to deal with possibly unwanted word-splitting and globbing, and because the entire input must be read into memory as a whole before the loop starts.

A while loop to which the lines are provided via stdin is the better choice (<<< is a here-string, a string that is passed via stdin):

c=0; while IFS= read -r i; do echo "iteration: $((c++)) :$i:"; done <<<$'1\n2\n3\n4'

read reads line by line, and -r (which keeps backslashes literal) combined with IFS= (which disables word-splitting, and with it the trimming of leading and trailing whitespace, by setting it to the null string) ensures that each line is read in full, as-is.
Note that by prepending IFS= directly to read, its value is localized to that command, without changing the current shell's $IFS value; this is a generic mechanism in POSIX-compatible shells.
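
When the lines come from a command rather than a literal, the same loop can read from a process substitution. A minimal sketch, assuming bash (process substitution and arrays are not POSIX); the sample input produced by printf and the names lines and line are made up for illustration. The || [[ -n $line ]] clause is an addition that also picks up a final line lacking a trailing newline.

```shell
#!/usr/bin/env bash
lines=()

# printf stands in for any command that produces line-oriented output.
# The sample contains leading/trailing spaces, a backslash, and a glob
# character, all of which should arrive unchanged.
while IFS= read -r line || [[ -n $line ]]; do
  lines+=("$line")
done < <(printf '  one  \ntwo\\t*\nlast line without newline')

# Print each collected line in brackets to make whitespace visible.
printf '<%s>\n' "${lines[@]}"
```

Because the loop runs in the current shell rather than in a pipeline subshell, the lines array is still populated after the loop ends.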