Attribution: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors. Original question: http://stackoverflow.com/questions/19719827/

How to split a tab-delimited string in bash script WITHOUT collapsing blanks?

Tags: bash, string-split, tab-delimited

Asked by Neil C. Obremski

I have my string in $LINE and I want $ITEMS to be the array version of it, split on single tabs and retaining blanks. Here's where I'm at now:

IFS=$'\n' ITEMS=($(echo "$LINE" | tr "\t" "\n"))

The issue here is that IFS is one-or-more, so it gobbles up newlines, tabs, whatever. I've tried a few other things based on other questions posted here, but they assume that there will always be a value in every field, never a blank. And the one that seems to hold the key is far beyond me and operates on an entire file (I am just splitting a single string).

My preference here is a pure-BASH solution.

Accepted answer by rici

IFS is only one-or-more if the delimiter characters are whitespace (space, tab, newline). Non-whitespace characters are single delimiters: each occurrence ends a field. So a simple solution, if there is some non-whitespace character which you are confident is not in your string, is to translate tabs to that character and then split on it:

IFS=$'\2' read -ra ITEMS <<<"${LINE//$'\t'/$'\2'}"
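To make the whitespace rule concrete, here is a minimal sketch that splits the same string both ways. The sample value and the array names collapsed and kept are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sample: three fields, the middle one empty.
line=$'a\t\tc'

# Tab is IFS whitespace, so the two adjacent tabs count as one delimiter.
IFS=$'\t' read -ra collapsed <<<"$line"

# \2 is not whitespace, so every occurrence delimits a field.
IFS=$'\2' read -ra kept <<<"${line//$'\t'/$'\2'}"

printf 'split on tab: %d fields\n' "${#collapsed[@]}"
printf 'split on \\2:  %d fields\n' "${#kept[@]}"
```

The first split drops the empty middle field; the second keeps it.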

Unfortunately, assumptions like "there is no instance of \2 in the input" tend to fail in the long run, where "in the long run" translates to "at the worst possible time". So you might want to do it in two steps: first swap tabs and \2 characters and split on \2, then turn the tabs inside each field back into \2:

IFS=$'\2' read -ra TEMP < <(tr $'\t\2' $'\2\t' <<<"$LINE")
ITEMS=("${TEMP[@]//$'\t'/$'\2'}")
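As a rough check of the two-step approach, the sketch below feeds it a string whose first field already contains a literal \2. The sample input is hypothetical, and the external tr command is assumed to be available.

```bash
#!/usr/bin/env bash
# Hypothetical input: first field holds a literal \2, second field is empty.
LINE=$'x\2y\t\tz'

# Step 1: swap tabs and \2, then split on \2 (which marks the original tabs).
IFS=$'\2' read -ra TEMP < <(tr $'\t\2' $'\2\t' <<<"$LINE")

# Step 2: tabs left inside a field were \2 in the input, so swap them back.
ITEMS=("${TEMP[@]//$'\t'/$'\2'}")

# %q prints each field with control characters shown as escapes.
printf '%q\n' "${ITEMS[@]}"
```

The literal \2 survives inside the first field, and the empty second field is kept.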

Answer by chepner

One possibility: instead of splitting with IFS, use the -d option to read tab-terminated "lines" from the string. However, you need to ensure that your string ends with a tab as well, or you will lose the last item.

items=()
while IFS='' read -r -d$'\t' x; do
   items+=( "$x" )
done <<< $'   foo   \t  bar\nbaz \t   foobar\t'

printf "===%s===\n" "${items[@]}"

Ensuring a trailing tab without adding an extra field can be accomplished with

if [[ $str != *$'\t' ]]; then str+=$'\t'; fi

if necessary.
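The loop and the trailing-tab guard can be combined into one helper. This is only a sketch: the function name split_tabs and the global array items are illustrative choices, and it feeds read through printf '%s' rather than a here-string so that no extra newline is appended.

```bash
#!/usr/bin/env bash
# split_tabs is a hypothetical helper; it fills the global array "items".
split_tabs() {
    local str=$1 x
    items=()
    # Make sure the last field is tab-terminated so read does not drop it.
    if [[ $str != *$'\t' ]]; then str+=$'\t'; fi
    # An empty IFS keeps leading and trailing blanks; -d $'\t' ends each read at a tab.
    while IFS= read -r -d $'\t' x; do
        items+=("$x")
    done < <(printf '%s' "$str")
}

split_tabs $'a\t\t  c  '
printf '===%s===\n' "${items[@]}"
```

Each tab ends exactly one field, so the empty middle field and the padded last field both come through.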

Answer by Nathan SR

IFS Special Characters:

Words of the form $'string' are treated specially.  The word expands to
string, with backslash-escaped characters replaced as specified by  the
ANSI  C  standard.  Backslash escape sequences, if present, are decoded
as follows:
       \a     alert (bell)
       \b     backspace
       \e
       \E     an escape character
       \f     form feed
       \n     new line
       \r     carriage return
       \t     horizontal tab
       \v     vertical tab
       \\     backslash
       \'     single quote
       \"     double quote
       \?     question mark
       \nnn   the eight-bit character whose value is  the  octal  value
              nnn (one to three digits)
       \xHH   the  eight-bit  character  whose value is the hexadecimal
              value HH (one or two hex digits)
       \uHHHH the Unicode (ISO/IEC 10646) character whose value is  the
              hexadecimal value HHHH (one to four hex digits)
       \UHHHHHHHH
              the  Unicode (ISO/IEC 10646) character whose value is the
              hexadecimal value HHHHHHHH (one to eight hex digits)
       \cx    a control-x character 

The expanded result is single-quoted, as if the dollar sign had not been present.
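For the splitting problem, the practical consequence is that $'\t' is a real tab while '\t' is a backslash followed by the letter t. A small sketch with made-up variable names:

```bash
#!/usr/bin/env bash
plain='\t'      # two characters: a backslash and the letter t
ansi=$'\t'      # one character: a horizontal tab
printf 'plain=%d ansi=%d\n' "${#plain}" "${#ansi}"

# Only the $'...' form gives IFS an actual tab, so the space inside "a b" survives.
IFS=$'\t' read -r -a fields <<<$'a b\tc'
printf '[%s]\n' "${fields[@]}"
```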

A double-quoted string preceded by a dollar sign ($"string") will cause the string to be translated according to the current locale. If the current locale is C or POSIX, the dollar sign is ignored. If the string is translated and replaced, the replacement is double-quoted.

Answer by hrushikesh

line=$'zero\tone\ttwo'
IFS=$'\t' read -a arr <<< "${line}"
declare -p arr

Output is

declare -a arr='([0]="zero" [1]="one" [2]="two")'

Note: this doesn't deal with newlines in line.
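The newline limitation is easy to reproduce. In this sketch (sample value made up), read stops at the first newline, so everything after it is silently dropped:

```bash
#!/usr/bin/env bash
# Hypothetical input with a newline inside the second field.
line=$'zero\tone\nmore\ttwo'

# read consumes a single line, so only "zero<TAB>one" is split.
IFS=$'\t' read -r -a arr <<< "${line}"
printf '%d fields, last is [%s]\n' "${#arr[@]}" "${arr[1]}"
```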

Answer by gniourf_gniourf

A pure bash solution that will only split on tabs, and preserve newlines and other funny symbols, if any:

IFS=$'\t' read -r -a arr -d '' < <(printf '%s' "$line")

Try it:

$ line=$'zero\tone with\nnewlines\ttwo\t     three   \n\t\tfive\n'
$ IFS=$'\t' read -r -a arr -d '' < <(printf '%s' "$line")
$ declare -p arr
declare -a arr='([0]="zero" [1]="one with
newlines" [2]="two" [3]="     three   
" [4]="five
")'

As you can see, this works flawlessly: it preserves everything (spaces, newlines, etc.), splits only at the tab characters.

There's one drawback: it doesn't handle “empty fields”: observe there are two consecutive tabs in line; we would expect to get an empty field in arr, but that's not the case.
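One possible workaround, which is not part of this answer, is to combine it with the non-whitespace delimiter trick from the accepted answer. The sketch below assumes the input never contains the \2 character; the sample value is made up.

```bash
#!/usr/bin/env bash
# Hypothetical input: an empty second field and a newline inside the third.
line=$'zero\t\ttwo\nlines'

# Replace tabs with \2 (assumed absent from the input) and split on that.
# -d '' makes read consume the whole input, so newlines stay inside fields.
# read returns 1 at end of input, hence the "|| true".
IFS=$'\2' read -r -d '' -a arr < <(printf '%s' "${line//$'\t'/$'\2'}") || true

printf '%q\n' "${arr[@]}"
```

Because \2 is not IFS whitespace, the two adjacent delimiters produce an empty field instead of collapsing.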

There's another less obvious drawback: the return code of read is 1 (it reaches end of input without finding the NUL delimiter), so technically, for Bash, this command fails. That's not a problem unless you're using set -e or set -E, but this is not recommended anyways (so you shouldn't).
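If a script does run under set -e, the non-zero status can be captured explicitly. A minimal sketch with an illustrative status variable and a made-up sample value:

```bash
#!/usr/bin/env bash
line=$'a\tb'   # hypothetical sample

# read fills arr but returns 1 because it hits end of input before a NUL.
# The "|| status=$?" keeps that from aborting a script running under set -e.
status=0
IFS=$'\t' read -r -a arr -d '' < <(printf '%s' "$line") || status=$?

printf 'status=%d fields=%d\n' "$status" "${#arr[@]}"
```

The array is populated even though read reports failure.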

If you can live with these two minor drawbacks, this might be the ideal solution.

Note that we're using < <(printf '%s' "$line") and not <<< "$line" to feed read, as the latter appends a trailing newline.
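The difference shows up in the last field. This sketch (array names and sample value are made up) reads the same string both ways:

```bash
#!/usr/bin/env bash
line=$'a\tb'   # hypothetical sample with no trailing newline

# The here-string appends a newline, which ends up in the last field.
IFS=$'\t' read -r -a with_here -d '' <<< "$line" || true

# printf '%s' passes the string through unchanged.
IFS=$'\t' read -r -a with_printf -d '' < <(printf '%s' "$line") || true

printf '%q\n' "${with_here[1]}" "${with_printf[1]}"
```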