How to split a tab-delimited string in bash script WITHOUT collapsing blanks?
Original question: http://stackoverflow.com/questions/19719827/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
Question by Neil C. Obremski
I have my string in $LINE and I want $ITEMS to be the array version of this, split on single tabs and retaining blanks. Here's where I'm at now:
IFS=$'\n' ITEMS=($(echo "$LINE" | tr "\t" "\n"))
The issue here is that IFS is one-or-more, so it gobbles up newlines, tabs, whatever. I've tried a few other things based on other questions posted here, but they assume that there will always be a value in all fields, never blank. And the one that seems to hold the key is far beyond me and operates on an entire file (I am just splitting a single string).
My preference here is a pure-BASH solution.
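A minimal sketch reproducing the problem, assuming bash and a made-up three-field LINE whose middle field is empty; it runs the command from the question unchanged:

```shell
#!/usr/bin/env bash
# Made-up input: three tab-separated fields "a", "" and "c".
LINE=$'a\t\tc'

# The command from the question. With no command word, the IFS
# assignment persists in the current shell.
IFS=$'\n' ITEMS=($(echo "$LINE" | tr "\t" "\n"))

# Newline is IFS whitespace, so the two adjacent newlines produced by tr
# count as one separator and the empty middle field disappears.
echo "${#ITEMS[@]}"   # prints 2, not 3
```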
Accepted answer by rici
IFS is only one-or-more if the characters are whitespace. Non-whitespace characters are single delimiters. So a simple solution, if there is some non-whitespace character which you are confident is not in your string, is to translate tabs to that character and then split on it:
IFS=$'\2' read -ra ITEMS <<<"${LINE//$'\t'/$'\2'}"
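A runnable sketch of this single-step approach, assuming bash (which expands $'\t' and $'\2' inside a double-quoted ${...} expansion because the extquote option is on by default) and assuming \2 never occurs in the input. The sample LINE is made up for illustration:

```shell
#!/usr/bin/env bash
# Made-up input: field 2 is empty, field 4 has leading/trailing spaces.
LINE=$'one\t\tthree\t four '

# Turn every tab into \2 (a non-whitespace control character assumed to be
# absent from LINE), then split on \2. A non-whitespace IFS character is a
# single delimiter, so the empty field survives; spaces are kept because
# space is not in IFS.
IFS=$'\2' read -ra ITEMS <<<"${LINE//$'\t'/$'\2'}"

declare -p ITEMS
```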
Unfortunately, assumptions like "there is no instance of \2 in the input" tend to fail in the long run, where "in the long run" translates to "at the worst possible time". So you might want to do it in two steps:
IFS=$'\2' read -ra TEMP < <(tr $'\t\2' $'\2\t' <<<"$LINE")
ITEMS=("${TEMP[@]//$'\t'/$'\2'}")
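The same two-step idea as a runnable sketch, with a made-up LINE that contains a literal \2 in its first field to show why swapping (rather than just replacing) matters:

```shell
#!/usr/bin/env bash
# Made-up input: field 1 contains a literal \2, field 2 is empty.
LINE=$'a\2b\t\tc'

# Step 1: tr swaps tabs and \2 characters, so the original tabs become \2
# and any original \2 becomes a tab; the line is then split on \2.
IFS=$'\2' read -ra TEMP < <(tr $'\t\2' $'\2\t' <<<"$LINE")

# Step 2: tabs left inside fields were originally \2, so turn them back.
ITEMS=("${TEMP[@]//$'\t'/$'\2'}")

declare -p ITEMS
```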
Answer by chepner
One possibility: instead of splitting with IFS, use the -d option to read tab-terminated "lines" from the string. However, you need to ensure that your string ends with a tab as well, or you will lose the last item.
items=()
while IFS='' read -r -d $'\t' x; do
  items+=( "$x" )
done <<< $' foo \t bar\nbaz \t foobar\t'
printf "===%s===\n" "${items[@]}"
Ensuring a trailing tab without adding an extra field can be accomplished with
if [[ $str != *$'\t' ]]; then str+=$'\t'; fi
if necessary.
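Putting the loop and the trailing-tab guard together, a sketch might look like this. split_tabs and the global items array are hypothetical names, and the input is fed with printf '%s' so that no extra newline is appended:

```shell
#!/usr/bin/env bash
# Hypothetical helper: splits $1 on single tabs into the global array "items".
split_tabs() {
  local str=$1 x
  items=()
  # Add a terminating tab unless one is already there, so the last field
  # is read without creating an extra empty field.
  if [[ $str != *$'\t' ]]; then str+=$'\t'; fi
  # IFS='' keeps surrounding whitespace; -r keeps backslashes literal;
  # -d $'\t' makes each tab-terminated chunk one "line".
  while IFS='' read -r -d $'\t' x; do
    items+=( "$x" )
  done < <(printf '%s' "$str")
}

split_tabs $' foo \t\tbar'
printf '===%s===\n' "${items[@]}"
```

The empty middle field comes through as its own element, because read returns each tab-terminated chunk as-is, including zero-length ones.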
Answer by Nathan SR
IFS special characters (the bash manual's description of ANSI-C quoting, which is how a tab is written as $'\t'):
Words of the form $'string' are treated specially. The word expands to
string, with backslash-escaped characters replaced as specified by the
ANSI C standard. Backslash escape sequences, if present, are decoded
as follows:
\a alert (bell)
\b backspace
\e
\E an escape character
\f form feed
\n new line
\r carriage return
\t horizontal tab
\v vertical tab
\\ backslash
\' single quote
\" double quote
\? question mark
\nnn the eight-bit character whose value is the octal value
nnn (one to three digits)
\xHH the eight-bit character whose value is the hexadecimal
value HH (one or two hex digits)
\uHHHH the Unicode (ISO/IEC 10646) character whose value is the
hexadecimal value HHHH (one to four hex digits)
\UHHHHHHHH
the Unicode (ISO/IEC 10646) character whose value is the
hexadecimal value HHHHHHHH (one to eight hex digits)
\cx a control-x character
The expanded result is single-quoted, as if the dollar sign had not been present.
A double-quoted string preceded by a dollar sign ($"string") will cause the string to be translated according to the current locale. If the current locale is C or POSIX, the dollar sign is ignored. If the string is translated and replaced, the replacement is double-quoted.
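A short illustration of why $'\t' matters for the answers on this page: it yields a single tab character, whereas plain '\t' is two characters. The variable names are illustrative only:

```shell
#!/usr/bin/env bash
# $'\t' is expanded by the shell into one literal tab character.
tab=$'\t'
echo "${#tab}"        # prints 1

# Ordinary single quotes keep the backslash: two characters, '\' and 't'.
plain='\t'
echo "${#plain}"      # prints 2

# printf %q shows the value in $'...' form, confirming it is a real tab.
printf '%q\n' "$tab"  # prints $'\t'
```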
Answer by hrushikesh
line=$'zero\tone\ttwo'
IFS=$'\t' read -r -a arr <<< "${line}"
declare -p arr
Output is
declare -a arr='([0]="zero" [1]="one" [2]="two")'
Note: this doesn't deal with newlines in $line.
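This answer also shares the limitation from the question: tab is an IFS whitespace character, so consecutive tabs collapse into one separator and empty fields are lost. A sketch with a made-up line:

```shell
#!/usr/bin/env bash
# Made-up input: fields "zero", "" and "two".
line=$'zero\t\ttwo'

IFS=$'\t' read -r -a arr <<< "$line"
declare -p arr

# The two adjacent tabs act as a single separator, so the empty field is gone.
echo "${#arr[@]}"     # prints 2
```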
Answer by gniourf_gniourf
A pure bash solution that will only split on tabs, and preserve newlines and other funny symbols, if any:
IFS=$'\t' read -r -a arr -d '' < <(printf '%s' "$line")
Try it:
$ line=$'zero\tone with\nnewlines\ttwo\t three \n\t\tfive\n'
$ IFS=$'\t' read -r -a arr -d '' < <(printf '%s' "$line")
$ declare -p arr
declare -a arr='([0]="zero" [1]="one with
newlines" [2]="two" [3]=" three
" [4]="five
")'
As you can see, this preserves everything (spaces, newlines, etc.) and splits only at the tab characters.
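There's one drawback: it doesn't handle “empty fields”. Observe that there are two consecutive tabs in $line; we would expect to get an empty field in arr, but that's not the case, because tab is an IFS whitespace character and a run of IFS whitespace counts as a single separator.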
There's another less obvious drawback: the return code of read is 1 (it reaches end of input without finding the NUL delimiter requested by -d ''), so technically, for Bash, this command fails. That's absolutely not a problem unless you're using set -e or set -E, but those are not recommended anyway (so you shouldn't).
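If a script does run under set -e, one way to keep it from aborting, sketched here as an assumption rather than something the answer prescribes, is to append || true so the expected status 1 from read is ignored. The sample line is made up:

```shell
#!/usr/bin/env bash
set -e   # exit on the first failing command

line=$'zero\tone\ttwo'

# read returns 1 here because it hits end of input before the NUL delimiter;
# "|| true" stops set -e from treating that as fatal. The array is still filled.
IFS=$'\t' read -r -a arr -d '' < <(printf '%s' "$line") || true

declare -p arr
```

The trade-off is that || true also masks any genuine failure of that command.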
If you can live with these two minor drawbacks, this might be the ideal solution.
Note that we're using < <(printf '%s' "$line") and not <<< "$line" to feed read, as the latter appends a trailing newline.
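To see the difference, a sketch that reads the same made-up value both ways and prints the last field with printf %q, which makes a trailing newline visible:

```shell
#!/usr/bin/env bash
line=$'zero\tone'

# Here-string: bash appends a newline, and since newline is not in IFS
# it stays attached to the last field.
IFS=$'\t' read -r -a a1 -d '' <<< "$line"

# printf '%s' passes the value through unchanged.
IFS=$'\t' read -r -a a2 -d '' < <(printf '%s' "$line")

printf '%q\n' "${a1[1]}"   # prints $'one\n'
printf '%q\n' "${a2[1]}"   # prints one
```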