Bash: How to tokenize a string variable?
Original question: http://stackoverflow.com/questions/5382712/
Note: this page reproduces a Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must attribute it to the original authors (not me): Stack Overflow.
Asked by Jake Wilson
If I have a string variable whose value is "john is 17 years old", how do I tokenize it using spaces as the delimiter? Would I use awk?
Accepted answer by John Kugelman
Use the shell's automatic tokenization of unquoted variables:
$ string="john is 17 years old"
$ for word in $string; do echo "$word"; done
john
is
17
years
old
If you want to change the delimiter you can set the $IFS variable, which stands for internal field separator. The default value of $IFS is " \t\n" (space, tab, newline).
$ string="john_is_17_years_old"
$ (IFS='_'; for word in $string; do echo "$word"; done)
john
is
17
years
old
(Note that in this second example I added parentheses around the second line. This creates a sub-shell so that the change to $IFS doesn't persist. You generally don't want to permanently change $IFS, as it can wreak havoc on unsuspecting shell commands.)
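As a complement to the sub-shell approach, here is a minimal sketch of another common way to scope the $IFS change: assigning it only for a single `read` command. The function name `split_into_words` and the array name `words` are hypothetical, chosen for illustration. Note the assumptions: the delimiter is a single character, and `read` only consumes the first line of its input, so this does not suit strings containing newlines.

```shell
#!/bin/bash
# Hypothetical helper (not from the answer): split $1 on the single-character
# delimiter $2 and store the pieces in the global array `words`.
split_into_words() {
    local text=$1 delim=$2
    # IFS is set only for this one `read` command, so the shell's own IFS is
    # left alone; -r keeps backslashes literal, -a fills an array.
    IFS=$delim read -r -a words <<< "$text"
}

split_into_words "john_is_17_years_old" "_"
printf '%s\n' "${words[@]}"
```

Because `read` does not perform filename expansion, a token such as `*` stays literal here, which is not the case for an unquoted `$string` in a `for` loop.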
Answer by harshit
You can try something like this:
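#!/bin/bash
# Split the contents of a file into whitespace-separated tokens.
n=0
a=/home/file.txt
# The unquoted $(<file) is word-split on $IFS (space, tab, newline by default).
for i in $(<"$a"); do
    str="${str},${i}"    # accumulate the tokens into a comma-separated string
    n=$((n + 1))
    var="var${n}"
    echo "$var is ... ${i}"
done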
Answer by Diego Torres Milano
$ string="john is 17 years old"
$ tokens=( $string )
$ echo ${tokens[*]}
For other delimiters, like ';':
$ string="john;is;17;years;old"
$ IFS=';' tokens=( $string )
$ echo ${tokens[*]}
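One caveat worth illustrating: `IFS=';' tokens=( $string )` is a pair of plain assignments with no command after them, so $IFS stays set to ';' in the current shell afterwards. The following is a minimal sketch, not part of the original answer, that saves and restores $IFS around the array assignment and also disables globbing while splitting. The variable name `old_ifs` is an assumption, and the sketch assumes globbing was enabled beforehand.

```shell
#!/bin/bash
string="john;is;17;years;old"

old_ifs=$IFS        # remember the current separator
set -f              # turn off filename globbing so tokens like * stay literal
IFS=';'
tokens=( $string )  # unquoted on purpose: this is where the splitting happens
IFS=$old_ifs        # restore the separator
set +f              # turn globbing back on (assumes it was on before)

printf '%s\n' "${tokens[@]}"
```

Printing with `"${tokens[@]}"` in double quotes keeps each element intact, one per line.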
Answer by kurumi
$ string="john is 17 years old"
$ set -- $string
$ echo $1
john
$ echo $2
is
$ echo $3
17
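Since `set --` overwrites the positional parameters, it can help to confine it to a function, where only that function's own parameters are replaced. Below is a minimal sketch under that assumption. The function name `list_tokens` is hypothetical, and the sketch assumes globbing was enabled before the call.

```shell
#!/bin/bash
# Hypothetical helper: print the token count, then each token with its position.
list_tokens() {
    local string=$1
    set -f             # keep glob characters in the string literal
    set -- $string     # replaces this function's positional parameters only
    set +f             # turn globbing back on (assumes it was on before)
    echo "count: $#"
    local i=1 token
    for token in "$@"; do
        echo "$i: $token"
        i=$((i + 1))
    done
}

list_tokens "john is 17 years old"
```

Here `$#` gives the number of tokens and `"$@"` iterates over them, so there is no need to refer to `$1`, `$2`, `$3` individually.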
Answer by Mila Nautikus
With extended regular expressions in sed (-E); note that \W and \n in the replacement are GNU extensions rather than strict POSIX:
$ str='a b c d'
$ echo "$str" | sed -E 's/\W+/\n/g' | hexdump -C
00000000 61 0a 62 0a 63 0a 64 0a |a.b.c.d.|
00000008
This is like Python's re.split(r'\W+', str).
\W matches any non-word character. That includes space, tab, newline, and carriage return (like the bash for tokenizer), but also symbols such as quotes, brackets, and signs. The exception is the underscore _, so snake_case is one word, but kebab-case is two words.
Leading and trailing spaces will each create an empty line.
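For systems where sed does not support \W, a roughly equivalent split can be sketched with `tr`. This swaps in a different tool from the one in the answer, and the function name `split_words` is hypothetical. It treats anything other than alphanumerics and the underscore as a separator, with `[:alnum:]` following the current locale.

```shell
#!/bin/bash
# Hypothetical helper: one token per line, splitting on runs of non-word characters.
split_words() {
    # -c complements the set (everything that is NOT alphanumeric or _),
    # -s squeezes each run of replaced characters into a single newline.
    printf '%s\n' "$1" | tr -cs '[:alnum:]_' '\n'
}

split_words "snake_case kebab-case"
```

As with the sed version, the underscore is kept inside a token while the hyphen splits it, and leading separators in the input still produce an empty first line.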