How to split a string on a multi-character delimiter in bash?
Original question: http://stackoverflow.com/questions/40686922/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by v217
Why doesn't the following bash code work?
for i in $( echo "emmbbmmaaddsb" | split -t "mm" )
do
echo "$i"
done
Expected output:
e
bb
aaddsb
Answer by Charles Duffy
Since you're expecting newlines, you can simply replace all instances of mm in your string with a newline. In pure native bash:
in='emmbbmmaaddsb'
sep='mm'
printf '%s\n' "${in//$sep/$'\n'}"
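If the pieces are needed as separate values rather than just printed, the same substitution can feed an array. This is a minimal sketch, assuming bash 4 or newer (for mapfile) and an input that contains no newlines of its own; the names nl and parts are illustrative and not from the original answer:

```shell
#!/bin/bash
in='emmbbmmaaddsb'
sep='mm'
nl=$'\n'

# Replace every literal occurrence of the separator with a newline,
# then read the resulting lines into an array (mapfile needs bash 4+).
mapfile -t parts <<< "${in//"$sep"/$nl}"

for part in "${parts[@]}"; do
    printf '%s\n' "$part"
done
```

Quoting "$sep" inside the expansion makes bash treat it as a literal string rather than a glob pattern.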
If you wanted to do such a replacement on a longer input stream, you might be better off using awk, as bash's built-in string manipulation doesn't scale well to more than a few kilobytes of content. The gsub_literal shell function (backending into awk) given in BashFAQ #21 is applicable:
# Taken from http://mywiki.wooledge.org/BashFAQ/021
# usage: gsub_literal STR REP
# replaces all instances of STR with REP. reads from stdin and writes to stdout.
gsub_literal() {
  # STR cannot be empty
  [[ $1 ]] || return
  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }
    {
      # empty the output string
      out = "";
      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;
        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }
      # append whatever is left
      out = out $0;
      print out;
    }
  '
}
...used, in this context, as:
gsub_literal "mm" $'\n' <your-input-file.txt >your-output-file.txt
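Answer by Mallikarjun M
A more general example, which does not replace the multi-character delimiter with a single-character delimiter, is given below.
Using parameter expansions (from a comment by @gniourf_gniourf):
#!/bin/bash
str="LearnABCtoABCSplitABCaABCString"
delimiter=ABC
# append a trailing delimiter so the last field is handled like the others
s=$str$delimiter
array=()
while [[ $s ]]; do
    # keep the text before the first delimiter, then drop it and the delimiter
    array+=( "${s%%"$delimiter"*}" )
    s=${s#*"$delimiter"}
done
declare -p array
A cruder way:
#!/bin/bash
# main string
str="LearnABCtoABCSplitABCaABCString"
# delimiter string
delimiter="ABC"
# length of main string
strLen=${#str}
# length of delimiter string
dLen=${#delimiter}
# iterator over the string
i=0
# length tracker for the ongoing substring
wordLen=0
# starting position of the ongoing substring
strP=0
array=()
while [ "$i" -lt "$strLen" ]; do
    if [ "$delimiter" = "${str:i:dLen}" ]; then
        # quoted so that empty fields and whitespace are preserved
        array+=("${str:strP:wordLen}")
        strP=$(( i + dLen ))
        wordLen=0
        i=$(( i + dLen ))
        # re-check at the new position so adjacent delimiters are not missed
        continue
    fi
    i=$(( i + 1 ))
    wordLen=$(( wordLen + 1 ))
done
array+=("${str:strP:wordLen}")
declare -p array
Reference: Bash Tutorial - Bash Split String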
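The parameter-expansion technique can also be wrapped in a reusable function. The following is only a sketch: the function name split_on and the result array SPLIT_RESULT are hypothetical names chosen for illustration, not part of any of the original answers.

```shell
#!/bin/bash
# split_on STRING DELIMITER
# Splits STRING on every literal occurrence of DELIMITER and stores the
# pieces in the global array SPLIT_RESULT (both names are illustrative).
split_on() {
    local s=$1 delimiter=$2
    SPLIT_RESULT=()
    # An empty delimiter would never shorten the string, so refuse it.
    [[ $delimiter ]] || return 1
    # Append a trailing delimiter so the last field is handled like the others.
    s+=$delimiter
    while [[ $s ]]; do
        SPLIT_RESULT+=( "${s%%"$delimiter"*}" )   # text before the first delimiter
        s=${s#*"$delimiter"}                      # drop that text and the delimiter
    done
}

split_on "emmbbmmaaddsb" "mm"
printf '%s\n' "${SPLIT_RESULT[@]}"
```

Storing the result in a global array avoids command substitution, which would flatten the fields back into a single string.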
Answer by John Goofy
The recommended tool for character substitution is sed's command s/regexp/replacement/ for one regexp occurrence, or the global form s/regexp/replacement/g; you do not even need a loop or variables.
Pipe your echo output and try to substitute the characters mm with the newline character \n:
echo "emmbbmmaaddsb" | sed 's/mm/\n/g'
The output is:
e
bb
aaddsb
Answer by Noam Manos
With awk you can use gsub to replace all regex matches.
As in your question, to replace all substrings of two or more 'm' chars with a newline, run:
echo "emmbbmmaaddsb" | awk '{ gsub(/mm+/, "\n" ); print; }'
e
bb
aaddsb
The 'g' in gsub() stands for "global", which means replace everywhere.
You may also ask to print just the Nth piece, for example the second one:
echo "emmbbmmaaddsb" | awk '{ gsub(/mm+/, " " ); print $2; }'
bb
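A related option, not taken from the answers above, is to let awk split on the delimiter directly by passing it as the field separator. This is a sketch under the assumption that the delimiter contains no regex metacharacters, because awk treats a field separator longer than one character as a regular expression; the variable name pieces is illustrative:

```shell
#!/bin/bash
# Use the literal string "mm" as the field separator and print each
# field on its own line. Regex metacharacters in a real delimiter
# would need escaping, since awk interprets a multi-character
# separator as a regular expression.
pieces=$(echo "emmbbmmaaddsb" | awk -F 'mm' '{ for (i = 1; i <= NF; i++) print $i }')
printf '%s\n' "$pieces"
```

Unlike the gsub(/mm+/, ...) form, this treats each exact "mm" as one delimiter, so a run such as "mmmm" would yield an empty field between the two delimiters.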