How do I split a string on a multi-character delimiter in bash?

Question

Asked by v217

Why doesn't the following bash code work?

for i in $( echo "emmbbmmaaddsb" | split -t "mm"  )
do
    echo "$i"
done

Expected output:

e
bb
aaddsb

Answer 1

Answered by Charles Duffy

Since you're expecting newlines, you can simply replace all instances of mm in your string with a newline. In pure native bash:

in='emmbbmmaaddsb'
sep='mm'
printf '%s\n' "${in//$sep/$'\n'}"
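To loop over the pieces as the question does, the same substitution can feed an array. The following is a minimal sketch, assuming bash 4.0 or later (for mapfile) and an input that contains no newlines of its own; the names nl, parts and part are illustrative and not from the answer.

```bash
#!/bin/bash
in='emmbbmmaaddsb'
sep='mm'
nl=$'\n'

# Quoting "$sep" makes the pattern a literal string rather than a glob.
# mapfile -t stores one array element per line of input.
mapfile -t parts <<< "${in//"$sep"/$nl}"

for part in "${parts[@]}"; do
    printf '%s\n' "$part"
done
```

Keeping the pieces in an array, rather than word-splitting the output of a command substitution, preserves pieces that contain spaces.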

If you wanted to do such a replacement on a longer input stream, you might be better off using awk, as bash's built-in string manipulation doesn't scale well to more than a few kilobytes of content. The gsub_literal shell function (backending into awk) given in BashFAQ #21 is applicable:

# Taken from http://mywiki.wooledge.org/BashFAQ/021

# usage: gsub_literal STR REP
# replaces all instances of STR with REP. reads from stdin and writes to stdout.
gsub_literal() {
  # STR cannot be empty
  [[ $1 ]] || return

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  '
}

...used, in this context, as:

gsub_literal "mm" $'\n' <your-input-file.txt >your-output-file.txt
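The backslash doubling in the awk -v assignments matters because awk processes escape sequences in values assigned with -v. A small sketch of the effect, using a made-up value (raw) purely for illustration:

```bash
#!/bin/bash
raw='a\nb'   # four characters: a, backslash, n, b

# Passed as-is, awk turns the two characters \n into a real newline.
unescaped=$(awk -v s="$raw" 'BEGIN { print s }')

# Doubling each backslash first keeps the value literal.
escaped=$(awk -v s="${raw//\\/\\\\}" 'BEGIN { print s }')

printf 'unescaped: %q\n' "$unescaped"
printf 'escaped:   %q\n' "$escaped"
```

The second form is the one gsub_literal relies on, so that a search or replacement string containing backslashes is treated as plain text.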

Answer 2

Answered by Mallikarjun M

A more general example, which does not replace the multi-character delimiter with a single-character delimiter, is given below:

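Using parameter expansions (from a comment by @gniourf_gniourf):

#!/bin/bash

str="LearnABCtoABCSplitABCaABCString"
delimiter=ABC
s=$str$delimiter
array=();
while [[ $s ]]; do
    array+=( "${s%%"$delimiter"*}" );
    s=${s#*"$delimiter"};
done;
declare -p array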
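The same parameter-expansion idea can be wrapped in a function. This is a sketch under a few assumptions: the name split_on and the global array parts are hypothetical, and the loop tests for the delimiter explicitly instead of appending it to the string. That choice lets the loop also terminate on inputs that end in a partial delimiter (for example "emmm" with the delimiter "mm"), where the appended-delimiter form would leave a remainder that never shrinks.

```bash
#!/bin/bash
# Hypothetical helper: split_on DELIMITER STRING
# Fills the global array "parts" with the pieces of STRING.
split_on() {
    local delimiter=$1 s=$2
    # An empty delimiter would never shorten the string, so reject it.
    [[ $delimiter ]] || return 1
    parts=()
    while [[ $s == *"$delimiter"* ]]; do
        parts+=( "${s%%"$delimiter"*}" )   # text before the first delimiter
        s=${s#*"$delimiter"}               # drop that text and the delimiter
    done
    parts+=( "$s" )                        # whatever follows the last delimiter
}

split_on 'mm' 'emmbbmmaaddsb'
printf '%s\n' "${parts[@]}"
```

Here ${s%%"$delimiter"*} removes the longest suffix that starts with the delimiter, and ${s#*"$delimiter"} removes the shortest prefix that ends with it; quoting the delimiter keeps it from being read as a glob pattern.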

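A cruder way:

#!/bin/bash

# main string
str="LearnABCtoABCSplitABCaABCString"

# delimiter string
delimiter="ABC"

# length of main string
strLen=${#str}
# length of delimiter string
dLen=${#delimiter}

# current position in the string
i=0
# length of the substring collected so far
wordLen=0
# starting position of the substring collected so far
strP=0

array=()
while [ "$i" -lt "$strLen" ]; do
    if [ "$delimiter" = "${str:i:dLen}" ]; then
        # delimiter found: store the collected substring and skip the delimiter
        array+=( "${str:strP:wordLen}" )
        strP=$(( i + dLen ))
        wordLen=0
        i=$(( i + dLen ))
    else
        # ordinary character: extend the current substring
        i=$(( i + 1 ))
        wordLen=$(( wordLen + 1 ))
    fi
done
# store whatever follows the last delimiter
array+=( "${str:strP:wordLen}" )

declare -p array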

Reference - Bash Tutorial - Bash Split String

Answer 3

Answered by John Goofy

The recommended tool for character substitution is sed, whose command s/regexp/replacement/ replaces one occurrence of a regexp and s/regexp/replacement/g replaces all of them; you do not even need a loop or variables.

Pipe your echo output and substitute the characters mm with the newline character \n:

echo "emmbbmmaaddsb" | sed 's/mm/\n/g'

The output is:

e
bb
aaddsb
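Treating \n in the replacement as a newline is, as far as I know, a GNU sed behaviour; some other sed implementations (BSD sed, for example) emit a literal n instead. A sketch of a more portable form, assuming bash for the $'...' quoting and process substitution, writes the newline as a backslash followed by an actual line break and reads the result into an array (the names parts and part are illustrative):

```bash
#!/bin/bash
parts=()

# $'s/mm/\\\n/g' expands to: s/mm/\<newline>/g
# A backslash followed by a real newline is the portable way to put
# a newline in a sed replacement.
while IFS= read -r part; do
    parts+=( "$part" )
done < <(printf '%s\n' "emmbbmmaaddsb" | sed $'s/mm/\\\n/g')

printf '%s\n' "${parts[@]}"
```

Reading with while IFS= read -r keeps each piece intact even if it contains spaces or backslashes.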

Answer 4

Answered by Noam Manos

With awk you can use gsub to replace all regex matches.

As in your question, to replace all substrings of two or more 'm' chars with a new line, run:

echo "emmbbmmaaddsb" | awk '{ gsub(/mm+/, "\n" ); print; }'

e
bb
aaddsb

The 'g' in gsub() stands for "global", which means replace everywhere.

You can also print just the Nth field, for example:

echo "emmbbmmaaddsb" | awk '{ gsub(/mm+/, " " ); print $2; }'

bb
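Selecting a field this way works because awk re-splits the record after gsub modifies it. A related sketch, not taken from the answer, sets the field separator to the delimiter directly with -F; awk treats a separator longer than one character as a regular expression, so 'mm' here matches exactly two m characters. The variable name fields is illustrative.

```bash
#!/bin/bash
# Split on the two-character separator and print every field with its index.
fields=$(echo "emmbbmmaaddsb" |
    awk -F 'mm' '{ for (i = 1; i <= NF; i++) print i ": " $i }')

printf '%s\n' "$fields"
```

Because the separator is a regex, a delimiter containing characters such as . or * would need escaping to be matched literally.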
