bash: How to extract multiple text and numbers from a string using sed?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/27922910/

Time: 2020-09-18 12:10:41  Source: igfitidea

How to extract multiple text and numbers from a string using sed?

bash, shell, sed

Asked by Mikey R

How can I extract three or more separate pieces of text from a line using sed?

I have the following line:

echo '<MX><[Mike/DOB-029/Post-555/Male]><MX>'

So far I am able to extract 'DOB-029' by running

sed -n 's/.*\(DOB-[0-9]*\).*/\1/p'

but I am not getting the other pieces of text, such as the name or the post.

My expected output is: Mike DOB-029 Post-555

Edited

Say I have a list in a file, and I want to extract specific text/IDs from every entry in the list and save the result to a .txt file.

Answered by ShellFish

sed 's/.*\[\(.*\).\(DOB-[0-9]*\).\(Post-[0-9]*\).*/\1 \2 \3/' should do the trick!

The parts between \( and \) are captured groups, which can be referenced in the replacement as \i, where i is the index of the group.
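To illustrate the capture groups on a whole list, as asked in the edit to the question, here is a minimal sketch. The helper name extract_records, the file names and the second sample record are made up for illustration. It uses | as the sed delimiter so the slashes in the data need no escaping, and -n with the p flag so that only lines containing a record are printed.

```shell
#!/usr/bin/env bash
# extract_records is a hypothetical helper name; it prints "name DOB post"
# for every line of the given file that contains a matching record.
extract_records() {
    # \1, \2 and \3 refer to the three \( ... \) groups, in order.
    sed -n 's|.*\[\([^/]*\)/\(DOB-[0-9]*\)/\(Post-[0-9]*\).*|\1 \2 \3|p' "$1"
}

# Made-up sample list; only the first record comes from the question.
workdir=$(mktemp -d)
printf '%s\n' \
    '<MX><[Mike/DOB-029/Post-555/Male]><MX>' \
    '<MX><[Anna/DOB-113/Post-042/Female]><MX>' \
    'a line without a record is skipped' > "$workdir/input.txt"

# Save the extracted fields to a .txt file, then show them.
extract_records "$workdir/input.txt" > "$workdir/output.txt"
cat "$workdir/output.txt"
rm -r "$workdir"
```

With the sample data above this should print "Mike DOB-029 Post-555" and "Anna DOB-113 Post-042", one per line.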

Script for custom use:

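#!/bin/bash

# Field selector: any combination of the digits 1 (name), 2 (DOB), 3 (post).
fields=${1:-123}
file='/path/to/input'

# \1 in each replacement keeps only the captured group.
name=$(sed 's/.*\[\([^\/]*\)\/.*/\1/' "$file")
dob=$(sed 's/.*\(DOB-[0-9]*\).*/\1/' "$file")
post=$(sed 's/.*\(Post-[0-9]*\).*/\1/' "$file")

[[ $fields =~ .*1.* ]] && output=$name
[[ $fields =~ .*2.* ]] && output="$output $dob"
[[ $fields =~ .*3.* ]] && output="$output $post"

# Left unquoted on purpose: word splitting drops the leading space
# when field 1 is not selected.
echo $output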

Set the 'file' variable to the path of the file containing the line you want to parse (I can add more functionality, such as supplying the file as an argument or picking the line out of a larger file, if you like). Then run the script with an integer argument: if it contains 1, the name is displayed; 2 displays the DOB; 3 displays the post information. You can combine them, e.g. 123, 32 or any other combination; the output order is always name, DOB, post. With no argument, all three fields are printed.
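The file-as-argument variant mentioned above could look roughly like this. It is a sketch only, not the answerer's code: print_fields is a hypothetical helper name, and instead of running sed three times it builds the replacement string from the selected group numbers, which also makes it work on files with more than one line.

```shell
#!/usr/bin/env bash
# print_fields is a hypothetical helper: print_fields SELECTOR FILE
# SELECTOR is any combination of 1 (name), 2 (DOB) and 3 (post).
print_fields() {
    local fields=${1:-123} file=$2 replacement=""
    # Build the replacement from the selected groups, always in 1-2-3 order.
    [[ $fields == *1* ]] && replacement+='\1 '
    [[ $fields == *2* ]] && replacement+='\2 '
    [[ $fields == *3* ]] && replacement+='\3 '
    replacement=${replacement% }   # drop the trailing space
    # Print only the lines on which the substitution matched.
    sed -n "s|.*\[\([^/]*\)/\(DOB-[0-9]*\)/\(Post-[0-9]*\).*|${replacement}|p" "$file"
}

# Demo on a temporary file holding the line from the question.
sample=$(mktemp)
echo '<MX><[Mike/DOB-029/Post-555/Male]><MX>' > "$sample"
print_fields 13 "$sample"
rm -f "$sample"
```

For the sample line, the selector 13 should print "Mike Post-555".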

Stdin

If you want to read from stdin, use the following script:

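#!/usr/bin/env bash

# Read all of standard input.
line=$(cat)

# Field selector: any combination of the digits 1 (name), 2 (DOB), 3 (post).
fields=${1:-123}

# \1 in each replacement keeps only the captured group.
name=$(echo "$line" | sed 's/.*\[\([^\/]*\)\/.*/\1/')
dob=$(echo "$line" | sed 's/.*\(DOB-[0-9]*\).*/\1/')
post=$(echo "$line" | sed 's/.*\(Post-[0-9]*\).*/\1/')

[[ $fields =~ .*1.* ]] && output=$name
[[ $fields =~ .*2.* ]] && output="$output $dob"
[[ $fields =~ .*3.* ]] && output="$output $post"

# Left unquoted on purpose: word splitting drops the leading space
# when field 1 is not selected.
echo $output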

Example usage:

$ chmod +x script.sh
$ echo '<MX><[Mike/DOB-029/Post-555/Male]><MX>' | ./script.sh 123
Mike DOB-029 Post-555
$ echo '<MX><[Mike/DOB-029/Post-555/Male]><MX>' | ./script.sh 12
Mike DOB-029
$ echo '<MX><[Mike/DOB-029/Post-555/Male]><MX>' | ./script.sh 32
DOB-029 Post-555
$ echo '<MX><[Mike/DOB-029/Post-555/Male]><MX>' | ./script.sh 
Mike DOB-029 Post-555

Answered by Arjun Mathew Dan

A solution with awk:

echo "<MX><[Mike/DOB-029/Post-555/Male]><MX>" | awk -F'[/[]' '{print $2, $3, $4}'

We set the field delimiter to / or [ (the bracket expression [/[] passed to -F). Then we just print $2, $3 and $4, which are the 2nd, 3rd and 4th fields respectively.
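To see why the wanted values land in fields 2 to 4, it can help to print every field next to its index. The helper name show_fields below is made up for this sketch.

```shell
#!/usr/bin/env bash
# show_fields is a hypothetical helper: it splits each input line on
# '/' or '[' and prints every field together with its index.
show_fields() {
    awk -F'[/[]' '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'
}

echo '<MX><[Mike/DOB-029/Post-555/Male]><MX>' | show_fields
```

For the line from the question, $1 holds the leading '<MX><', $2 to $4 hold Mike, DOB-029 and Post-555, and $5 holds the trailing 'Male]><MX>'.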

With sed:

echo "<MX><[Mike/DOB-029/Post-555/Male]><MX>" | sed 's/\(^.*\[\)\(.*\)\(\/[^/]*$\)/\2/; s/\// /g'

Answered by Marc Bredt

Use bash's built-in parameter expansion.

line='<MX><[Mike/DOB-029/Post-555/Male]><MX>'
linel=${line/*\[/}; liner=${linel%\/*}; echo "${liner//\// }"
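The same expansions can be spelled out step by step and applied to every line of some input without starting any external process. This is a sketch under that assumption; extract_line is a hypothetical helper name, and the here-document stands in for a real input file.

```shell
#!/usr/bin/env bash
# extract_line is a hypothetical helper name used only for this sketch.
extract_line() {
    local line=$1
    local rest=${line#*\[}          # drop everything up to and including the first '['
    local core=${rest%/*}           # drop the last '/' and everything after it
    printf '%s\n' "${core//\// }"   # replace every remaining '/' with a space
}

# Process the input line by line; replace the here-document with
# "done < yourfile" to read from a file instead.
while IFS= read -r line; do
    extract_line "$line"
done <<'EOF'
<MX><[Mike/DOB-029/Post-555/Male]><MX>
EOF
```

For the line from the question this should print "Mike DOB-029 Post-555".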