Original source: http://stackoverflow.com/questions/27922910/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
How to extract multiple text and numbers from a string using sed?
Asked by Mikey R
How can I extract three or more separate pieces of text from a line using sed?
I have the following line:
echo '<MX><[Mike/DOB-029/Post-555/Male]><MX>'
So far I am able to extract 'DOB-029' by running:
sed -n 's/.*\(DOB-[0-9]*\).*/\1/p'
but I am not getting the other pieces of text, such as the name or the post.
My expected output is: Mike DOB-029 Post-555
Edited
Say I have a list within a file, and I want to extract specific text/IDs from the entire list and save them to a .txt file.
Answered by ShellFish
sed 's/.*\[\(.*\).\(DOB-[0-9]*\).\(Post-[0-9]*\).*/\1 \2 \3/'
should do the trick!
The parts between \( and \) are capture groups; in the replacement they can be referenced as \i, where i is the index of the group.
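To illustrate how the numbered groups behave, here is a small sketch using the sample line from the question. The variable names in_order and reordered are made up for this example, and | is used as the s command delimiter only so that the slashes in the data need no escaping.

```shell
#!/usr/bin/env bash
# Sample line from the question.
line='<MX><[Mike/DOB-029/Post-555/Male]><MX>'

# Three capture groups: name, DOB and post. \1, \2 and \3 refer to them in order.
in_order=$(printf '%s\n' "$line" |
  sed 's|.*\[\([^/]*\)/\(DOB-[0-9]*\)/\(Post-[0-9]*\).*|\1 \2 \3|')

# The same groups can be emitted in a different order, or only some of them.
reordered=$(printf '%s\n' "$line" |
  sed 's|.*\[\([^/]*\)/\(DOB-[0-9]*\)/\(Post-[0-9]*\).*|\3,\1|')

echo "$in_order"    # Mike DOB-029 Post-555
echo "$reordered"   # Post-555,Mike
```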
Script for custom use:
#!/bin/bash
fields=${1:-123}        # digits select the fields: 1 = name, 2 = DOB, 3 = post
file='/path/to/input'
name=$(sed 's/.*\[\([^\/]*\)\/.*/\1/' "$file")
dob=$(sed 's/.*\(DOB-[0-9]*\).*/\1/' "$file")
post=$(sed 's/.*\(Post-[0-9]*\).*/\1/' "$file")
[[ $fields =~ .*1.* ]] && output=$name
[[ $fields =~ .*2.* ]] && output="$output $dob"
[[ $fields =~ .*3.* ]] && output="$output $post"
echo $output            # left unquoted so a leading space is dropped
Set the file variable to the path of the file containing the line you want to parse (I can add more functionality, such as supplying the file as an argument or reading the line from a larger file, if you like). Then run the script with an integer argument: if it contains 1, the name is displayed; 2 displays the DOB; and 3 displays the post information. You can combine them, e.g. 123 or 32 or any combination you like.
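The edit in the question asks about processing a whole list in a file and saving the result. A minimal sketch of one way to do that, assuming every record follows the same layout as the sample line: the names input.txt and output.txt, the temporary directory, and the second record are invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; only the first record comes from the question.
workdir=$(mktemp -d)
cat > "$workdir/input.txt" <<'EOF'
<MX><[Mike/DOB-029/Post-555/Male]><MX>
a line that does not match
<MX><[Anna/DOB-114/Post-207/Female]><MX>
EOF

# -n together with the p flag prints only lines where the substitution
# succeeded, so non-matching lines are dropped instead of copied through.
sed -n 's|.*\[\([^/]*\)/\(DOB-[0-9]*\)/\(Post-[0-9]*\).*|\1 \2 \3|p' \
  "$workdir/input.txt" > "$workdir/output.txt"

cat "$workdir/output.txt"
```

Without -n and p, sed would also pass unmatched lines through unchanged, which is usually not wanted when building a clean list.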
Stdin
If you want to read from stdin, use the following script:
#!/usr/bin/env bash
line=$(cat)             # read the whole of stdin
fields=${1:-123}        # digits select the fields: 1 = name, 2 = DOB, 3 = post
name=$(printf '%s\n' "$line" | sed 's/.*\[\([^\/]*\)\/.*/\1/')
dob=$(printf '%s\n' "$line" | sed 's/.*\(DOB-[0-9]*\).*/\1/')
post=$(printf '%s\n' "$line" | sed 's/.*\(Post-[0-9]*\).*/\1/')
[[ $fields =~ .*1.* ]] && output=$name
[[ $fields =~ .*2.* ]] && output="$output $dob"
[[ $fields =~ .*3.* ]] && output="$output $post"
echo $output            # left unquoted so a leading space is dropped
Example usage:
$ chmod +x script.sh
$ echo '<MX><[Mike/DOB-029/Post-555/Male]><MX>' | ./script.sh 123
Mike DOB-029 Post-555
$ echo '<MX><[Mike/DOB-029/Post-555/Male]><MX>' | ./script.sh 12
Mike DOB-029
$ echo '<MX><[Mike/DOB-029/Post-555/Male]><MX>' | ./script.sh 32
DOB-029 Post-555
$ echo '<MX><[Mike/DOB-029/Post-555/Male]><MX>' | ./script.sh
Mike DOB-029 Post-555
Answered by Arjun Mathew Dan
A solution with awk:
echo "<MX><[Mike/DOB-029/Post-555/Male]><MX>" | awk -F'[/[]' '{print $2, $3, $4}'
We set the field separator to / or [ (the -F option with the bracket expression [/[]). Then we just print $2, $3 and $4, which are the 2nd, 3rd and 4th fields respectively.
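To see why the wanted values land in fields 2 to 4, the following sketch prints every field together with its index. The variable name fields is made up for this example; the separator is the same bracket expression as above, quoted so the shell passes it to awk untouched.

```shell
#!/usr/bin/env bash
# Sample line from the question.
line='<MX><[Mike/DOB-029/Post-555/Male]><MX>'

# Print each field with its index to show where "/" and "[" split the line.
fields=$(printf '%s\n' "$line" |
  awk -F'[/[]' '{ for (i = 1; i <= NF; i++) print i ": " $i }')

echo "$fields"
# 1: <MX><
# 2: Mike
# 3: DOB-029
# 4: Post-555
# 5: Male]><MX>
```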
With sed:
echo "<MX><[Mike/DOB-029/Post-555/Male]><MX>" | sed 's/\(^.*\[\)\(.*\)\(\/[^/]*$\)/\2/; s/\// /g'
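The sed command runs two substitutions in sequence. A sketch that splits them apart to show the intermediate result; the names step1 and step2 are hypothetical, | is used as the delimiter to avoid escaping slashes, and the ^ anchor is written outside the group, which should behave the same way.

```shell
#!/usr/bin/env bash
# Sample line from the question.
line='<MX><[Mike/DOB-029/Post-555/Male]><MX>'

# Step 1: keep only the middle group, i.e. the text between "[" and the last "/".
step1=$(printf '%s\n' "$line" | sed 's|^\(.*\[\)\(.*\)\(/[^/]*$\)|\2|')

# Step 2: turn the remaining slashes into spaces.
step2=$(printf '%s\n' "$step1" | sed 's|/| |g')

echo "$step1"   # Mike/DOB-029/Post-555
echo "$step2"   # Mike DOB-029 Post-555
```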
Answered by Marc Bredt
Use bash's built-in parameter substitution.
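line='<MX><[Mike/DOB-029/Post-555/Male]><MX>'
linel=${line/*\[/}      # drop everything up to and including "["
liner=${linel%\/*}      # drop the last "/" and everything after it
echo "${liner//\// }"   # replace each remaining "/" with a space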