grep to return the multiple matches of string variants per line
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/23109880/
Asked by RNJ
I have a file which contains DB sequence names.
They have two forms, as below:
@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")
I want to return MY1_SEQ and MY2_SEQ.
If I grep for _SEQ, I get the whole line.
I've tried to use awk:
grep SEQ * | awk '{print }'
but this does not cope with the fact that each line could be slightly different.
I want to return the whole word (delimited by spaces) that matches _SEQ.
How can I do this?
Answered by
You just need to adjust your grep pattern a bit and use -o to return only the matched part:
$ echo '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' \
| egrep -o 'M.._SEQ(UENCE)?'
My1_SEQUENCE
MY1_SEQ
My2_SEQUENCE
MY2_SEQ
or, if you just want the second one:
$ echo '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' \
| egrep -o 'M.._SEQ\b'
MY1_SEQ
MY2_SEQ
or, more generally, if you want xxx_SEQ:
$ echo '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' \
| egrep -o '[^ "]+_SEQ\b'
MY1_SEQ
MY2_SEQ
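The question runs grep over several files at once (`grep SEQ *`). In that case `-o` prefixes each match with its file name unless `-h` is given. A minimal sketch of the same pattern across multiple files follows; the scratch directory and the names `A.java` and `B.java` are made up for illustration, and `\b` is assumed to be supported (it is a GNU grep extension):

```shell
# Hypothetical setup: two sample files in a scratch directory
dir=$(mktemp -d)
printf '%s\n' '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")' > "$dir/A.java"
printf '%s\n' '@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' > "$dir/B.java"

# -E: extended regex, -o: print only the matched text, -h: omit the file name prefix
# sort -u removes duplicates if a sequence name appears in more than one file
names=$(grep -Eoh '[^ "]+_SEQ\b' "$dir"/*.java | sort -u)
printf '%s\n' "$names"

# Clean up the scratch directory
rm -r "$dir"
```

Without `-h`, each line would look like `path/A.java:MY1_SEQ`, which is usually not what you want when collecting names.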
Answered by anishsane
grep -Po '(?<=sequenceName = ")[^"]*' filename
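The `(?<=sequenceName = ")` part is a lookbehind, so only the text after that prefix is printed. It needs `-P` (Perl-compatible regex), which not every grep build supports. Where `-P` is unavailable, a sed substitution can do roughly the same job. This is a sketch using a different tool, not the answer's method, and the temporary file stands in for `filename`:

```shell
# Hypothetical sample file standing in for "filename"
file=$(mktemp)
cat > "$file" <<'EOF'
@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")
EOF

# -n suppresses default output; the trailing p prints only lines where the
# substitution succeeded. \1 is the text between the quotes after sequenceName.
result=$(sed -n 's/.*sequenceName = "\([^"]*\)".*/\1/p' "$file")
printf '%s\n' "$result"

rm -f "$file"
```

Unlike `grep -o`, this prints at most one value per line, which is fine here because each annotation has a single `sequenceName`.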
Answered by Andy Lester
If you use ack (http://beyondgrep.com) you can do this:
ack 'MY\d_SEQ.+' -w -o filename
Answered by jaypal singh
If you always want the last field, then awk gives you a variable called NF which can be used to retrieve the last value.
$ awk '{gsub(/[")]/,"",$NF);print $NF}' file
MY1_SEQ
MY2_SEQ
Using gsub we remove the quotes and parens.
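A variation on the same NF idea, as a sketch: if the double quote is used as the field separator, the value sits in the second-to-last field and no gsub cleanup is needed. This assumes every line ends with `"VALUE")` as in the sample; the temporary file is hypothetical and stands in for `file`:

```shell
# Hypothetical sample file standing in for "file"
file=$(mktemp)
cat > "$file" <<'EOF'
@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")
EOF

# -F'"' splits each line on double quotes, so the last field is ")" and
# the field before it, $(NF-1), is the quoted sequence name.
last=$(awk -F'"' '{print $(NF-1)}' "$file")
printf '%s\n' "$last"

rm -f "$file"
```

This breaks if the annotation gains another quoted attribute after `sequenceName`, so it is only as robust as the "last field" assumption itself.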
Answered by Ashkan
awk '{match($0, /MY.*_SEQ/, arr); print arr[0]}' input.txt
Input:
@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")
Output:
MY1_SEQ
MY2_SEQ