Note: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/23109880/

Date: 2020-09-18 10:13:54  Source: igfitidea

grep to return the multiple matches of string variants per line

Tags: bash, awk, grep

Asked by RNJ

I have a file that contains DB sequence names.

They come in two forms, as shown below:

@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")

I want to return MY1_SEQ and MY2_SEQ.

If I grep for _SEQ, I get the whole line.

I've tried to use awk:

grep SEQ * | awk '{print }'

but this does not cope with the fact that each line could be slightly different.

I want to return the whole word (delimited by spaces) that matches _SEQ.

How can I do this?

Answered by (unknown)

You just need to adjust your grep pattern a bit and use -o to return only the matched part:

$ echo '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' \
| egrep -o 'M.._SEQ(UENCE)?'
My1_SEQUENCE
MY1_SEQ
My2_SEQUENCE
MY2_SEQ

or, if you just want the second one:

$ echo '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' \
| egrep -o 'M.._SEQ\b'
MY1_SEQ
MY2_SEQ

or, more generally, if you want xxx_SEQ:

$ echo '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' \
| egrep -o '[^ "]+_SEQ\b'
MY1_SEQ
MY2_SEQ
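
Since the question runs grep over several files (`grep SEQ *`), the same pattern can probably be applied to a whole directory. The following is only a minimal sketch: it assumes GNU grep (for `\b`), and the file names `A.java` and `B.java` are hypothetical, created in a temporary directory purely for the demonstration.

```shell
# Hypothetical sample files, created only for this demonstration.
tmpdir=$(mktemp -d)
printf '%s\n' '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")' > "$tmpdir/A.java"
printf '%s\n' '@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' > "$tmpdir/B.java"

# -o prints only the matched text, -h suppresses the file-name prefix
# that grep adds when searching more than one file, and -E enables
# extended regular expressions.
result=$(grep -ohE '[^ "]+_SEQ\b' "$tmpdir"/*.java)
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Without `-h`, each match would be prefixed with the name of the file it came from, which may or may not be what you want.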

Answered by anishsane

grep -Po '(?<=sequenceName = ")[^"]*' filename
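
The command above relies on `-P` (Perl-compatible regular expressions, available in GNU grep when it is built with PCRE support). The lookbehind `(?<=sequenceName = ")` requires the match to be preceded by that text without including it in the output, and `[^"]*` then consumes everything up to the closing quote. A minimal sketch on the question's two sample lines, with the `sample` variable introduced here only for illustration:

```shell
# The two sample lines from the question, held in a variable for the demo.
sample='@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")'

# -P: Perl-compatible regex, -o: print only the matched part.
# The lookbehind anchors the match right after: sequenceName = "
result=$(printf '%s\n' "$sample" | grep -Po '(?<=sequenceName = ")[^"]*')
printf '%s\n' "$result"
```

This approach keys on the attribute name rather than on the `_SEQ` suffix, so it should keep working even if a sequence name does not end in `_SEQ`.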

Answered by Andy Lester

If you use ack (http://beyondgrep.com) you can do this:

ack 'MY\d_SEQ.+' -w -o filename

Answered by jaypal singh

If you always want the last field, then awk gives you a variable called NF, which can be used to retrieve the last value.

$ awk '{gsub(/[")]/,"",$NF);print $NF}' file
MY1_SEQ
MY2_SEQ

Using gsub, we remove the quotes and parens.
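
If the wanted value is not guaranteed to be the last field, one possible variation is to loop over every whitespace-separated field and keep those that end in `_SEQ` (ignoring trailing quotes and parens). This is only a sketch under that assumption, not part of the original answer; the `sample` variable is introduced here for illustration.

```shell
# The two sample lines from the question, held in a variable for the demo.
sample='@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")'

result=$(printf '%s\n' "$sample" | awk '{
  # Visit every whitespace-separated field on the line.
  for (i = 1; i <= NF; i++) {
    # Keep fields that end in _SEQ, allowing trailing quotes or parens.
    if ($i ~ /_SEQ[")]*$/) {
      gsub(/[")]/, "", $i)   # strip the quotes and parens
      print $i
    }
  }
}')
printf '%s\n' "$result"
```

Fields such as `"My1_SEQUENCE",` are skipped because `_SEQ` is followed by further letters there, so only the `sequenceName` values are printed.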

Answered by Ashkan

awk '{match($0, /MY.*_SEQ/, arr); print arr[0]}' input.txt

(The three-argument form of match() is a GNU awk extension.)

Input:

@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")

Output:

MY1_SEQ
MY2_SEQ