bash grep: return multiple matches per line for variations of a string

Question

Asked by RNJ

I have a file that contains DB sequence names.

They come in two forms:

@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")

I want to return just MY1_SEQ and MY2_SEQ.

If I grep for _SEQ, I get the whole line.

I've tried using awk:

grep SEQ * | awk '{print }'

but this doesn't cope with the fact that each line can be slightly different.

I want to return the whole word (delimited by spaces) that matches _SEQ.

How can I do this?

Answer 1

Answered by

You just need to adjust your grep pattern a bit and use -o to return only the matched part:

$ echo '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' \
| grep -Eo 'M.._SEQ(UENCE)?'
My1_SEQUENCE
MY1_SEQ
My2_SEQUENCE
MY2_SEQ

Or, if you just want the second one (the trailing \b requires a word boundary after _SEQ, so My1_SEQUENCE no longer matches as My1_SEQ):

$ echo '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' \
| grep -Eo 'M.._SEQ\b'
MY1_SEQ
MY2_SEQ

Or, more generally, to match any xxx_SEQ:

$ echo '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' \
| grep -Eo '[^ "]+_SEQ\b'
MY1_SEQ
MY2_SEQ
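
The \b in the last pattern is not part of POSIX extended regular expressions (GNU grep accepts it). On a grep without it, a rough alternative, sketched here assuming the values are always double-quoted as in the question, is to match the whole quoted token, closing quote included, and strip the quotes with tr. The helper name seq_words is hypothetical; -o is not POSIX either, but GNU and BSD grep both have it.

```shell
# Hypothetical helper: print every double-quoted value that ends in _SEQ.
# Matching the closing quote explicitly replaces the \b word boundary;
# tr -d then deletes the surrounding quotes from each match.
seq_words() {
  grep -o '"[^"]*_SEQ"' "$@" | tr -d '"'
}

# Demo on the two lines from the question (reads stdin when no file is given)
printf '%s\n' \
  '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")' \
  '@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' |
  seq_words
```

"My1_SEQUENCE" is skipped because _SEQ must be followed directly by the closing quote.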

Answer 2

Answered by anishsane

grep -Po '(?<=sequenceName = ")[^"]*' filename
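
The (?<=...) lookbehind needs -P, GNU grep's Perl-compatible regex mode, which may be unavailable in other greps (for example the BSD grep on macOS). A minimal sketch of the same extraction with sed and a capture group, assuming the exact sequenceName = " spacing shown in the question; seq_names is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of sequenceName = "..." on each line.
# A BRE capture group \( \) stands in for the PCRE lookbehind; with -n,
# only lines where the s/// succeeded are printed (the p flag).
seq_names() {
  sed -n 's/.*sequenceName = "\([^"]*\)".*/\1/p' "$@"
}

printf '%s\n' \
  '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")' \
  '@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' |
  seq_names
```

To run it on a file, call seq_names filename.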

Answer 3

Answered by Andy Lester

If you use ack (http://beyondgrep.com), you can do this:

ack -w -o 'MY\d_SEQ' filename
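
If ack is not installed, a similar whole-word match can be sketched with plain grep. This assumes, as the \d in the ack pattern does, a single digit between MY and _SEQ; grep's basic and extended regexes have no \d, hence [0-9]. The helper name seq_ids is hypothetical.

```shell
# Hypothetical helper: -w keeps only matches that are whole words
# (bounded by non-word characters such as the quotes), -o prints just the match.
seq_ids() {
  grep -ow 'MY[0-9]_SEQ' "$@"
}

printf '%s\n' \
  '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")' \
  '@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' |
  seq_ids
```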

Answer 4

Answered by jaypal singh

If you always want the last field, awk gives you a variable called NF (the number of fields on the current line), so $NF retrieves the last value.

$ awk '{gsub(/[")]/,"",$NF);print $NF}' file
MY1_SEQ
MY2_SEQ

Using gsub, we remove the quotes and parentheses.
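
A variation on the same last-field idea, assuming the sequence name is always the last double-quoted value on the line: make the double quote the field separator, and the value becomes the next-to-last field, so no gsub cleanup is needed. The helper name last_quoted is hypothetical.

```shell
# Hypothetical helper: split each line on " so quoted values become fields.
# For the question's lines the fields are: prefix, My1_SEQUENCE, ", sequenceName = ",
# MY1_SEQ, ")" -- so $(NF-1) is the sequence name.
# The NF > 1 guard skips lines with no quotes (and avoids $(-1) on empty lines).
last_quoted() {
  awk -F'"' 'NF > 1 { print $(NF-1) }' "$@"
}

printf '%s\n' \
  '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")' \
  '@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' |
  last_quoted
```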

Answer 5

Answered by Ashkan

awk '{match($0, /MY.*_SEQ/, arr); print arr[0]}' input.txt

Input:

@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")

Output:

MY1_SEQ
MY2_SEQ
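
Ashkan's command relies on the three-argument form of match(), which is a GNU awk (gawk) extension. For other awks such as mawk or the BSD awk, a sketch using the standard two-argument match(), which sets RSTART and RLENGTH, together with substr(); the helper name match_seq is hypothetical.

```shell
# Hypothetical helper: print the text matched by /MY.*_SEQ/ on each line.
# Two-argument match() returns the match position (0 if none) and sets
# RSTART/RLENGTH, which substr() uses to cut the match out of $0.
# Lines with no match make the pattern false, so nothing is printed for them.
match_seq() {
  awk 'match($0, /MY.*_SEQ/) { print substr($0, RSTART, RLENGTH) }' "$@"
}

printf '%s\n' \
  '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")' \
  '@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' |
  match_seq
```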
