Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow page: http://stackoverflow.com/questions/11978892/

Date: 2020-09-18 03:03:00  Source: igfitidea

What is the optimal way to extract values between braces in bash/awk?

Tags: bash, shell, awk

Asked by kev

I have the output in this format:

Infosome - infotwo: (29333) - data-info-ids: (33389, 94934)

I want to extract the last two numbers, the ones in the last pair of braces. Sometimes there is only a single number in that last pair.

This is the code I used.

echo "Infosome - infotwo: (29333) - data-info-ids: (33389, 94934)" | \
  tr "," " " | tr "(" " " | tr ")" " " | awk -F: '{print $3}'

Is there a cleaner way to extract the values, or a more efficient one?

Answered by kev

Try this:

awk -F '[()]' '{print $(NF-1)}' input | tr -d ,

It's essentially a refactoring of your command.
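To see why $(NF-1) lands on the last parenthesized group, it can help to print every field that the '[()]' separator produces. This is only a minimal sketch, assuming the sample line from the question; the variable names line and fields are just for illustration:

```shell
line='Infosome - infotwo: (29333) - data-info-ids: (33389, 94934)'

# Split on "(" or ")" and print each field next to its index.
fields=$(printf '%s\n' "$line" |
  awk -F '[()]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
# 1=[Infosome - infotwo: ]
# 2=[29333]
# 3=[ - data-info-ids: ]
# 4=[33389, 94934]
# 5=[]
```

Because the line ends with ")", the final field is empty, so the last group sits in field NF-1 rather than NF.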

Answered by Levon

 awk -F\( '{gsub("[,)]", " ", $NF); print $NF}' input

will give

 33389  94934 

I am a bit unclear about what "optimal"/"professional" means in this problem's context, but this uses only one command/tool. I'm not sure whether that qualifies.

Or, building on @kev's approach (but without needing tr to eliminate the comma):

awk -F'[(,)]' '{print $4, $5}' input

outputs:

33389  94934
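The question mentions that the last group sometimes holds only a single number, in which case fixed field positions no longer line up. One possible way to cover both cases in a single awk call is to take the whole last group and then replace the comma. This is a sketch under that assumption, and the function name last_group is hypothetical:

```shell
# Hypothetical helper: print the numbers of the last "(...)" group, space-separated.
last_group() {
  # With "(" and ")" as separators, the last group is field NF-1
  # (the line ends with ")", so field NF is empty).
  awk -F '[()]' '{ s = $(NF-1); gsub(/, */, " ", s); print s }'
}

two=$(echo "Infosome - infotwo: (29333) - data-info-ids: (33389, 94934)" | last_group)
one=$(echo "Infosome - infotwo: (29333) - data-info-ids: (33389)" | last_group)

echo "$two"   # 33389 94934
echo "$one"   # 33389
```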

Answered by ghoti

This can also be done in pure bash. Assuming the text always looks like the sample in the question, the following should work:

$ text="Infosome - infotwo: (29333) - data-info-ids: (33389, 94934)"
$ result="${text/*(}"
$ echo ${result//[,)]}
33389 94934

This uses shell "parameter expansion" (which you can search for in bash's man page) to strip the string in much the same way you did using tr. Strictly speaking, the quotes in the second line are not necessary, but they help with StackOverflow syntax highlighting. :-)
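The same idea can probably be written with only the prefix/suffix-removal forms of parameter expansion, which plain POSIX sh also understands. The following is a rough sketch rather than a drop-in replacement: the function name last_ids is made up for illustration, and it assumes the group contains only digits, commas and spaces:

```shell
# Hypothetical helper: print the numbers of the last "(...)" group, space-separated.
last_ids() {
  inner=${1##*\(}       # drop everything through the last "("
  inner=${inner%%\)*}   # drop the first ")" and anything after it
  # Word-split on commas and spaces; the subshell keeps the IFS change
  # and the disabled globbing from leaking into the caller.
  ( set -f; IFS=', '; set -- $inner; IFS=' '; echo "$*" )
}

last_ids "Infosome - infotwo: (29333) - data-info-ids: (33389, 94934)"   # 33389 94934
last_ids "Infosome - infotwo: (29333) - data-info-ids: (33389)"          # 33389
```

Escaping the parentheses in the patterns is a defensive choice, so the expansion behaves the same whether or not extended globbing is enabled.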

Alternatively, you could make this a little more flexible by looking for the actual field you're interested in. If you're using GNU awk, you can specify RS with multiple characters:

$ gawk -vRS=" - " -vFS=": *" '
  { f[$1]=$2; }
  END {
    print f["data-info-ids"];
    # Or you could strip the non-numeric characters to get just numbers. 
    #print gensub(/[^0-9 ]/,"","g",f["data-info-ids"]);
  }' <<<"$text"

I prefer this way, because it actually interprets the input data for what it is -- structured text representing some sort of array.
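If GNU awk is not available, a similar name-based lookup can be approximated with split() instead of a multi-character RS. This is only a sketch: the function name field_value is hypothetical, and it assumes fields are separated by " - " and each name is followed by ": ":

```shell
# Hypothetical helper: print the value of one "name: value" field
# from a line whose fields are separated by " - ".
field_value() {
  awk -v key="$1" '{
    n = split($0, parts, / - /)            # one array element per field
    for (i = 1; i <= n; i++) {
      pos = index(parts[i], ": ")          # 0 when the part has no "name: "
      if (pos && substr(parts[i], 1, pos - 1) == key)
        print substr(parts[i], pos + 2)    # text after "name: "
    }
  }'
}

ids=$(echo "Infosome - infotwo: (29333) - data-info-ids: (33389, 94934)" |
  field_value data-info-ids)
echo "$ids"   # (33389, 94934)
```

The parentheses and comma are left in place here, so they can be stripped afterwards with whichever of the approaches above you prefer.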
