Linux grep: group capturing

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/8602848/

Date: 2020-08-06 03:47:12  Source: igfitidea

grep: group capturing

Tags: regex, linux, bash, grep

Asked by lstipakov

I have the following string:

{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}

and I need to get the value of "scheme_version", which is 1234 in this example.

I have tried

grep -Eo "\"scheme_version\":(\w*)"

however it returns

"scheme_version":1234

How can I do this? I know I can add a sed call, but I would prefer to do it with a single grep.

Accepted answer by potong

This might work for you:

echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' |
sed -n 's/.*"scheme_version":\([^}]*\)}/\1/p'
1234

Sorry it's not grep, so disregard this solution if you like.
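
A sketch of the same sed idea using extended regular expressions (-E, accepted by both GNU and BSD sed). The expression above captures everything up to the closing }, so it relies on scheme_version being the last key; this version captures only the digits, assuming the value is always an integer. The second record below is a made-up example with the key in the middle.

```shell
# Input string from the question.
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# -E enables extended regexes, so ( ) form a capture group without backslashes.
# [0-9]+ stops at the first non-digit, so the key need not be the last one.
value=$(printf '%s\n' "$json" | sed -En 's/.*"scheme_version":([0-9]+).*/\1/p')
echo "$value"   # 1234

# Hypothetical record with the key in the middle of the object.
other=$(printf '%s\n' '{"scheme_version":7,"_id":"x"}' | sed -En 's/.*"scheme_version":([0-9]+).*/\1/p')
echo "$other"   # 7
```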

Or stick with grep and add:

grep -Eo "\"scheme_version\":(\w*)"| cut -d: -f2
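
The grep | cut pipeline can be tried end to end on the question's string. This sketch swaps (\w*) for [0-9]+, assuming the value is always an integer: \w is a GNU extension in ERE, while [0-9] is portable. cut -d: -f2 works here because the matched text contains exactly one colon.

```shell
# Input string from the question.
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# grep -o prints the whole match, "scheme_version":1234 (the "_id" value
# is not matched because it is followed by a comma, not a colon);
# cut then keeps the second colon-separated field.
value=$(printf '%s\n' "$json" | grep -Eo '"scheme_version":[0-9]+' | cut -d: -f2)
echo "$value"   # 1234
```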

Answer by SiegeX

You'll need to use a lookbehind assertion so that the key isn't included in the match:

grep -Po '(?<=scheme_version":)[0-9]+'
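
grep -o prints every match on its own line, so the lookbehind version extracts one value per input line. The two records below are hypothetical (the second value, 42, is invented for illustration), and -P assumes a GNU grep built with PCRE support.

```shell
# Two hypothetical one-line records; the second value (42) is invented.
values=$(printf '%s\n' \
  '{"_id":"scheme_version","scheme_version":1234}' \
  '{"_id":"scheme_version","scheme_version":42}' |
  grep -Po '(?<=scheme_version":)[0-9]+')

# The lookbehind text is required before the digits but is not printed.
printf '%s\n' "$values"   # prints 1234, then 42 on the next line
```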

Answer by Marc O'Morain

I would recommend that you use jq for the job. jq is a command-line JSON processor.

$ cat tmp
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}

$ cat tmp | jq .scheme_version
1234

Answer by kris.zhang

You can do this:

$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | awk -F ':' '{print $NF}' | tr -d '}'
1234
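
Splitting on ':' relies on the value being the last colon-separated field. A sketch that avoids that assumption uses POSIX awk's match(), which sets RSTART and RLENGTH for the leftmost match, and then strips the key prefix with substr(). It assumes the value is an integer, as in the question.

```shell
# Input string from the question.
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

value=$(printf '%s\n' "$json" | awk '
  # match() sets RSTART (start) and RLENGTH (length) of the leftmost match
  match($0, /"scheme_version":[0-9]+/) {
    prefix = length("\"scheme_version\":")   # key prefix to skip, 17 chars
    print substr($0, RSTART + prefix, RLENGTH - prefix)
  }')
echo "$value"   # 1234
```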

Answer by ClarkZinzow

As an alternative to the positive lookbehind method suggested by SiegeX, you can reset the match starting point to directly after scheme_version": with the \K escape sequence. E.g.,

$ grep -Po 'scheme_version":\K[0-9]+'

After scheme_version": has been matched, \K resets the start of the reported match, so only the digits are printed; this tends to perform far better than the positive lookbehind. Comparing the two on regex101 shows that the \K method takes 37 steps and 1 ms, while the positive lookbehind method takes 194 steps and 21 ms.

You can compare the performance yourself on regex101, and you can read more about resetting the match starting point in the PCRE documentation.
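
A quick check that the lookbehind and \K variants print the same value for the question's string, assuming GNU grep with PCRE support. This compares output only; it does not measure timing.

```shell
# Input string from the question.
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# Lookbehind: the digits must be preceded by scheme_version": (not printed).
lookbehind=$(printf '%s\n' "$json" | grep -Po '(?<=scheme_version":)[0-9]+')

# \K: scheme_version": is consumed, then \K drops it from the reported match.
reset=$(printf '%s\n' "$json" | grep -Po 'scheme_version":\K[0-9]+')

echo "$lookbehind $reset"   # 1234 1234
```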

Answer by kenorb

To avoid relying on grep's PCRE feature (-P), which is available in GNU grep but not in the BSD version, another method is to use ripgrep, e.g.:

$ rg -o 'scheme_version.?:(\d+)' -r '$1' <file.json
1234

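-r (--replace) prints the given replacement text in place of each match; the replacement may refer to capture group indices (e.g., $5) and names (e.g., $foo).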

Another example, using Python's json.tool module, which can validate and pretty-print JSON:

$ python -m json.tool file.json | rg -o 'scheme_version[^\d]+(\d+)' -r '$1'
1234
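
Since Python is already in the pipeline, an alternative sketch skips regular expressions entirely and reads the key with the standard json module. It assumes python3 is on PATH and that the input is a single valid JSON object.

```shell
# Input string from the question.
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# json.load parses stdin into a dict; indexing by key returns the int 1234.
value=$(printf '%s\n' "$json" |
  python3 -c 'import json, sys; print(json.load(sys.stdin)["scheme_version"])')
echo "$value"   # 1234
```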

Related: Can grep output only specified groupings that match?

Answer by Alexandre Hamon

Improving on @potong's answer, which only works for "scheme_version", you can use this expression to extract the value of any key:

$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_id":["]*\([^(",})]*\)[",}].*/\1/p'
scheme_version

$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_rev":["]*\([^(",})]*\)[",}].*/\1/p'
4-cad1842a7646b4497066e09c3788e724

$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"scheme_version":["]*\([^(",})]*\)[",}].*/\1/p'
1234
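
The three commands differ only in the key name, so they can be wrapped in a shell function. json_get is a hypothetical name chosen for this sketch, and it inherits the expression's limits: the JSON must be on one line, the key must not contain regex metacharacters or slashes, and string values containing quotes, commas, braces or parentheses will be cut short.

```shell
# Input string from the question.
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# json_get KEY (hypothetical helper): prints the scalar value of KEY from
# one-line JSON on stdin, using the same sed expression as the answer above.
# Inside double quotes, \" becomes a literal quote; \( \) and \1 pass to sed.
json_get() {
  sed -n "s/.*\"$1\":[\"]*\([^(\",})]*\)[\",}].*/\1/p"
}

printf '%s\n' "$json" | json_get _id             # scheme_version
printf '%s\n' "$json" | json_get _rev            # 4-cad1842a7646b4497066e09c3788e724
printf '%s\n' "$json" | json_get scheme_version  # 1234
```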