Linux grep: group capturing
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): Stack Overflow.
Original question: http://stackoverflow.com/questions/8602848/
grep: group capturing
Asked by lstipakov
I have the following string:
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
and I need to get the value of "scheme_version", which is 1234 in this example.
I have tried
grep -Eo "\"scheme_version\":(\w*)"
however it returns
"scheme_version":1234
How can I get only the value? I know I can add a sed call, but I would prefer to do it with a single grep.
Accepted answer by potong
This might work for you:
echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' |
sed -n 's/.*"scheme_version":\([^}]*\)}/\1/p'
1234
Sorry it's not grep, so disregard this solution if you like.
Or stick with grep and add:
grep -Eo "\"scheme_version\":(\w*)"| cut -d: -f2
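If neither sed nor cut is wanted, a rough alternative is to chain two grep -o calls: the first isolates the key/value pair, the second keeps only the trailing digits. This is a minimal sketch that assumes the value is a non-negative integer; the variable names json and version are only illustrative.

```shell
# Sample input from the question (variable name is illustrative).
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# The first grep isolates "scheme_version":1234,
# the second keeps only the digits at the end of that match.
version=$(printf '%s\n' "$json" |
  grep -Eo '"scheme_version":[0-9]+' |
  grep -Eo '[0-9]+$')

echo "$version"
```

This stays within POSIX extended regular expressions plus the -o option, so it does not depend on PCRE support.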
Answer by SiegeX
You'll need to use a lookbehind assertion so that the prefix isn't included in the match:
grep -Po '(?<=scheme_version":)[0-9]+'
Answer by Marc O'Morain
Answer by kris.zhang
You can do this:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | awk -F ':' '{print $4}' | tr -d '}'
Answer by ClarkZinzow
As an alternative to the positive lookbehind method suggested by SiegeX, you can reset the match starting point to directly after scheme_version": with the \K escape sequence. E.g.,
$ grep -Po 'scheme_version":\K[0-9]+'
This restarts the matching process after having matched scheme_version":, and tends to have far better performance than the positive lookbehind. Comparing the two on regex101 shows that the reset-match-start method takes 37 steps and 1 ms, while the positive lookbehind method takes 194 steps and 21 ms.
You can compare the performance yourself on regex101, and you can read more about resetting the match starting point in the PCRE documentation.
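Beyond speed, there is a practical difference: the text before \K may be of variable length, whereas PCRE does not accept unbounded repetition such as \s* inside a lookbehind. The sketch below assumes GNU grep built with PCRE support and uses a hypothetical input with spaces after the colon; the variable names are illustrative.

```shell
# Hypothetical input with spaces after the colon (not from the question).
json='{"_id":"scheme_version", "scheme_version":   1234}'

# \K discards everything matched so far, so the prefix may contain \s*.
with_k=$(printf '%s\n' "$json" | grep -Po '"scheme_version":\s*\K[0-9]+')

# A lookbehind such as (?<="scheme_version":\s*) would be rejected,
# because \s* has no upper bound on its length.
echo "$with_k"
```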
Answer by kenorb
To avoid using grep's PCRE feature, which is available in GNU grep but not in the BSD version, another method is to use ripgrep, e.g.
$ rg -o 'scheme_version.?:(\d+)' -r '$1' <file.json
1234
-r: Capture group indices (e.g., $5) and names (e.g., $foo) are supported in the replacement string.
Another example with Python and the json.tool module, which can validate and pretty-print:
$ python -mjson.tool file.json | rg -o 'scheme_version[^\d]+(\d+)' -r '$1'
1234
Related: Can grep output only specified groupings that match?
Answer by Alexandre Hamon
Building on @potong's answer, which only extracts "scheme_version", you can use this expression to get the value of any key:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_id":["]*\([^(",})]*\)[",}].*/\1/p'
scheme_version
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_rev":["]*\([^(",})]*\)[",}].*/\1/p'
4-cad1842a7646b4497066e09c3788e724
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"scheme_version":["]*\([^(",})]*\)[",}].*/\1/p'
1234
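Since the three commands differ only in the key name, they could be wrapped in a small shell function. This is a sketch, not a JSON parser: get_field is a hypothetical helper name, and it assumes flat, single-line JSON whose values contain no escaped quotes, commas, or braces, and a key without regex metacharacters.

```shell
# get_field KEY: print the value of KEY from one line of flat JSON on stdin.
# Hypothetical helper; KEY is spliced into the sed pattern, and \1 prints
# the captured value.
get_field() {
  sed -n 's/.*"'"$1"'":["]*\([^(",})]*\)[",}].*/\1/p'
}

json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

printf '%s\n' "$json" | get_field _id
printf '%s\n' "$json" | get_field _rev
printf '%s\n' "$json" | get_field scheme_version
```

For nested or multi-line JSON, a real JSON parser is the safer choice.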