Disclaimer: This page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/18908554/

Time: 2020-09-18 06:38:15  Source: igfitidea

Parsing json with awk/sed in bash to get key value pair

Tags: json, bash, shell, sed, awk

Asked by Aman Deep Gautam

I have read many existing questions on SO, but none of them answers what I am looking for. I know it is difficult to parse JSON in bash using sed/awk, but I only need a few key-value pairs out of the full list of key-value pairs in each record. I want to do this because it should be faster, as the main JSON is pretty big, with millions of records.

The JSON format is as follows:

{
    "documents":
    [
        {
            "title":"a",   //needed
            "description":"b",  //needed
            "id":"c",  //needed
            ....(some more:not useful)....
            "conversation":
            [
                {
                    "message":"",
                    "id":"d",   //not needed
                    .....(some more)....
                    "createDate":"e",   //not needed
                },
                ...(some more messages)....
            ],
            "createDate":"f",  //needed
            ....(many more labels).....
        }
    ],
    ....(some more global attributes)....
}

For this I need the attributes marked as needed, but keys such as id and createDate also appear in the nested conversation objects, which makes them hard to pick out with simple sed/awk. Could anyone suggest whether this can be done with sed/awk? Any help to achieve this would be appreciated.

P.S.: I know about jsawk, but I do not want to introduce any dependency, so if possible please suggest a sed/awk approach.

EDIT: Multiple entries of the format given below (since documents is a list):

"title":"a",
"description":"b"
"id":"c"
"createDate":"f"

EDIT: The actual JSON contains no whitespace. It has been formatted here for readability.

Answered by Eduardo A. Bustamante López

I would advise that you use 'jq', or a real JSON parser. You can't "parse" JSON with arbitrary regular expressions. You could hack something with awk, but that will break easily if your input has a form you didn't anticipate.

So, the answer is, introduce a cheap dependency (jq, or similar tool), and script around that. Unless you're running this script in a router or an embedded computer, chances are you can easily install jq.
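
A minimal sketch of what the jq route might look like, assuming jq is installed and the input is valid JSON shaped like the question's sample. The sample file and the pick_fields helper name are illustrative and not part of the original answer:

```shell
#!/bin/sh
# Hypothetical sample input: one document shaped like the question's example.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"documents":[{"title":"a","description":"b","id":"c","conversation":[{"message":"","id":"d","createDate":"e"}],"createDate":"f"}]}
EOF

# pick_fields is an illustrative helper name (an assumption, not from the answer).
# It emits one compact JSON object per document, keeping only the four wanted keys.
pick_fields() {
    jq -c '.documents[] | {title, description, id, createDate}' "$1"
}

if command -v jq >/dev/null 2>&1; then
    pick_fields "$sample"
else
    echo "jq is not installed" >&2
fi
```

Because the filter only descends into `.documents[]`, the `id` and `createDate` keys inside `conversation` are never touched, which is the part that is awkward to express with plain regular expressions.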

Answered by konsolebox

If the key characters [, {, }, and ] are always isolated on their own lines, this would work:

#!/usr/bin/awk -f

# Recursively consume input lines, tracking the nesting level of { and [.
function walk(level, end) {
    while (getline > 0) {
        # A closing bracket (optionally followed by a comma) ends this level.
        if (level && $NF ~ end) {
            return
        }
        if ($NF == "{") {
            walk(level + 1, "},?")
        } else if ($NF == "[") {
            walk(level + 1, "],?")
        } else if (level == 3 && match($0, /"(title|description|id|createDate)":"[^"]*"/)) {
            # Level 3 is a document object: print only the matched key-value pair.
            print substr($0, RSTART, RLENGTH)
        }
    }
}

BEGIN {
    walk(0)
    exit
}

Input:

{
    "documents":
    [
        {
            "title":"a",   //needed
            "description":"b",  //needed
            "id":"c",  //needed
            ....(some more:not useful)....
            "conversation":
            [
                {
                    "message":"",
                    "id":"d",   //not needed
                    .....(some more)....
                    "createDate":"e",   //not needed
                },
                ...(some more messages)....
            ],
            "createDate":"f",  //needed
            ....(many more labels).....
        }
    ],
    ....(some more global attributes)....
}

Output:

"title":"a"
"description":"b"
"id":"c"
"createDate":"f"

Answered by Liz Bennett

Well, if you're going to use a regex to parse JSON, which will by nature be quick, dirty and heavily reliant on the exact syntax of the input file, you could write something that relies on the amount of white space occurring before the key-value pairs you're interested in. Depending on the kind of output you're looking for, you could use something along the lines of:

awk '/^ {12}"title/
/^ {12}"description/
/^ {12}"id/
/^ {12}"createDate/' input_file.json

Not great, but it does the trick on your example input...
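
The question's edit says the real JSON contains no whitespace, so an indentation-based pattern would not match it directly. One possible workaround is to scan the text character by character and track bracket depth instead. The following is only a rough sketch under stated assumptions, not a full JSON parser: the extract_pairs helper name and the sample file are hypothetical, depth 3 assumes the layout top-level object, then the documents array, then a document object, and only string values are printed.

```shell
#!/bin/sh
# Hypothetical minified sample shaped like the question's example.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"documents":[{"title":"a","description":"b","id":"c","conversation":[{"message":"","id":"d","createDate":"e"}],"createDate":"f"}],"total":1}
EOF

# extract_pairs is an illustrative helper name (an assumption).
# It prints the wanted string-valued keys that sit directly at bracket depth 3.
extract_pairs() {
    awk '
    BEGIN {
        want["title"]; want["description"]; want["id"]; want["createDate"]
        depth = 0; expect_value = 0; key = ""
    }
    {
        n = length($0)
        i = 1
        while (i <= n) {
            c = substr($0, i, 1)
            if (c == "\"") {
                # Find the closing quote, skipping backslash-escaped characters.
                j = i + 1
                while (j <= n) {
                    d = substr($0, j, 1)
                    if (d == "\\") { j += 2; continue }
                    if (d == "\"") break
                    j++
                }
                str = substr($0, i + 1, j - i - 1)
                if (expect_value) {
                    # A string right after a colon is the value of the last key.
                    if (depth == 3 && (key in want))
                        printf "\"%s\":\"%s\"\n", key, str
                    expect_value = 0
                } else {
                    key = str
                }
                i = j + 1
                continue
            }
            if (c == ":") expect_value = 1
            else if (c == "{" || c == "[") { depth++; expect_value = 0 }
            else if (c == "}" || c == "]") { depth--; expect_value = 0 }
            else if (c == ",") expect_value = 0
            i++
        }
    }' "$1"
}

extract_pairs "$sample"
```

Keys inside the conversation objects sit at depth 5 in this layout, so they are skipped even though they share names with the wanted keys. Scanning one character at a time may be slow on very large inputs, so this is better read as an illustration of depth tracking than as a tuned solution.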
