Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/45170897/

Time: 2020-09-18 16:18:04  Source: igfitidea

jq count the number of items in json by a specific key

Tags: json, bash, command-line, count, jq

Asked by Eleanor

The following are the first two items in my JSON file:

{
  "ReferringUrl": "N",
  "OpenAccess": "0",
  "Properties": {
    "ItmId": "1694738780"
  }
}
{
  "ReferringUrl": "L",
  "OpenAccess": "1",
  "Properties": {
    "ItmId": "1347809133"
  }
}

I want to count how many items there are for each ItmId that appears in the JSON. For example, suppose items with "ItmId" 1694738780 appear 10 times and items with "ItmId" 1347809133 appear 14 times in my JSON file. I then want to return JSON like this:

{"ItemId": "1694738780",
 "Count":  10
}
{"ItemId": "1347809133",
 "Count":  14
}

I am using bash and would prefer to do this entirely in jq, but other methods are fine too.

Thank you!!!

Answered by peak

Here's one solution, assuming that the input is a stream of valid JSON objects and that you invoke jq with the -s option:

map({ItemId: .Properties.ItmId})             # extract the ItmId values
| group_by(.ItemId)                          # group by "ItemId"
| map({ItemId: .[0].ItemId, Count: length})  # store the counts
| .[]                                        # convert to a stream

A slightly more memory-efficient approach is to use inputs, if your jq has it. In that case, use -n instead of -s, and replace the first line above with: [inputs | {ItemId: .Properties.ItmId} ]
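For illustration, the inputs variant can be run end to end roughly as follows. This is only a sketch: the three sample records and the shell variable names sample and grouped are made up here, and -c is used just to print compact output.

```shell
# Hypothetical sample data: a stream of JSON objects (not a JSON array).
sample='{"ReferringUrl": "N", "OpenAccess": "0", "Properties": {"ItmId": "1694738780"}}
{"ReferringUrl": "L", "OpenAccess": "1", "Properties": {"ItmId": "1347809133"}}
{"ReferringUrl": "N", "OpenAccess": "0", "Properties": {"ItmId": "1694738780"}}'

# -n: do not read input automatically; `inputs` pulls in each object instead.
grouped=$(printf '%s\n' "$sample" | jq -n -c '
  [inputs | {ItemId: .Properties.ItmId}]      # collect the ItmId values
  | group_by(.ItemId)                         # group by "ItemId"
  | map({ItemId: .[0].ItemId, Count: length}) # one object per group
  | .[]                                       # convert to a stream
')

printf '%s\n' "$grouped"
```

Because group_by sorts its groups by key, the ID 1347809133 (count 1) is printed before 1694738780 (count 2) for this sample.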

Efficient solution

The above solutions use the built-in group_by, which is convenient but sorts its input, an inefficiency that is easy to avoid. The following counter function makes it easy to write a very efficient solution:

def counter(stream):
  reduce stream as $s ({}; .[$s|tostring] += 1);

With the -n command-line option, it can be applied as follows:

counter(inputs | .Properties.ItmId)

This produces a dictionary of counts:

{
  "1694738780": 1,
  "1347809133": 1
}

Such a dictionary is probably more useful than the stream of singleton objects envisioned by the OP, but if such a stream is needed, the above can be modified as follows:

counter(inputs | .Properties.ItmId)
| to_entries[]
| {ItemId: (.key), Count: .value}
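A minimal end-to-end sketch of the counter approach, assuming a hypothetical three-record sample held in the shell variable sample (the data and variable names are not part of the original answer):

```shell
# Hypothetical sample data: a stream of JSON objects.
sample='{"Properties": {"ItmId": "1694738780"}}
{"Properties": {"ItmId": "1347809133"}}
{"Properties": {"ItmId": "1694738780"}}'

# -n is required so that `inputs` sees every object, including the first.
counts=$(printf '%s\n' "$sample" | jq -n -c '
  def counter(stream):
    reduce stream as $s ({}; .[$s|tostring] += 1);

  counter(inputs | .Properties.ItmId)
  | to_entries[]
  | {ItemId: .key, Count: .value}
')

printf '%s\n' "$counts"
```

Each ID is tallied in a single pass without sorting. The order of the emitted objects may differ between jq versions, so it is safest not to rely on it.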

Answered by skr

Using the jq command together with sort, uniq and awk:

cat json.txt | jq '.Properties .ItmId' | sort | uniq -c | awk -F " " '{print "{\"ItmId\":" $2 ",\"count\":" $1 "}"}'| jq .

Answered by peak

Here's a super-efficient solution; in particular, no sorting is required. The following implementation requires a version of jq that has inputs, but it is easy to adapt the program to earlier versions of jq. Remember to use the -n command-line option with the following:

# Count the occurrences of distinct values of (stream|tostring).
# To avoid unwanted collisions, or to recover the exact values,
# consider using tojson
def counter(stream):
  reduce stream as $s ({}; .[$s|tostring] += 1);

counter(inputs | .Properties.ItmId)
| to_entries[]
| {ItemId: (.key), Count: .value}
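The collision mentioned in the comments can be seen with a small made-up stream in which the string "1" and the number 1 both occur. The sketch below assumes that stream; counter_json is a hypothetical name for the tojson-based variant and does not appear in the original answer.

```shell
# Hypothetical input: the string "1" twice and the number 1 once.
values='"1"
1
"1"'

# tostring maps both "1" and 1 to the same key, so the counts are merged.
with_tostring=$(printf '%s\n' "$values" | jq -n -c -S '
  def counter(stream):
    reduce stream as $s ({}; .[$s|tostring] += 1);
  counter(inputs)
')

# tojson keeps the JSON encoding as the key, so the two values stay apart.
with_tojson=$(printf '%s\n' "$values" | jq -n -c -S '
  def counter_json(stream):
    reduce stream as $s ({}; .[$s|tojson] += 1);
  counter_json(inputs)
')

printf '%s\n' "$with_tostring" "$with_tojson"
```

With tostring, all three values land under the single key "1". With tojson, the string is counted twice and the number once, and the original values can be recovered from the keys with fromjson.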

Answered by jq170727

Here is a variation that uses reduce, setpath and getpath to do the aggregation and to_entries to do the final formatting. It assumes you run jq as

jq --slurp -f query.jq < data.json

where data.json contains your data and query.jq contains

  map(.Properties.ItmId)
| reduce .[] as $i (
    {}; setpath([$i]; getpath([$i]) + 1)
  )
| to_entries | .[] | { "ItemId": .key, "Count": .value }
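The reduce step works because getpath returns null for a key that has not been seen yet, and null + 1 evaluates to 1 in jq. A small sketch of that behaviour, using made-up sample data and the hypothetical shell variable names first and tally:

```shell
# For an ID not yet in the object, getpath yields null, and null + 1 is 1.
first=$(jq -n '{} | getpath(["1694738780"]) + 1')

# Hypothetical sample data: a stream of JSON objects.
sample='{"Properties": {"ItmId": "1694738780"}}
{"Properties": {"ItmId": "1347809133"}}
{"Properties": {"ItmId": "1694738780"}}'

# Only the aggregation step of the query, before to_entries reshapes it.
# -S sorts the keys of the resulting object for stable output.
tally=$(printf '%s\n' "$sample" | jq --slurp -c -S '
    map(.Properties.ItmId)
  | reduce .[] as $i ({}; setpath([$i]; getpath([$i]) + 1))
')

printf '%s\n' "$first" "$tally"
```

Here first is 1, and tally maps 1694738780 to 2 and 1347809133 to 1, which is the dictionary that to_entries then turns into the ItemId/Count stream.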