Parsing JSON with Unix tools

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/1955505/

Date: 2020-09-03 17:22:43  Source: igfitidea

Tags: json, bash, parsing

Asked by auser

I'm trying to parse JSON returned from a curl request, like so:

curl 'http://twitter.com/users/username.json' |
    sed -e 's/[{}]/''/g' |
    awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}'

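The above splits the JSON into fields, for example:

% ...
"geo_enabled":false
"friends_count":245
"profile_text_color":"000000"
"status":"in_reply_to_screen_name":null
"source":"web"
"truncated":false
"text":"My status"
"favorited":false
% ...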

How do I print a specific field (denoted by the -v k=text)?

Answered by Brian Campbell

There are a number of tools specifically designed for manipulating JSON from the command line, and they will be a lot easier and more reliable than doing it with Awk. One example is jq:

curl -s 'https://api.github.com/users/lambda' | jq -r '.name'

You can also do this with tools that are likely already installed on your system, like Python using the json module, and so avoid any extra dependencies while still having the benefit of a proper JSON parser. The following examples assume you want to use UTF-8, which the original JSON should be encoded in and which most modern terminals use as well:

Python 3:

curl -s 'https://api.github.com/users/lambda' | \
    python3 -c "import sys, json; print(json.load(sys.stdin)['name'])"

Python 2:

export PYTHONIOENCODING=utf8
curl -s 'https://api.github.com/users/lambda' | \
    python2 -c "import sys, json; print json.load(sys.stdin)['name']"
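
When a one-liner gets too cramped, the same idea can be spelled out as a small script. The following is only a minimal sketch, not part of the original answer: the function name extract_field and the sample document are made up for illustration, and an in-memory stream stands in for the curl output.

```python
import io
import json


def extract_field(stream, key):
    """Parse one JSON document from a file-like object and return data[key]."""
    data = json.load(stream)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data[key]


# In a pipeline you would pass sys.stdin (after `import sys`) as the stream.
# A made-up sample document stands in for the curl output here.
sample = io.StringIO('{"name": "Example User", "public_repos": 3}')
print(extract_field(sample, "name"))
```

Taking a stream rather than a string keeps the function usable both with piped input and with files opened by the caller.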

Historical notes

This answer originally recommended jsawk, which should still work, but is a little more cumbersome to use than jq and depends on a standalone JavaScript interpreter being installed, which is less common than a Python interpreter, so the above answers are probably preferable:

curl -s 'https://api.github.com/users/lambda' | jsawk -a 'return this.name'

This answer also originally used the Twitter API from the question, but that API no longer works, making it hard to copy the examples to test out, and the new Twitter API requires API keys, so I've switched to using the GitHub API, which can be used easily without API keys. The first answer for the original question would be:

curl 'http://twitter.com/users/username.json' | jq -r '.text'

Answered by Brendan OConnor

To quickly extract the values for a particular key, I personally like to use "grep -o", which only returns the regex's match. For example, to get the "text" field from tweets, something like:

grep -Po '"text":.*?[^\\]",' tweets.json

This regex is more robust than you might think; for example, it deals fine with strings having embedded commas and escaped quotes inside them. I think with a little more work you could make one that is actually guaranteed to extract the value, if it's atomic. (If it has nesting, then a regex can't do it of course.)
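
To see what the pattern does and does not handle, here is a rough sketch that runs the same expression through Python's re module instead of grep. The sample lines are made up, and Python's regex engine is only assumed to behave like grep -P for this particular pattern.

```python
import json
import re

# Same pattern as the grep example: a lazy match that stops at the first
# quote-comma pair whose quote is not preceded by a backslash.
PATTERN = re.compile(r'"text":.*?[^\\]",')

# Made-up record whose value holds escaped quotes and an embedded comma.
line = '{"id":1,"text":"He said \\"hi\\", then left","source":"web"}'

raw = PATTERN.search(line).group(0)
# Drop the key and the trailing comma; what remains is a JSON string literal.
literal = raw[len('"text":'):-1]
from_regex = json.loads(literal)
from_parser = json.loads(line)["text"]
print(from_regex)
print(from_regex == from_parser)

# Limitation: the pattern needs a trailing comma, so it finds nothing
# when "text" happens to be the last key in the object.
last_key = '{"id":2,"text":"last field"}'
print(PATTERN.search(last_key))
```

The last case is one reason the regex is a quick-look tool rather than a replacement for a parser.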

To clean the result further (while keeping the string's original escaping), you can pipe it through something like: | perl -pe 's/"text"://; s/^"//; s/",$//'. (I did this for this analysis.)

To all the haters who insist you should use a real JSON parser: yes, that is essential for correctness, but

  1. To do a really quick analysis, like counting values to check on data cleaning bugs or get a general feel for the data, banging out something on the command line is faster. Opening an editor to write a script is distracting.
  2. grep -o is orders of magnitude faster than the Python standard json library, at least when doing this for tweets (which are ~2 KB each). I'm not sure if this is just because json is slow (I should compare to yajl sometime); but in principle, a regex should be faster since it's finite state and much more optimizable, instead of a parser that has to support recursion and, in this case, spends lots of CPU building trees for structures you don't care about. (If someone wrote a finite state transducer that did proper (depth-limited) JSON parsing, that would be fantastic! In the meantime we have "grep -o".)
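
If you want to check the speed claim on your own data, a rough timing harness could look like the sketch below. It is an assumption-laden stand-in: it times Python's re module rather than grep, the roughly 2 KB record is synthetic, and the numbers will differ by machine and data, so no results are quoted here.

```python
import json
import re
import timeit

PATTERN = re.compile(r'"text":.*?[^\\]",')

# Synthetic record of roughly 2 KB, written compactly like the tweet output.
record = json.dumps(
    {
        "id": 1,
        "text": "My status",
        "user": {"name": "example", "bio": "x" * 1800},
        "favorited": False,
    },
    separators=(",", ":"),
)


def with_regex(s):
    """Return the raw '"text":...",' match without parsing anything else."""
    return PATTERN.search(s).group(0)


def with_parser(s):
    """Parse the whole record, then pick out one field."""
    return json.loads(s)["text"]


n = 2000
t_regex = timeit.timeit(lambda: with_regex(record), number=n)
t_parser = timeit.timeit(lambda: with_parser(record), number=n)
print(f"regex:  {t_regex:.4f} s for {n} records")
print(f"parser: {t_parser:.4f} s for {n} records")
```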

To write maintainable code, I always use a real parsing library. I haven't tried jsawk, but if it works well, that would address point #1.

One last, wackier solution: I wrote a script that uses Python json and extracts the keys you want into tab-separated columns; then I pipe through a wrapper around awk that allows named access to columns. In here: the json2tsv and tsvawk scripts. So for this example it would be:

json2tsv id text < tweets.json | tsvawk '{print "tweet " $id " is: " $text}'

This approach doesn't address #2, is less efficient than a single Python script, and is a little brittle: it forces normalization of newlines and tabs in string values, to play nice with awk's field/record-delimited view of the world. But it does let you stay on the command line, with more correctness than grep -o.
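
The author's json2tsv script is not reproduced on this page, so the following is only a guess at the general idea, not that script. It assumes one JSON object per input line, and the names clean and json_lines_to_tsv are hypothetical.

```python
import io
import json


def clean(value):
    """Render a value as one TSV cell; tabs and newlines become spaces."""
    text = value if isinstance(value, str) else json.dumps(value)
    return text.replace("\t", " ").replace("\r", " ").replace("\n", " ")


def json_lines_to_tsv(stream, keys):
    """Yield one tab-separated row per JSON object line, columns in `keys` order."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield "\t".join(clean(record.get(key, "")) for key in keys)


# Made-up input: the string values contain an escaped tab and newline.
sample = io.StringIO(
    '{"id": 1, "text": "first\\ttweet"}\n'
    '{"id": 2, "text": "second\\nline"}\n'
)
for row in json_lines_to_tsv(sample, ["id", "text"]):
    print(row)
```

Replacing tabs and newlines inside values is the normalization the answer describes: it loses information, but keeps every record on one line with a fixed number of columns.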

Answered by paulkmoore

On the basis that some of the recommendations here (especially in the comments) suggested the use of Python, I was disappointed not to find an example.

So, here's a one-liner to get a single value from some JSON data. It assumes that you are piping the data in (from somewhere) and so should be useful in a scripting context.

echo '{"hostname":"test","domainname":"example.com"}' | python -c 'import json,sys;obj=json.load(sys.stdin);print obj["hostname"]'

Answered by jnrg

Following MartinR and Boecko's lead:

$ curl -s 'http://twitter.com/users/username.json' | python -mjson.tool

That will give you extremely grep-friendly output. Very convenient:

$ curl -s 'http://twitter.com/users/username.json' | python -mjson.tool | grep my_key
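
Roughly the same pipeline can be sketched inside Python. This is an illustration only: the helper names are made up, the sample document is invented, and sort_keys is used here just to make the output order predictable.

```python
import json


def pretty_lines(text):
    """Re-indent a JSON document so each key/value pair sits on its own line."""
    return json.dumps(json.loads(text), indent=4, sort_keys=True).splitlines()


def grep(lines, needle):
    """Keep only the lines that contain the given substring."""
    return [line for line in lines if needle in line]


sample = '{"my_key": "my_value", "other": {"nested": [1, 2]}}'
for line in grep(pretty_lines(sample), "my_key"):
    print(line)
```

As with the shell version, this matches text, not structure, so a value that merely contains "my_key" would be printed too.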

Answered by jfs

You could just download the jq binary for your platform and run it (after chmod +x jq):

$ curl 'https://twitter.com/users/username.json' | ./jq -r '.name'

It extracts the "name" attribute from the JSON object.

The jq homepage says it is like sed for JSON data.

Answered by JayQuerie.com

Using Node.js

If the system has node installed, it's possible to use the -p (print) and -e (evaluate script) flags with JSON.parse to pull out any value that is needed.

A simple example using the JSON string { "foo": "bar" } and pulling out the value of "foo":

$ node -pe 'JSON.parse(process.argv[1]).foo' '{ "foo": "bar" }'
bar

Because we have access to cat and other utilities, we can use this for files:

$ node -pe 'JSON.parse(process.argv[1]).foo' "$(cat foobar.json)"
bar

Or any other source, such as a URL that returns JSON:

$ node -pe 'JSON.parse(process.argv[1]).name' "$(curl -s https://api.github.com/users/trevorsenior)"
Trevor Senior

Answered by martinr

Use Python's JSON support instead of using awk!

Something like this:

curl -s http://twitter.com/users/username.json | \
    python -c "import json,sys;obj=json.load(sys.stdin);print obj['name'];"

Answered by Paused until further notice.

You've asked how to shoot yourself in the foot and I'm here to provide the ammo:

curl -s 'http://twitter.com/users/username.json' | sed -e 's/[{}]/''/g' | awk -v RS=',"' -F: '/^text/ {print $2}'

You could use tr -d '{}' instead of sed. But leaving them out completely seems to have the desired effect as well.

If you want to strip off the outer quotes, pipe the result of the above through sed 's/\(^"\|"$\)//g'

I think others have sounded sufficient alarm. I'll be standing by with a cell phone to call an ambulance. Fire when ready.

Answered by Joe Heyming

Using Bash with Python

Create a bash function in your .bashrc file:

function getJsonVal () {
    python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)$1))";
}

Then

$ curl 'http://twitter.com/users/username.json' | getJsonVal "['text']"
My status
$

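Here is the same function, but with error checking.

function getJsonVal() {
   if [ \( $# -ne 1 \) -o \( -t 0 \) ]; then
       cat <<EOF
Usage: getJsonVal 'key' < /tmp/
 -- or -- 
 cat /tmp/input | getJsonVal 'key'
EOF
       return;
   fi;
   python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)$1))";
}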

Here, $# -ne 1 is true unless exactly one argument was given, and -t 0 is true when standard input is a terminal, so the usage message is printed unless you pass one key and pipe or redirect data in.

The nice thing about this implementation is that you can access nested JSON values and get JSON in return! =)

Example:

$ echo '{"foo": {"bar": "baz", "a": [1,2,3]}}' |  getJsonVal "['foo']['a'][1]"
2

If you want to be really fancy, you could pretty print the data:

function getJsonVal () {
    python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)$1, sort_keys=True, indent=4))";
}

$ echo '{"foo": {"bar": "baz", "a": [1,2,3]}}' |  getJsonVal "['foo']"
{
    "a": [
        1,
        2,
        3
    ],
    "bar": "baz"
}
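
One thing to keep in mind with getJsonVal is that its argument is pasted straight into Python source, so whatever is passed gets executed as code. A possible variation, sketched here under my own assumptions and not taken from the answer, walks the structure from a plain list of keys instead; get_path and parse_steps are made-up names.

```python
import json


def get_path(data, path):
    """Walk nested dicts and lists; string steps index dicts, int steps index lists."""
    current = data
    for step in path:
        current = current[step]
    return current


def parse_steps(args):
    """Turn command-line style strings into steps; digit-only strings become list indexes."""
    return [int(a) if a.isdigit() else a for a in args]


doc = json.loads('{"foo": {"bar": "baz", "a": [1, 2, 3]}}')
print(json.dumps(get_path(doc, parse_steps(["foo", "a", "1"]))))
print(json.dumps(get_path(doc, parse_steps(["foo"])), sort_keys=True, indent=4))
```

The trade-off in this sketch is that an object key made only of digits cannot be addressed, since parse_steps would treat it as a list index.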

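Answered by CoolAJ86

TickTick is a JSON parser written in bash (<250 lines of code).

Here's the author's snippet from his article, Imagine a world where Bash supports JSON:

#!/bin/bash
. ticktick.sh

``
  people = {
    "Writers": [
      "Rod Serling",
      "Charles Beaumont",
      "Richard Matheson"
    ],
    "Cast": {
      "Rod Serling": { "Episodes": 156 },
      "Martin Landau": { "Episodes": 2 },
      "William Shatner": { "Episodes": 2 }
    }
  }
``

function printDirectors() {
  echo "  The ``people.Directors.length()`` Directors are:"

  for director in ``people.Directors.items()``; do
    printf "    - %s\n" ${!director}
  done
}

`` people.Directors = [ "John Brahm", "Douglas Heyes" ] ``
printDirectors

newDirector="Lamont Johnson"
`` people.Directors.push($newDirector) ``
printDirectors

echo "Shifted: "``people.Directors.shift()``
printDirectors

echo "Popped: "``people.Directors.pop()``
printDirectors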