如何使用 jq 将任意简单的 JSON 转换为 CSV?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/32960857/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to convert arbitrary simple JSON to CSV using jq?
提问by outis
Using jq, how can arbitrary JSON encoding an array of shallow objects be converted to CSV?
使用jq,如何将任意 JSON 编码的浅层对象数组转换为 CSV?
There are plenty of Q&As on this site that cover specific data models which hard-code the fields, but answers to this question should work given any JSON, with the only restriction that it's an array of objects with scalar properties (no deep/complex/sub-objects, as flattening these is another question). The result should contain a header row giving the field names. Preference will be given to answers that preserve the field order of the first object, but it's not a requirement. Results may enclose all cells with double-quotes, or only enclose those that require quoting (e.g. 'a,b').
这个站点上有很多问答涵盖了对字段进行硬编码的特定数据模型,但是这个问题的答案应该适用于任何 JSON,唯一的限制是它是具有标量属性的对象数组(没有深度/复杂/子对象,因为扁平化这些是另一个问题)。结果应包含给出字段名称的标题行。将优先考虑保留第一个对象的字段顺序的答案,但这不是必需的。结果可以用双引号将所有单元格括起来,或者只将那些需要引用的单元格括起来(例如'a,b')。
Examples
例子
Input:
[ {"code": "NSW", "name": "New South Wales", "level":"state", "country": "AU"}, {"code": "AB", "name": "Alberta", "level":"province", "country": "CA"}, {"code": "ABD", "name": "Aberdeenshire", "level":"council area", "country": "GB"}, {"code": "AK", "name": "Alaska", "level":"state", "country": "US"} ]Possible output:
code,name,level,country NSW,New South Wales,state,AU AB,Alberta,province,CA ABD,Aberdeenshire,council area,GB AK,Alaska,state,USPossible output:
"code","name","level","country" "NSW","New South Wales","state","AU" "AB","Alberta","province","CA" "ABD","Aberdeenshire","council area","GB" "AK","Alaska","state","US"Input:
[ {"name": "bang", "value": "!", "level": 0}, {"name": "letters", "value": "a,b,c", "level": 0}, {"name": "letters", "value": "x,y,z", "level": 1}, {"name": "bang", "value": "\"!\"", "level": 1} ]Possible output:
name,value,level bang,!,0 letters,"a,b,c",0 letters,"x,y,z",1 bang,"""!""",0Possible output:
"name","value","level" "bang","!","0" "letters","a,b,c","0" "letters","x,y,z","1" "bang","""!""","1"
输入:
[ {"code": "NSW", "name": "New South Wales", "level":"state", "country": "AU"}, {"code": "AB", "name": "Alberta", "level":"province", "country": "CA"}, {"code": "ABD", "name": "Aberdeenshire", "level":"council area", "country": "GB"}, {"code": "AK", "name": "Alaska", "level":"state", "country": "US"} ]可能的输出:
code,name,level,country NSW,New South Wales,state,AU AB,Alberta,province,CA ABD,Aberdeenshire,council area,GB AK,Alaska,state,US可能的输出:
"code","name","level","country" "NSW","New South Wales","state","AU" "AB","Alberta","province","CA" "ABD","Aberdeenshire","council area","GB" "AK","Alaska","state","US"输入:
[ {"name": "bang", "value": "!", "level": 0}, {"name": "letters", "value": "a,b,c", "level": 0}, {"name": "letters", "value": "x,y,z", "level": 1}, {"name": "bang", "value": "\"!\"", "level": 1} ]可能的输出:
name,value,level bang,!,0 letters,"a,b,c",0 letters,"x,y,z",1 bang,"""!""",0可能的输出:
"name","value","level" "bang","!","0" "letters","a,b,c","0" "letters","x,y,z","1" "bang","""!""","1"
回答by
First, obtain an array containing all the different object property names in your object array input. Those will be the columns of your CSV:
首先,获取一个包含对象数组输入中所有不同对象属性名称的数组。这些将是您的 CSV 的列:
(map(keys) | add | unique) as $cols
Then, for each object in the object array input, map the column names you obtained to the corresponding properties in the object. Those will be the rows of your CSV.
然后,对于对象数组输入中的每个对象,将您获得的列名称映射到对象中的相应属性。这些将是您的 CSV 的行。
map(. as $row | $cols | map($row[.])) as $rows
Finally, put the column names before the rows, as a header for the CSV, and pass the resulting row stream to the @csvfilter.
最后,将列名放在行之前,作为 CSV 的标题,并将生成的行流传递给@csv过滤器。
$cols, $rows[] | @csv
All together now. Remember to use the -rflag to get the result as a raw string:
现在都在一起了。请记住使用-r标志将结果作为原始字符串获取:
jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv'
回答by outis
The Skinny
瘦子
jq -r '(.[0] | keys_unsorted) as $keys | $keys, map([.[ $keys[] ]])[] | @csv'
or:
或者:
jq -r '(.[0] | keys_unsorted) as $keys | ([$keys] + map([.[ $keys[] ]])) [] | @csv'
The Details
细节
Aside
在旁边
Describing the details is tricky because jq is stream-oriented, meaning it operates on a sequence of JSON data, rather than a single value. The input JSON stream gets converted to some internal type which is passed through the filters, then encoded in an output stream at program's end. The internal type isn't modeled by JSON, and doesn't exist as a named type. It's most easily demonstrated by examining the output of a bare index (.[]) or the comma operator (examining it directly could be done with a debugger, but that would be in terms of jq's internal data types, rather than the conceptual data types behind JSON).
描述细节很棘手,因为 jq 是面向流的,这意味着它对一系列 JSON 数据而不是单个值进行操作。输入 JSON 流被转换为某种内部类型,该类型通过过滤器,然后在程序结束时在输出流中编码。内部类型不是由 JSON 建模的,也不作为命名类型存在。通过检查裸索引 ( .[]) 或逗号运算符的输出最容易证明(可以使用调试器直接检查它,但这将是根据 jq 的内部数据类型,而不是 JSON 背后的概念数据类型) .
$ jq -c '.[]' <<<'["a", "b"]' "a" "b" $ jq -cn '"a", "b"' "a" "b"
$ jq -c '.[]' <<<'["a", "b"]' "a" "b" $ jq -cn '"a", "b"' "a" "b"
Note that the output isn't an array (which would be ["a", "b"]). Compact output (the -coption) shows that each array element (or argument to the ,filter) becomes a separate object in the output (each is on a separate line).
请注意,输出不是数组(应该是["a", "b"])。紧凑输出(-c选项)显示每个数组元素(或,过滤器的参数)成为输出中的一个单独对象(每个在单独的行上)。
A stream is like a JSON-seq, but uses newlines rather than RSas an output separator when encoded. Consequently, this internal type is referred to by the generic term "sequence" in this answer, with "stream" being reserved for the encoded input and output.
流类似于JSON-seq,但在编码时使用换行符而不是RS作为输出分隔符。因此,此内部类型在本答案中由通用术语“序列”引用,“流”保留用于编码的输入和输出。
Constructing the Filter
构建过滤器
The first object's keys can be extracted with:
可以使用以下方法提取第一个对象的键:
.[0] | keys_unsorted
Keys will generally be kept in their original order, but preserving the exact order isn't guaranteed. Consequently, they will need to be used to index the objects to get the values in the same order. This will also prevent values being in the wrong columns if some objects have a different key order.
密钥通常会按其原始顺序保存,但不能保证保留确切顺序。因此,它们将需要用于索引对象以按相同顺序获取值。如果某些对象具有不同的键顺序,这也将防止值出现在错误的列中。
To both output the keys as the first row and make them available for indexing, they're stored in a variable. The next stage of the pipeline then references this variable and uses the comma operator to prepend the header to the output stream.
为了将键作为第一行输出并使它们可用于索引,它们存储在一个变量中。管道的下一阶段然后引用此变量并使用逗号运算符将标头添加到输出流中。
(.[0] | keys_unsorted) as $keys | $keys, ...
The expression after the comma is a little involved. The index operator on an object can take a sequence of strings (e.g. "name", "value"), returning a sequence of property values for those strings. $keysis an array, not a sequence, so []is applied to convert it to a sequence,
逗号后面的表达有点复杂。对象上的索引运算符可以采用字符串序列(例如"name", "value"),返回这些字符串的属性值序列。$keys是一个数组,而不是一个序列,因此[]被应用于将其转换为一个序列,
$keys[]
which can then be passed to .[]
然后可以传递给 .[]
.[ $keys[] ]
This, too, produces a sequence, so the array constructor is used to convert it to an array.
这也会产生一个序列,因此使用数组构造函数将其转换为数组。
[.[ $keys[] ]]
This expression is to be applied to a single object. map()is used to apply it to all objects in the outer array:
此表达式将应用于单个对象。map()用于将其应用于外部数组中的所有对象:
map([.[ $keys[] ]])
Lastly for this stage, this is converted to a sequence so each item becomes a separate row in the output.
最后在这个阶段,它被转换为一个序列,因此每个项目成为输出中的一个单独的行。
map([.[ $keys[] ]])[]
Why bundle the sequence into an array within the maponly to unbundle it outside? mapproduces an array; .[ $keys[] ]produces a sequence. Applying mapto the sequence from .[ $keys[] ]would produce an array of sequences of values, but since sequences aren't a JSON type, so you instead get a flattened array containing all the values.
为什么将序列捆绑到一个数组中,map只能在外面将其解开?map产生一个数组;.[ $keys[] ]产生一个序列。应用于map序列 from.[ $keys[] ]将产生一个值序列数组,但由于序列不是 JSON 类型,因此您会得到一个包含所有值的扁平数组。
["NSW","AU","state","New South Wales","AB","CA","province","Alberta","ABD","GB","council area","Aberdeenshire","AK","US","state","Alaska"]
The values from each object need to be kept separate, so that they become separate rows in the final output.
每个对象的值需要分开保存,以便它们在最终输出中成为单独的行。
Finally, the sequence is passed through @csvformatter.
最后,序列通过@csv格式化程序。
Alternate
备用
The items can be separated late, rather than early. Instead of using the comma operator to get a sequence (passing a sequence as the right operand), the header sequence ($keys) can be wrapped in an array, and +used to append the array of values. This still needs to be converted to a sequence before being passed to @csv.
物品可以晚点分开,而不是早点分开。不使用逗号运算符来获取序列(将序列作为正确的操作数传递),头序列 ( $keys) 可以包装在一个数组中,并+用于附加值数组。这仍然需要在传递给 之前转换为序列@csv。
回答by TJR
The following filter is slightly different in that it will ensure every value is converted to a string. (Note: use jq 1.5+)
以下过滤器略有不同,因为它将确保每个值都转换为字符串。(注意:使用 jq 1.5+)
# For an array of many objects
jq -f filter.jq (file)
# For many objects (not within array)
jq -s -f filter.jq (file)
Filter: filter.jq
筛选: filter.jq
def tocsv($x):
$x
|(map(keys)
|add
|unique
|sort
) as $cols
|map(. as $row
|$cols
|map($row[.]|tostring)
) as $rows
|$cols,$rows[]
| @csv;
tocsv(.)
回答by Jeff Mercado
I created a function that outputs an array of objects or arrays to csv with headers. The columns would be in the order of the headers.
我创建了一个函数,该函数将一组对象或数组输出到带有标题的 csv。列将按照标题的顺序排列。
def to_csv($headers):
def _object_to_csv:
($headers | @csv),
(.[] | [.[$headers[]]] | @csv);
def _array_to_csv:
($headers | @csv),
(.[][:$headers|length] | @csv);
if .[0]|type == "object"
then _object_to_csv
else _array_to_csv
end;
So you could use it like so:
所以你可以像这样使用它:
to_csv([ "code", "name", "level", "country" ])
回答by peak
This variant of Santiago's program is also safe but ensures that the key names in the first object are used as the first column headers, in the same order as they appear in that object:
Santiago 程序的这个变体也是安全的,但确保第一个对象中的键名用作第一列标题,其顺序与它们在该对象中出现的顺序相同:
def tocsv:
if length == 0 then empty
else
(.[0] | keys_unsorted) as $keys
| (map(keys) | add | unique) as $allkeys
| ($keys + ($allkeys - $keys)) as $cols
| ($cols, (.[] as $row | $cols | map($row[.])))
| @csv
end ;
tocsv

