Converting CSV to JSON in bash

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license, citing the original URL and attributing it to the original authors (not me): StackOverflow

Original URL: http://stackoverflow.com/questions/44780761/
Asked by HydrUra
Trying to convert a CSV file into JSON.
Here are two sample lines:
-21.3214077;55.4851413;Ruizia cordata
-21.3213078;55.4849803;Cossinia pinnata
I would like to get something like:
"occurrences": [
{
"position": [-21.3214077, 55.4851413],
"taxo": {
"espece": "Ruizia cordata"
},
...
}]
Here is my script:
echo '"occurences": [ '
cat se.csv | while read -r line
do
IFS=';' read -r -a array <<< $line;
echo -n -e '{ "position": [' ${array[0]}
echo -n -e ',' ${array[1]} ']'
echo -e ', "taxo": {"espece":"' ${array[2]} '"'
done
echo "]";
I get really strange results:
"occurences": [
""position": [ -21.3214077, 55.4851413 ], "taxo": {"espece":" Ruizia cordata
""position": [ -21.3213078, 55.4849803 ], "taxo": {"espece":" Cossinia pinnata
What is wrong with my code?
Accepted answer by Charles Duffy
The right tool for this job is jq.
jq -Rsn '
{"occurrences":
[inputs
| . / "\n"
| (.[] | select(length > 0) | . / ";") as $input
| {"position": [$input[0], $input[1]], "taxo": {"espece": $input[2]}}]}
' <se.csv
emits, given your input:
{
"occurrences": [
{
"position": [
"-21.3214077",
"55.4851413"
],
"taxo": {
"espece": "Ruizia cordata"
}
},
{
"position": [
"-21.3213078",
"55.4849803"
],
"taxo": {
"espece": "Cossinia pinnata"
}
}
]
}
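One thing to note: the filter above emits the coordinates as JSON strings. If actual numbers are wanted in "position", a small variation of the same filter (my tweak, not part of the original answer) can cast them with tonumber:

```shell
# Recreate the question's sample input (hypothetical file name se.csv).
printf '%s\n' \
  '-21.3214077;55.4851413;Ruizia cordata' \
  '-21.3213078;55.4849803;Cossinia pinnata' > se.csv

# Same structure as the accepted answer, with the coordinates cast to numbers.
jq -Rsn '
{"occurrences":
  [inputs
   | . / "\n"
   | (.[] | select(length > 0) | . / ";") as $input
   | {"position": [($input[0] | tonumber), ($input[1] | tonumber)],
      "taxo": {"espece": $input[2]}}]}
' <se.csv
```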
By the way, a less-buggy version of your original script might look like:
#!/usr/bin/env bash
items=( )
while IFS=';' read -r lat long pos _; do
printf -v item '{ "position": [%s, %s], "taxo": {"espece": "%s"}}' "$lat" "$long" "$pos"
items+=( "$item" )
done <se.csv
IFS=','
printf '{"occurrences": [%s]}\n' "${items[*]}"
Note:
- There's absolutely no point using cat to pipe into a loop (and good reasons not to); thus, we're using a redirection (<) to open the file directly as the loop's stdin.
- read can be passed a list of destination variables; there's thus no need to read into an array (or first to read into a string, and then to generate a herestring and to read from that into an array). The _ at the end ensures that extra columns are discarded (by putting them into the dummy variable named _) rather than appended to pos.
- "${items[*]}" generates a string by concatenating the elements of items with the first character of IFS; we can thus use this to ensure that commas are present in the output only when they're needed.
- printf is used in preference to echo, as advised in the APPLICATION USAGE section of the specification for echo itself.
- This is still inherently buggy since it's generating JSON via string concatenation. Don't use it.
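The comma-joining trick described above is easy to see in isolation; a minimal sketch (my example):

```shell
#!/usr/bin/env bash
# "${items[*]}" joins the array elements with the first character of IFS,
# so setting IFS=',' yields exactly one comma between items -- no trailing comma.
items=( '{"n": 1}' '{"n": 2}' '{"n": 3}' )
IFS=','
printf '[%s]\n' "${items[*]}"
# prints [{"n": 1},{"n": 2},{"n": 3}]
```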
Answer by Ondra Žižka
Here is an article on the subject: https://infiniteundo.com/post/99336704013/convert-csv-to-json-with-jq
It also uses jq, but takes a slightly different approach using split() and map().
jq --slurp --raw-input \
'split("\n") | .[1:] | map(split(";")) |
map({
"position": [.[0], .[1]],
"taxo": {
"espece": .[2]
}
})' \
input.csv > output.json
It doesn't handle separator escaping, though.
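Two more caveats, not mentioned in the answer: split("\n") leaves a trailing empty element when the file ends with a newline, and .[1:] assumes a header line that the OP's data doesn't have. A variant of the same approach that guards against both (my sketch, using a hypothetical se.csv):

```shell
# Headerless sample input, as in the question.
printf '%s\n' \
  '-21.3214077;55.4851413;Ruizia cordata' \
  '-21.3213078;55.4849803;Cossinia pinnata' > se.csv

jq --slurp --raw-input '
  split("\n")
  | map(select(length > 0))   # drop the empty element from the final newline
  | map(split(";"))
  | map({"position": [.[0], .[1]],
         "taxo": {"espece": .[2]}})
' se.csv
```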
Answer by Raman
The accepted answer uses jq to parse the input. This works, but jq doesn't handle escapes; i.e., input from a CSV produced by Excel or similar tools is quoted like this:
foo,"bar,baz",gaz
will result in incorrect output, as jq will see 4 fields, not 3.
One option is to use tab-separated values instead of comma (as long as your input data doesn't contain tabs!), along with the accepted answer.
Another option is to combine your tools, and use the best tool for each part: a CSV parser for reading the input and turning it into JSON, and jq for transforming the JSON into the target format.
The Python-based csvkit will intelligently parse the CSV, and comes with a tool, csvjson, which will do a much better job of turning the CSV into JSON. This can then be piped through jq to convert the flat JSON output by csvkit into the target form.
With the data provided by the OP, for the desired output, this is as simple as:
csvjson --no-header-row |
jq '.[] | {occurrences: [{ position: [.a, .b], taxo: {espece: .c}}]}'
Note that csvjson automatically detects ; as the delimiter, and, without a header row in the input, assigns the JSON keys as a, b, and c.
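The jq stage of that pipeline can be exercised on its own with a hand-written sample of what csvjson is assumed to emit (an array of row objects keyed a, b, c):

```shell
# Simulated csvjson output for one row of the question's data,
# piped through the same jq transformation as above.
echo '[{"a": "-21.3214077", "b": "55.4851413", "c": "Ruizia cordata"}]' |
jq '.[] | {occurrences: [{position: [.a, .b], taxo: {espece: .c}}]}'
```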
The same also applies to writing to CSV files -- csvkit can read a JSON array or newline-delimited JSON, and intelligently output a CSV via in2csv.
Answer by Jeff Mercado
If you want to go crazy, you can write a parser using jq. Here's my implementation, which can be thought of as the inverse of the @csv filter. Throw this into your .jq file.
def do_if(pred; update):
if pred then update else . end;
def _parse_delimited($_delim; $_quot; $_nl; $_skip):
[($_delim, $_quot, $_nl, $_skip)|explode[]] as [$delim, $quot, $nl, $skip] |
[0,1,2,3,4,5] as [$s_start,$s_next_value,$s_read_value,$s_read_quoted,$s_escape,$s_final] |
def _append($arr; $value):
$arr + [$value];
def _do_start($c):
if $c == $nl then
[$s_start, null, null, _append(.[3]; [""])]
elif $c == $delim then
[$s_next_value, null, [""], .[3]]
elif $c == $quot then
[$s_read_quoted, [], [], .[3]]
else
[$s_read_value, [$c], [], .[3]]
end;
def _do_next_value($c):
if $c == $nl then
[$s_start, null, null, _append(.[3]; _append(.[2]; ""))]
elif $c == $delim then
[$s_next_value, null, _append(.[2]; ""), .[3]]
elif $c == $quot then
[$s_read_quoted, [], .[2], .[3]]
else
[$s_read_value, [$c], .[2], .[3]]
end;
def _do_read_value($c):
if $c == $nl then
[$s_start, null, null, _append(.[3]; _append(.[2]; .[1]|implode))]
elif $c == $delim then
[$s_next_value, null, _append(.[2]; .[1]|implode), .[3]]
else
[$s_read_value, _append(.[1]; $c), .[2], .[3]]
end;
def _do_read_quoted($c):
if $c == $quot then
[$s_escape, .[1], .[2], .[3]]
else
[$s_read_quoted, _append(.[1]; $c), .[2], .[3]]
end;
def _do_escape($c):
if $c == $nl then
[$s_start, null, null, _append(.[3]; _append(.[2]; .[1]|implode))]
elif $c == $delim then
[$s_next_value, null, _append(.[2]; .[1]|implode), .[3]]
else
[$s_read_quoted, _append(.[1]; $c), .[2], .[3]]
end;
def _do_final($c):
.;
def _do_finalize:
if .[0] == $s_start then
[$s_final, null, null, .[3]]
elif .[0] == $s_next_value then
[$s_final, null, null, _append(.[3]; [""])]
elif .[0] == $s_read_value then
[$s_final, null, null, _append(.[3]; _append(.[2]; .[1]|implode))]
elif .[0] == $s_read_quoted then
[$s_final, null, null, _append(.[3]; _append(.[2]; .[1]|implode))]
elif .[0] == $s_escape then
[$s_final, null, null, _append(.[3]; _append(.[2]; .[1]|implode))]
else # .[0] == $s_final
.
end;
reduce explode[] as $c (
[$s_start,null,null,[]];
do_if($c != $skip;
if .[0] == $s_start then
_do_start($c)
elif .[0] == $s_next_value then
_do_next_value($c)
elif .[0] == $s_read_value then
_do_read_value($c)
elif .[0] == $s_read_quoted then
_do_read_quoted($c)
elif .[0] == $s_escape then
_do_escape($c)
else # .[0] == $s_final
_do_final($c)
end
)
)
| _do_finalize[3][];
def parse_delimited($delim; $quot; $nl; $skip):
_parse_delimited($delim; $quot; $nl; $skip);
def parse_delimited($delim; $quot; $nl):
parse_delimited($delim; $quot; $nl; "\r");
def parse_delimited($delim; $quot):
parse_delimited($delim; $quot; "\n");
def parse_delimited($delim):
parse_delimited($delim; "\"");
def parse_csv:
parse_delimited(",");
For your data, you would want to change the delimiter to semicolons.
$ cat se.csv
-21.3214077;55.4851413;Ruizia cordata
-21.3213078;55.4849803;Cossinia pinnata
$ jq -R 'parse_delimited(";")' se.csv
[
"-21.3214077",
"55.4851413",
"Ruizia cordata"
]
[
"-21.3213078",
"55.4849803",
"Cossinia pinnata"
]
This will work fine for parsing most inputs a line at a time, but if your data contains literal newlines, you will want to read the entire file as a single string.
$ cat input.csv
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
$ jq -Rs 'parse_csv' input.csv
[
"Year",
"Make",
"Model",
"Description",
"Price"
]
[
"1997",
"Ford",
"E350",
"ac, abs, moon",
"3000.00"
]
[
"1999",
"Chevy",
"Venture \"Extended Edition\"",
"",
"4900.00"
]
[
"1999",
"Chevy",
"Venture \"Extended Edition, Very Large\"",
"",
"5000.00"
]
[
"1996",
"Jeep",
"Grand Cherokee",
"MUST SELL!\nair, moon roof, loaded",
"4799.00"
]
Answer by Reino
For completeness' sake, Xidel together with some XQuery magic can do this too:
xidel -s input.csv --xquery '
{
"occurrences":for $x in tokenize($raw,"\n") let $a:=tokenize($x,";") return {
"position":[
$a[1],
$a[2]
],
"taxo":{
"espece":$a[3]
}
}
}
'
{
"occurrences": [
{
"position": ["-21.3214077", "55.4851413"],
"taxo": {
"espece": "Ruizia cordata"
}
},
{
"position": ["-21.3213078", "55.4849803"],
"taxo": {
"espece": "Cossinia pinnata"
}
}
]
}
Answer by Ondra Žižka
Because the jq solution does not handle CSV escaping, column names on the first line, commented-out lines, and other common CSV "features", I have extended the CSV Cruncher tool to allow reading CSV and writing it as JSON. It's not exactly "Bash", but neither is jq :)
It's primarily a CSV-as-SQL processing app, so it's not completely trivial, but here is the trick:
./crunch -in myfile.csv -out output.csv --json -sql 'SELECT * FROM myfile'
It also allows output as a JSON object per line or a proper JSON array. See the documentation.
It's in beta quality, so all feedback or pull requests are welcome.
Answer by peak
In general, if your jq has the inputs built-in filter (available since jq 1.5), then it is better to use it rather than the -s command-line option.
Here, in any case, is a solution using inputs. This solution is also variable-free.
{"occurrences":
[inputs
| select(length > 0)
| . / ";"
| {"position": [.[0], .[1]],
"taxo": {"espece": .[2]}} ]}
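Since the filter reads raw lines with inputs, it has to be invoked with -n and -R; for example (the sample file name is my assumption):

```shell
# The question's semicolon-separated input.
printf '%s\n' \
  '-21.3214077;55.4851413;Ruizia cordata' \
  '-21.3213078;55.4849803;Cossinia pinnata' > se.csv

# -n: don't consume the first input implicitly; -R: read raw lines.
jq -nR '
{"occurrences":
  [inputs
   | select(length > 0)
   | . / ";"
   | {"position": [.[0], .[1]],
      "taxo": {"espece": .[2]}} ]}
' se.csv
```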
SSV, CSV, and all that
The above of course assumes that the file has semicolon-separated fields in each line, and that there are none of the complications associated with CSV files.
If the input has fields that are strictly delimited by a single character, then jq should have no problems handling it. Otherwise, it might be best to use a tool that can reliably convert to the TSV (tab-separated value) format, which jq can handle directly.
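To see why tab-separated input sidesteps the delimiter problem, note that jq can split a raw line on a literal tab, so a comma inside a field survives intact; a quick sketch (my example):

```shell
# A TSV line whose middle field contains a comma: splitting on tab
# yields three fields, with the comma preserved inside the field.
printf 'foo\tbar,baz\tgaz\n' | jq -Rc '. / "\t"'
# prints ["foo","bar,baz","gaz"]
```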