CSV to JSON using BASH
Original question: http://stackoverflow.com/questions/24300508/
This content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by dexnow
I am trying to convert the CSV below into JSON format.
Africa,Kenya,NAI,281
Africa,Kenya,NAI,281
Asia,India,NSI,100
Asia,India,BSE,160
Asia,Pakistan,ISE,100
Asia,Pakistan,ANO,100
European Union,United Kingdom,LSE,100
This is the desired JSON format, and I just cannot manage to produce it. My work in progress is posted below. Any help or direction would be appreciated.
{"name":"Africa",
"children":[
{"name":"Kenya",
"children":[
{"name":"NAI","size":"109"},
{"name":"NAA","size":"160"}]}]},
{"name":"Asia",
"children":[
{"name":"India",
"children":[
{"name":"NSI","size":"100"},
{"name":"BSE","size":"60"}]},
{"name":"Pakistan",
"children":[
{"name":"ISE","size":"120"},
{"name":"ANO","size":"433"}]}]},
{"name":"European Union",
"children":[
{"name":"United Kingdom",
"children":[
{"name":"LSE","size":"550"},
{"name":"PLU","size":"123"}]}]}
Work in progress.
$1 is the file containing the CSV values pasted above.
#!/bin/bash
pcountry=$(head -1 | cut -d, -f2)
cat | while read line ; do
region=$(echo $line|cut -d, -f1)
country=$(echo $line|cut -d, -f2)
code=$(echo $line|cut -d, -f3-)
size=$(echo $line|cut -d, -f4)
if test "$pcountry" == "$country" ;
then
echo -e {\"name\":\"$region\", '\n' \"children\": [ '\n'{\"name\":\"$country\",'\n'\"children\": [ '\n' \{\"name\":\"NAI\",\"size\":\"$size\"\}
else
if test "$pregion" == "$region"
then :
else
echo -e ,'\n'{\"name\":\""$region\", '\n' \"children\": [ '\n'{\"name\":\"$country\",'\n'\"children\": [ '\n' \{\"name\":\"NAI\",\"size\":\"$size\"\},
pcountry=$country
pregion=$region
fi ; done
The problem is that I cannot find a way to detect when a country's values end.
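One common way to spot the end of a group, sketched here rather than taken from the thread, is to remember the previous row's region and country and compare them with the current row. The function name group_rows and the "end of ..." markers are made up for illustration, and the sketch assumes that rows for the same country are adjacent in the file.

```shell
#!/bin/bash
# Hypothetical helper: prints each code=size pair and announces where a
# country's run of rows ends. Assumes rows for one country are adjacent.
group_rows() {
    local prev_region="" prev_country="" region country code size
    # "|| [ -n "$region" ]" also processes a last line with no trailing newline.
    while IFS=, read -r region country code size || [ -n "$region" ]; do
        [ -z "$region" ] && continue          # skip blank lines
        if [ -n "$prev_country" ] &&
           { [ "$region" != "$prev_region" ] || [ "$country" != "$prev_country" ]; }
        then
            # The key changed, so the previous country is complete.
            echo "end of $prev_country"
        fi
        echo "  $code=$size"
        prev_region=$region
        prev_country=$country
    done
    # The last group has no following row, so close it after the loop.
    if [ -n "$prev_country" ]; then
        echo "end of $prev_country"
    fi
}
```

It could be called as group_rows < "$1". Because the loop reads from a redirect instead of a pipe (cat | while ...), the variables set inside it are still visible after the loop.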
Answered by David Atchley
As a number of the commenters have said, using the shell for this kind of conversion is a horrible idea. It would be nigh impossible to do with just bash builtins, and shell scripts are meant to combine standard Unix commands like sed, awk, cut, etc. anyway. You should choose a language better suited to this kind of iterative parsing and processing.
However, because it's late and I've had too much coffee, I threw together a bash script (with a few bits of sed thrown in for parsing help) that takes your example .csv data and outputs the JSON in the format you noted. Here's the script:
#! /bin/bash
# Initial input file format:
#
# Africa,Kenya,NAI,281
# Africa,Kenya,NAA,281
# Asia,India,NSI,100
# Asia,India,BSE,160
# Asia,Pakistan,ISE,100
# Asia,Pakistan,ANO,100
# European Union,United Kingdom,LSE,100
#
# Intermediate file format for parsing to JSON:
#
# Africa|Kenya:NAI=281&NAA=281
# Asia|India:NSI=100&BSE=160|Pakistan:ISE=100&ANO=100
# European Union|United Kingdom:LSE=100
#
# Call as:
#
# $ ./script INPUTFILE.csv >OUTPUTFILE.json
#
# temporary files for output/parsing
TMP="./tmp.dat"
TMP2="./tmp2.dat"
>$TMP
>$TMP2
# read through initial file and output intermediate format
while read line
do
region=$(echo $line | cut -d, -f1)
country=$(echo $line | cut -d, -f2)
code=$(echo $line | cut -d, -f3)
size=$(echo $line | cut -d, -f4)
# region record already started
if grep -q "^$region" "$TMP" ;then
>$TMP2
while read rec
do
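if echo "$rec" | grep -q "^$region"
then
# -F matches "|country:" literally; in GNU grep a bare \| would mean alternation
if echo "$rec" | grep -qF "|$country:"
then
# \1 keeps the existing "country:codes" text and appends "&code=size" to it
echo "$rec" | sed -e 's/\('"$country"':[^|][^|]*\)/\1\&'"$code"'='"$size"'/' >>$TMP2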
else
echo "$rec|$country:$code=$size" >>$TMP2
fi
else
echo $rec >>$TMP2
fi
done < $TMP
mv $TMP2 $TMP
else
# new region
echo "$region|$country:$code=$size" >>$TMP
fi
done < "$1"
# Parse through our intermediary format and output JSON to standard out
echo "["
country_count=$(cat $TMP | wc -l)
while read line
do
country=$(echo $line | cut -d\| -f1)
echo "{ \"name\": \"$country\", "
echo " \"children\": ["
region_count=$(echo $line | cut -d\| -f2- | sed -e 's/|/\n/g' | wc -l)
echo $line | cut -d\| -f2- | sed -e 's/|/\n/g' |
while read region
do
name=$(echo $region | cut -d: -f1)
echo " { \"name\": \"$name\", "
echo " \"children\": ["
code_count=$(echo $region | sed -e 's/^'"$name"'://' -e 's/&/\n/g' | wc -l)
echo $region | sed -e 's/^'"$name"'://' -e 's/&/\n/g' |
while read code_size
do
code=$(echo $code_size | cut -d= -f1)
size=$(echo $code_size | cut -d= -f2)
code_count=$((code_count - 1))
COMMA=""
if [ $code_count -gt 0 ]; then
COMMA=","
fi
echo " { \"name\": \"$code\", \"size\": \"$size\" }$COMMA "
done
echo " ]"
region_count=$((region_count - 1))
if [ $region_count -gt 0 ]; then
echo " },"
else
echo " }"
fi
done
echo " ]"
country_count=$((country_count - 1))
COMMA=""
if [ $country_count -gt 0 ]; then
COMMA=","
fi
echo "}$COMMA"
done < $TMP
echo "]"
exit 0
Here is the resulting output from the above script:
[
{ "name": "Africa",
"children": [
{ "name": "Kenya",
"children": [
{ "name": "NAI", "size": "281" },
{ "name": "NAA", "size": "281" }
]
}
]
},
{ "name": "Asia",
"children": [
{ "name": "India",
"children": [
{ "name": "NSI", "size": "100" },
{ "name": "BSE", "size": "160" }
]
},
{ "name": "Pakistan",
"children": [
{ "name": "ISE", "size": "100" },
{ "name": "ANO", "size": "100" }
]
}
]
},
{ "name": "European Union",
"children": [
{ "name": "United Kingdom",
"children": [
{ "name": "LSE", "size": "100" }
]
}
]
}
]
Please don't use code like the above in any production environment.
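For comparison, most of the bookkeeping in the script above can be handed to a single awk pass, in keeping with the earlier point that shell scripts mostly glue together tools such as awk. The following is only a sketch and not part of the original answer: csv_to_json is a made-up name, and it assumes that rows are already grouped by region and country, that every row has four non-empty fields, and that no field contains a comma, a double quote, or a backslash.

```shell
#!/bin/bash
# Hypothetical helper: streams grouped CSV rows into nested JSON on one line.
# Assumes rows are grouped by region, then country, with no special characters.
csv_to_json() {
    awk -F, '
    BEGIN { region = ""; country = ""; printf "[" }
    NF < 4 { next }                            # skip blank or short lines
    {
        if ($1 != region) {                    # a new region starts
            # close the previous code list, country, country list, and region
            if (region != "") printf "]}]},"
            printf "{\"name\":\"%s\",\"children\":[", $1
            region = $1; country = ""
        }
        if ($2 != country) {                   # a new country starts
            if (country != "") printf "]},"    # close the previous country
            printf "{\"name\":\"%s\",\"children\":[", $2
            country = $2; sep = ""
        }
        printf "%s{\"name\":\"%s\",\"size\":\"%s\"}", sep, $3, $4
        sep = ","
    }
    END { if (region != "") printf "]}]}"; print "]" }
    ' "$@"
}
```

It could be called as csv_to_json INPUTFILE.csv >OUTPUTFILE.json, or fed from a pipe. Unsorted input would need something like sort -t, -k1,1 -k2,2 first, and real-world CSV with quoted fields needs a proper parser.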
Answered by jq170727
Here is a solution using jq.
If filter.jq contains the following filter
reduce (
split("\n")[] # split string into lines
| split(",") # split data
| select(length>0) # eliminate blanks
) as [$c1,$c2,$c3,$c4] ( # convert to object
{} # e.g. "Africa": { "Kenya": {
; setpath([$c1,$c2,"name"];$c3) # "name": "NAI",
| setpath([$c1,$c2,"size"];$c4) # "size": "281"
) # }, }
| [ # then build final array of objects format:
keys[] as $k1 # [ {
| {name: $k1, children: ( # "name": "Africa",
.[$k1] # "children": {
| keys[] as $k2 # "name": "Kenya",
| {name: $k2, children:.[$k2]} # "children": { "name": "NAI", "size": "281" }
)} # ...
]
and data contains the sample data, then the command
$ jq -M -Rsr -f filter.jq data
produces
[
{
"name": "Africa",
"children": {
"name": "Kenya",
"children": {
"name": "NAI",
"size": "281"
}
}
},
{
"name": "Asia",
"children": {
"name": "India",
"children": {
"name": "BSE",
"size": "160"
}
}
},
{
"name": "Asia",
"children": {
"name": "Pakistan",
"children": {
"name": "ANO",
"size": "100"
}
}
},
{
"name": "European Union",
"children": {
"name": "United Kingdom",
"children": {
"name": "LSE",
"size": "100"
}
}
}
]
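Note that in the output above each country keeps only its last code, because setpath overwrites the earlier name and size, that children is an object rather than an array, and that a region with several countries (Asia) is repeated. A possible variant, sketched here and not part of the original answer, groups the rows with group_by instead. It assumes jq 1.5 or later (for inputs); csv_to_tree is a hypothetical wrapper name, and group_by sorts regions and countries alphabetically rather than keeping file order.

```shell
#!/bin/bash
# Hypothetical wrapper around jq: builds region -> country -> code arrays.
# Assumes jq 1.5+ is installed and fields contain no embedded commas.
csv_to_tree() {
    jq -Rn '
      [ inputs | select(length > 0) | split(",") ]   # one array per CSV row
      | group_by(.[0])                               # rows grouped by region
      | map({ name: .[0][0],
              children: ( group_by(.[1])             # then by country
                          | map({ name: .[0][1],
                                  children: map({ name: .[2], size: .[3] }) }) ) })
    ' "$@"
}
```

It could be called as csv_to_tree data, mirroring the command shown earlier.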