Getting imported json data into a data frame
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/16947643/
Asked by Andrew Staroscik
I have a file containing over 1500 json objects that I want to work with in R. I've been able to import the data as a list, but am having trouble coercing it into a useful structure. I want to create a data frame containing a row for each json object and a column for each key:value pair.
I've recreated my situation with this small, fake data set:
[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null},
{"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500},
{"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null},
{"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865},
{"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221},
{"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413},
{"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}]
Some features of the data:
- The objects all contain the same number of key:value pairs, although some of the values are null.
- There are two non-numeric columns per object (name and group).
- name is the unique identifier; there are 10 or so groups.
- Many of the name and group entries contain spaces, commas and other punctuation.
Based on this question: R list(structure(list())) to data frame, I tried the following:
json_file <- "test.json"
json_data <- fromJSON(json_file)
asFrame <- do.call("rbind.fill", lapply(json_data, as.data.frame))
With both my real data and this fake data, the last line gives me this error:
Error in data.frame(name = "Doe, John", group = "Red", `age (y)` = 24, :
arguments imply differing number of rows: 1, 0
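A minimal sketch of where the "1, 0" in this message may come from. The record below is hand-built to stand in for what a parser that maps JSON null to NULL would return (an assumption, so the example needs no packages); the names rec, field_lengths and fixed_lengths are illustrative only.

```r
# One parsed record, with the JSON null represented as NULL.
rec <- list(name = "Doe, John", group = "Red", score = NULL)

# Length of each field: the NULL has length 0, the others length 1.
field_lengths <- lengths(rec)
print(field_lengths)

# Replacing NULL with NA gives every field length 1,
# so the record can become a one-row data frame.
rec[sapply(rec, is.null)] <- NA
fixed_lengths <- lengths(rec)
row <- as.data.frame(rec, stringsAsFactors = FALSE)
print(row)
```

A zero-length field cannot be recycled to one row, which is consistent with the differing row counts reported in the error.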
Answered by SchaunW
You just need to replace your NULLs with NAs:
library(RJSONIO)
json_file <- '[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null},
{"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500},
{"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null},
{"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865},
{"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221},
{"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413},
{"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}]'
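json_file <- fromJSON(json_file)
# Replace each NULL with NA so every record has the same number of values,
# then flatten the record into a named vector.
json_file <- lapply(json_file, function(x) {
  x[sapply(x, is.null)] <- NA
  unlist(x)
})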
Once every element has a non-NULL value, you can call rbind without getting an error. Note that unlist() coerces each record to character, so the result is a character matrix:
do.call("rbind", json_file)
     name           group    age (y) height (cm) wieght (kg) score
[1,] "Doe, John"    "Red"    "24"    "182"       "74.8"      NA
[2,] "Doe, Jane"    "Green"  "30"    "170"       "70.1"      "500"
[3,] "Smith, Joan"  "Yellow" "41"    "169"       "60"        NA
[4,] "Brown, Sam"   "Green"  "22"    "183"       "75"        "865"
[5,] "Jones, Larry" "Green"  "31"    "178"       "83.9"      "221"
[6,] "Murray, Seth" "Red"    "35"    "172"       "76.2"      "413"
[7,] "Doe, Jane"    "Yellow" "22"    "164"       "68"        "902"
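Since every value in that matrix is a string, a follow-up step may be needed to get typed columns. The sketch below is one possible way to do that in base R, not part of the original answer; the two records are hand-built stand-ins for the output of the lapply() step, and the names records, m and df are illustrative.

```r
# Two records as named character vectors, with NA where the JSON had null.
records <- list(
  c(name = "Doe, John", `age (y)` = "24", `wieght (kg)` = "74.8", score = NA),
  c(name = "Doe, Jane", `age (y)` = "30", `wieght (kg)` = "70.1", score = "500")
)

# Stack the records into a character matrix, as in the answer.
m <- do.call("rbind", records)

# Convert to a data frame, then let type.convert() pick a type per column.
df <- as.data.frame(m, stringsAsFactors = FALSE)
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```

With as.is = TRUE, columns that cannot be read as numbers stay character instead of becoming factors.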
Answered by SymbolixAU
This is very simple if you use either library(jsonlite) or library(jsonify).
Both handle null values by converting them to NA, and both preserve the data types.
Data
json_file <- '[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null},
{"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500},
{"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null},
{"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865},
{"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221},
{"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413},
{"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}]'
jsonlite
library(jsonlite)
jsonlite::fromJSON( json_file )
#           name  group age (y) height (cm) wieght (kg) score
# 1    Doe, John    Red      24         182        74.8    NA
# 2    Doe, Jane  Green      30         170        70.1   500
# 3  Smith, Joan Yellow      41         169        60.0    NA
# 4   Brown, Sam  Green      22         183        75.0   865
# 5 Jones, Larry  Green      31         178        83.9   221
# 6 Murray, Seth    Red      35         172        76.2   413
# 7    Doe, Jane Yellow      22         164        68.0   902
str( jsonlite::fromJSON( json_file ) )
# 'data.frame': 7 obs. of 6 variables:
# $ name       : chr "Doe, John" "Doe, Jane" "Smith, Joan" "Brown, Sam" ...
# $ group      : chr "Red" "Green" "Yellow" "Green" ...
# $ age (y)    : int 24 30 41 22 31 35 22
# $ height (cm): int 182 170 169 183 178 172 164
# $ wieght (kg): num 74.8 70.1 60 75 83.9 76.2 68
# $ score      : int NA 500 NA 865 221 413 902
jsonify
library(jsonify)
jsonify::from_json( json_file )
#           name  group age (y) height (cm) wieght (kg) score
# 1    Doe, John    Red      24         182        74.8    NA
# 2    Doe, Jane  Green      30         170        70.1   500
# 3  Smith, Joan Yellow      41         169        60.0    NA
# 4   Brown, Sam  Green      22         183        75.0   865
# 5 Jones, Larry  Green      31         178        83.9   221
# 6 Murray, Seth    Red      35         172        76.2   413
# 7    Doe, Jane Yellow      22         164        68.0   902
str( jsonify::from_json( json_file ) )
# 'data.frame': 7 obs. of 6 variables:
# $ name       : chr "Doe, John" "Doe, Jane" "Smith, Joan" "Brown, Sam" ...
# $ group      : chr "Red" "Green" "Yellow" "Green" ...
# $ age (y)    : int 24 30 41 22 31 35 22
# $ height (cm): int 182 170 169 183 178 172 164
# $ wieght (kg): num 74.8 70.1 60 75 83.9 76.2 68
# $ score      : int NA 500 NA 865 221 413 902
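The question starts from a file rather than a string, and the column names contain spaces and parentheses. As a hedged sketch of how that might look with jsonlite: fromJSON() also accepts a file path, and non-syntactic column names can be reached with backticks or [[ ]]. The temporary file and the names json_path, people, mean_age and high_scores are assumptions made for illustration.

```r
library(jsonlite)

# Write a small sample to a temporary file to stand in for test.json.
json_path <- tempfile(fileext = ".json")
writeLines('[{"name":"Doe, John","age (y)":24,"score":null},
 {"name":"Doe, Jane","age (y)":30,"score":500}]', json_path)

# fromJSON() reads the file and returns a data frame; null becomes NA.
people <- fromJSON(json_path)

# Column names with spaces need backticks (or [[ ]]) when accessed.
mean_age <- mean(people$`age (y)`)

# NA scores must be excluded explicitly when filtering.
high_scores <- people[!is.na(people$score) & people$score > 400, ]
print(high_scores)
```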
Answered by lenoch
To replace null values with NA while parsing, use the nullValue parameter of RJSONIO's fromJSON:
json_data <- fromJSON(json_file, nullValue = NA)
asFrame <- do.call("rbind.fill", lapply(json_data, as.data.frame))
This way there won't be any unnecessary quotes in your output.
Answered by Ahsan Habib
library(rjson)
Lines <- readLines("yelp_academic_dataset_business.json")
business <- as.data.frame(t(sapply(Lines, fromJSON)))
You may try this to load JSON data into R. It parses the file line by line, so it expects one JSON object per line.
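For files with one JSON object per line, jsonlite's stream_in() is another option that returns a data frame directly. This is a sketch under that assumption, not the answerer's method; the temporary file and the names ndjson_path and business are illustrative.

```r
library(jsonlite)

# Write two line-delimited records to a temporary file.
ndjson_path <- tempfile(fileext = ".json")
writeLines(c(
  '{"name":"Doe, John","group":"Red","score":null}',
  '{"name":"Doe, Jane","group":"Green","score":500}'
), ndjson_path)

# stream_in() parses each line as a separate JSON object
# and combines the records into one data frame.
business <- stream_in(file(ndjson_path), verbose = FALSE)
str(business)
```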
Answered by YH Wu
dplyr::bind_rows(fromJSON(file_name))

