How to import JSON into R and convert it to a table?

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/20925492/

Date: 2020-09-03 20:09:06  Source: igfitidea

Tags: json, r, loops, dictionary, rjson

Asked by Aidis

I want to play with data that is now saved in JSON format. But I am very new to R and have little clue of how to play with data. You can see below what I managed to achieve. But first, my code:

library(rjson)
# Backslashes in R strings must be escaped (or use forward slashes)
json_file <- "C:\\Users\\Saonkfas\\Desktop\\WOWPAPI\\wowpfinaljson.json"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

I was able to get the data:

for (x in json_data){print (x)}

Although output looks pretty raw:

[[1]]
[[1]]$wins
[1] "118"

[[1]]$losses
[1] "40"
# And so on

Note that the JSON is somewhat nested. I could create tables with Python, but R seems much more complicated.

Edit:

My JSON:

{
"play1": [
    {
        "wins": "118",
        "losses": "40",
        "max_killed": "7",
        "battles": "158",
        "plane_id": "4401",
        "max_ground_object_destroyed": "3"
    },
    {
        "wins": "100",
        "losses": "58",
        "max_killed": "7",
        "battles": "158",
        "plane_id": "2401",
        "max_ground_object_destroyed": "3"
    },
    {
        "wins": "120",
        "losses": "38",
        "max_killed": "7",
        "battles": "158",
        "plane_id": "2403",
        "max_ground_object_destroyed": "3"
    }
],

"play2": [
    {
        "wins": "12",
        "losses": "450",
        "max_killed": "7",
        "battles": "158",
        "plane_id": "4401",
        "max_ground_object_destroyed": "3"
    },
    {
        "wins": "150",
        "losses": "8",
        "max_killed": "7",
        "battles": "158",
        "plane_id": "2401",
        "max_ground_object_destroyed": "3"
    },
    {
        "wins": "120",
        "losses": "328",
        "max_killed": "7",
        "battles": "158",
        "plane_id": "2403",
        "max_ground_object_destroyed": "3"
    }
],

Answered by nico

fromJSON returns a list; you can use the *apply functions to go through each element. Converting it to a "table" (data frame is the correct R terminology) is fairly straightforward once you know what to do.

library(rjson)

# You can pass the filename directly
my.JSON <- fromJSON(file="test.json")

df <- lapply(my.JSON, function(play) # Loop through each "play"
  {
  # Convert each group to a data frame.
  # This assumes you have 6 elements each time
  data.frame(matrix(unlist(play), ncol=6, byrow=TRUE))
  })

# Now you have a list of data frames, connect them together in
# one single dataframe
df <- do.call(rbind, df)

# Make column names nicer, remove row names
colnames(df) <- names(my.JSON[[1]][[1]])
rownames(df) <- NULL

df
  wins losses max_killed battles plane_id max_ground_object_destroyed
1  118     40          7     158     4401                           3
2  100     58          7     158     2401                           3
3  120     38          7     158     2403                           3
4   12    450          7     158     4401                           3
5  150      8          7     158     2401                           3
6  120    328          7     158     2403                           3
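
The `ncol=6` above is hardcoded to the number of fields per record. A minimal sketch of a variant that does not depend on the field count is shown below, assuming every record has the same field names. The `parsed` list is a hand-built stand-in for what `fromJSON` returns (an assumption for illustration, shortened to two fields and two records per play), so the snippet runs without any JSON file.

```r
# Hand-built stand-in for the nested list returned by rjson::fromJSON
# (hypothetical sample; only two fields and two records per play).
parsed <- list(
  play1 = list(list(wins = "118", losses = "40"),
               list(wins = "100", losses = "58")),
  play2 = list(list(wins = "12",  losses = "450"),
               list(wins = "150", losses = "8"))
)

# Flatten one level so that every element is a single record (a named list).
records <- unlist(parsed, recursive = FALSE)

# Turn each record into a one-row data frame, then stack the rows.
df <- do.call(rbind, lapply(records, function(rec) {
  as.data.frame(rec, stringsAsFactors = FALSE)
}))
rownames(df) <- NULL

print(df)
```

Because the column names come from each record, no separate `colnames()` step is needed; the values are still character strings, as in the original data.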

Answered by Eric

I find jsonlite to be a little more user friendly for this task. Here is a comparison of three JSON parsing packages (biased in favor of jsonlite).

library(jsonlite)
data <- fromJSON('path/to/file.json')

data
#> $play1
#   wins losses max_killed battles plane_id max_ground_object_destroyed
# 1  118     40          7     158     4401                           3
# 2  100     58          7     158     2401                           3
# 3  120     38          7     158     2403                           3
# 
# $play2
#   wins losses max_killed battles plane_id max_ground_object_destroyed
# 1   12    450          7     158     4401                           3
# 2  150      8          7     158     2401                           3
# 3  120    328          7     158     2403                           3

If you want to collapse those list names into a new column, I recommend dplyr::bind_rows rather than do.call(rbind, data):

library(dplyr)
data <- bind_rows(data, .id = 'play')

# Source: local data frame [6 x 7]

#    play  wins losses max_killed battles plane_id max_ground_object_destroyed
#   (chr) (chr)  (chr)      (chr)   (chr)    (chr)                       (chr)
# 1 play1   118     40          7     158     4401                           3
# 2 play1   100     58          7     158     2401                           3
# 3 play1   120     38          7     158     2403                           3
# 4 play2    12    450          7     158     4401                           3
# 5 play2   150      8          7     158     2401                           3
# 6 play2   120    328          7     158     2403                           3
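
For readers without dplyr, the effect of `.id = 'play'` can be approximated in base R. The following is only a rough sketch, assuming a named list of data frames with identical columns; the sample `data` list is a shortened, hand-built stand-in for the jsonlite result, and `with_id` and `combined` are names introduced here for illustration.

```r
# Shortened stand-in for the named list of data frames returned by jsonlite
# (hypothetical sample with a single column).
data <- list(
  play1 = data.frame(wins = c("118", "100"), stringsAsFactors = FALSE),
  play2 = data.frame(wins = c("12", "150"),  stringsAsFactors = FALSE)
)

# Prepend the list name as a "play" column to each data frame.
with_id <- Map(function(name, d) {
  data.frame(play = name, d, stringsAsFactors = FALSE)
}, names(data), data)

# Stack the pieces and drop the row names generated by rbind.
combined <- do.call(rbind, with_id)
rownames(combined) <- NULL

print(combined)
```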

Beware that the columns may not have the type you expect (notice the columns are all characters since all of the numbers were quoted in the provided JSON data)!

Edit Nov. 2017: One approach to type conversion would be to use mutate_if to guess the intended type of character columns.

data <- mutate_if(data, is.character, type.convert, as.is = TRUE)
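
The same idea can be sketched in base R by applying `type.convert` to every column. This assumes a small stand-in data frame (hypothetical values taken from the sample above) rather than the full result.

```r
# Small stand-in data frame whose columns are all character
# (hypothetical sample for illustration).
data <- data.frame(play   = c("play1", "play2"),
                   wins   = c("118", "12"),
                   losses = c("40", "450"),
                   stringsAsFactors = FALSE)

# type.convert() guesses a type for each column; as.is = TRUE keeps
# non-numeric text as character instead of turning it into a factor.
# Assigning into data[] keeps the data frame structure.
data[] <- lapply(data, type.convert, as.is = TRUE)

str(data)
```

Here `wins` and `losses` become integer columns while `play` stays character.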

Answered by pauljeba

I prefer tidyjson over rjson and jsonlite, as it has an easy workflow for converting multilevel nested JSON objects to two-dimensional tables. Your problem can be easily solved using this package from GitHub.

devtools::install_github("sailthru/tidyjson")

library(tidyjson)
library(dplyr)

# `json` is assumed to hold the JSON text as a character string
json %>% as.tbl_json %>% gather_keys %>% gather_array %>%
  spread_values(
    wins = jstring("wins"),
    losses = jstring("losses"),
    max_killed = jstring("max_killed"),
    battles = jstring("battles"),
    plane_id = jstring("plane_id"),
    max_ground_object_destroyed = jstring("max_ground_object_destroyed")
  )

Output

  document.id   key array.index wins losses max_killed battles plane_id max_ground_object_destroyed
1           1 play1           1  118     40          7     158     4401                           3
2           1 play1           2  100     58          7     158     2401                           3
3           1 play1           3  120     38          7     158     2403                           3
4           1 play2           1   12    450          7     158     4401                           3
5           1 play2           2  150      8          7     158     2401                           3
6           1 play2           3  120    328          7     158     2403                           3