Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/43662457/

Date: 2020-09-09 08:13:23. Source: igfitidea

Convert list of vectors to data frame

Tags: r, list, vector, dataframe

Asked by Nick

I'm trying to convert a list of vectors (a multidimensional array essentially) into a data frame, but every time I try I'm getting unexpected results.

My aim is to instantiate a blank list, populate it in a for loop with vectors containing information about that iteration of the loop, then convert it into a data frame after it's finished.

> vectorList <- list()
> for(i in  1:5){
+     vectorList[[i]] <- c("number" = i, "square root" = sqrt(i))
+ }
> vectorList

Outputs:

[[1]]
     number square root
          1           1

[[2]]
     number square root
   2.000000    1.414214

[[3]]
     number square root
   3.000000    1.732051

[[4]]
     number square root
          4           2

[[5]]
     number square root
   5.000000    2.236068

Now I want this to become a data frame with 5 observations of 2 variables, but trying to create a data frame from 'vectorList'

numbers <- data.frame(vectorList)

results in 2 observations of 5 variables.

Weirdly, it won't even be coerced with reshape2 (which I know would be an awful workaround, but I tried).

Anyone got any insight?

Answered by h3rm4n

You can use:

as.data.frame(do.call(rbind, vectorList))

Or:

library(data.table)
rbindlist(lapply(vectorList, as.data.frame.list))

Or:

library(dplyr)
bind_rows(lapply(vectorList, as.data.frame.list))
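
For context on why the first option succeeds where data.frame(vectorList) does not: a data frame is itself a list of columns, so data.frame() treats each of the five vectors as a column. Binding the vectors as rows first gives the intended shape. A minimal base R sketch, assuming vectorList is rebuilt with lapply rather than the original for loop:

```r
# Rebuild the example list (lapply is used here instead of the for loop).
vectorList <- lapply(1:5, function(i) c("number" = i, "square root" = sqrt(i)))

# Each list element becomes a column: 2 rows, 5 columns.
wide <- data.frame(vectorList)
dim(wide)

# Stack the vectors as rows of a matrix first, then convert: 5 rows, 2 columns.
long <- as.data.frame(do.call(rbind, vectorList))
dim(long)
names(long)
```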

Answered by Giuseppe

The fastest and most efficient way that I know of is the data.table::transpose function (provided each vector is short):

as.data.frame(data.table::transpose(vectorList), col.names = names(vectorList[[1]]))

However, you will need to set the column names manually, because data.table::transpose removes them. There is also a purrr::transpose function that does not remove the column names, but it seems to be slower. Below is a small benchmark that includes the suggestions of the other users:

vectorList = lapply(1:1000, function(i) (c("number" = i, "square root" = sqrt(i))))
bench = microbenchmark::microbenchmark(
  dplyr = dplyr::bind_rows(lapply(vectorList, as.data.frame.list)),
  rbindlist = data.table::rbindlist(lapply(vectorList, as.data.frame.list)),
  Reduce = Reduce(rbind, vectorList),
  transpose_datatable = as.data.frame(data.table::transpose(vectorList), col.names = names(vectorList[[1]])),
  transpose_purrr = data.table::as.data.table(purrr::transpose(vectorList)),
  do.call = as.data.frame(do.call(rbind, vectorList)),
  times = 10)
bench
# Unit: microseconds
#                 expr        min         lq        mean      median         uq        max neval cld
#                dplyr 286963.036 292850.136 320345.1137 310159.7380 341654.619 385399.851    10   b
#            rbindlist 285830.750 289935.336 306120.7257 309581.1895 318131.031 324217.413    10   b
#               Reduce   8573.474   9073.649  12114.5559   9632.1120  11153.511  33446.353    10  a 
#  transpose_datatable    372.572    424.165    500.8845    479.4990    532.076    701.822    10  a 
#      transpose_purrr    539.953    590.365    672.9531    671.1025    718.757    911.343    10  a 
#              do.call    452.915    537.591    562.9144    570.0825    592.334    641.958    10  a 

# now use bigger list and disregard the slowest
vectorList = lapply(1:100000, function(i) (c("number" = i, "square root" = sqrt(i))))
bench.big = microbenchmark::microbenchmark(
  transpose_datatable = as.data.frame(data.table::transpose(vectorList), col.names = names(vectorList[[1]])),
  transpose_purrr = data.table::as.data.table(purrr::transpose(vectorList)),
  do.call = as.data.frame(do.call(rbind, vectorList)),
  times = 10)
bench.big
# Unit: milliseconds
#                 expr       min        lq       mean     median         uq       max neval cld
#  transpose_datatable  3.470901   4.59531   4.551515   4.708932   4.873755   4.91235    10 a  
#      transpose_purrr 61.007574  62.06936  68.634732  65.949067  67.477948  97.39748    10  b 
#              do.call 97.680252 102.04674 115.669540 104.983596 138.193644 151.30886    10   c
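
The idea behind the transpose approach (turn a list of rows into a list of columns, then attach the names taken from the first vector) can be sketched in base R without extra packages. This is only an illustration of the idea, not how data.table implements it, and transpose_base is a hypothetical helper name that appears in neither package:

```r
# Hypothetical helper: turn a list of equal-length numeric vectors (rows)
# into an unnamed list of columns, one numeric vector per position.
transpose_base <- function(rows) {
  n_cols <- length(rows[[1]])
  lapply(seq_len(n_cols), function(j) {
    vapply(rows, function(r) r[[j]], numeric(1))
  })
}

vectorList <- lapply(1:5, function(i) c("number" = i, "square root" = sqrt(i)))

cols <- transpose_base(vectorList)
names(cols)  # NULL: the element names are gone after transposing

# Convert to a data frame, then restore the names from the first vector.
df <- as.data.frame(cols)
names(df) <- names(vectorList[[1]])
df
```

Assigning names(df) after the conversion keeps the space in "square root"; passing the names through col.names may instead sanitize them, depending on the name-checking defaults.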

Answered by 989

Also Reduce:

Reduce(rbind, vectorList)

#      number square root
# init      1    1.000000
#           2    1.414214
#           3    1.732051
#           4    2.000000
#           5    2.236068
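
Note that the result printed above is a matrix, not a data frame, and it carries a leftover "init" row label. A minimal sketch of one way to finish the conversion, assuming the row labels are not needed:

```r
vectorList <- lapply(1:5, function(i) c("number" = i, "square root" = sqrt(i)))

m <- Reduce(rbind, vectorList)
is.matrix(m)         # the result of repeated rbind() calls is a matrix

rownames(m) <- NULL  # drop the "init" label left behind by Reduce()
df <- as.data.frame(m)
df
```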

Answered by Artem Sokolov

An alternative solution using purrr:

purrr::map_dfr( vectorList, as.list )
# # A tibble: 5 x 2
#   number `square root`
#    <dbl>         <dbl>
# 1      1          1   
# 2      2          1.41
# 3      3          1.73
# 4      4          2   
# 5      5          2.24

The code effectively converts each vector to a list and concatenates the results row-wise into a common data frame.
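
The same two steps can be written out in base R to make them explicit. This is a rough equivalent for illustration, not what purrr does internally, and it returns a plain data frame rather than a tibble:

```r
vectorList <- lapply(1:5, function(i) c("number" = i, "square root" = sqrt(i)))

# Step 1: turn each named vector into a one-row data frame.
# check.names = FALSE keeps the space in "square root".
rows <- lapply(vectorList, function(v) {
  data.frame(as.list(v), check.names = FALSE)
})

# Step 2: bind the one-row data frames together row-wise.
df <- do.call(rbind, rows)
df
```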
