Convert list of vectors to data frame
Original question: http://stackoverflow.com/questions/43662457/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Nick
I'm trying to convert a list of vectors (a multidimensional array essentially) into a data frame, but every time I try I'm getting unexpected results.
My aim is to instantiate a blank list, populate it in a for loop with vectors containing information about that iteration of the loop, then convert it into a data frame after it's finished.
> vectorList <- list()
> for(i in 1:5){
+ vectorList[[i]] <- c("number" = i, "square root" = sqrt(i))
+ }
> vectorList
Outputs:
[[1]]
     number square root
          1           1

[[2]]
     number square root
   2.000000    1.414214

[[3]]
     number square root
   3.000000    1.732051

[[4]]
     number square root
          4           2

[[5]]
     number square root
   5.000000    2.236068
Now I want this to become a data frame with 5 observations of 2 variables, but trying to create a data frame from 'vectorList' with
numbers <- data.frame(vectorList)
results in 2 observations of 5 variables.
Weirdly, it won't even be coerced with reshape2 (which I know would be an awful workaround, but I tried).
Does anyone have any insight?
Answered by h3rm4n
You can use:
as.data.frame(do.call(rbind, vectorList))
Or:
library(data.table)
rbindlist(lapply(vectorList, as.data.frame.list))
Or:
library(dplyr)
bind_rows(lapply(vectorList, as.data.frame.list))
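To see why the first suggestion gives the shape the question asks for, here is a minimal base R sketch. It rebuilds the question's vectorList with lapply (an assumption for brevity, in place of the for loop) and compares the two results; the names wide, m and long are purely illustrative.

```r
# Rebuild the list from the question (lapply used here instead of the for loop).
vectorList <- lapply(1:5, function(i) c("number" = i, "square root" = sqrt(i)))

# data.frame() treats each list element as a column, so five length-2
# vectors become 2 rows by 5 columns.
wide <- data.frame(vectorList)
dim(wide)

# rbind() stacks the vectors as the rows of a 5 x 2 numeric matrix ...
m <- do.call(rbind, vectorList)

# ... and as.data.frame() turns each matrix column into a variable.
long <- as.data.frame(m)
dim(long)
names(long)
```

The vector names become the matrix column names, so they carry over to the data frame unchanged.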
Answered by Giuseppe
The fastest and most efficient way that I know of is to use the data.table::transpose function (provided each vector is short, i.e. has only a few elements):
as.data.frame(data.table::transpose(vectorList), col.names = names(vectorList[[1]]))
However, you will need to set the column names manually, as data.table::transpose removes them. There is also a purrr::transpose function that does not remove the column names, but it seems to be slower. Below is a small benchmark including the suggestions of the other users:
vectorList = lapply(1:1000, function(i) (c("number" = i, "square root" = sqrt(i))))
bench = microbenchmark::microbenchmark(
dplyr = dplyr::bind_rows(lapply(vectorList, as.data.frame.list)),
rbindlist = data.table::rbindlist(lapply(vectorList, as.data.frame.list)),
Reduce = Reduce(rbind, vectorList),
transpose_datatable = as.data.frame(data.table::transpose(vectorList), col.names = names(vectorList[[1]])),
transpose_purrr = data.table::as.data.table(purrr::transpose(vectorList)),
do.call = as.data.frame(do.call(rbind, vectorList)),
times = 10)
bench
# Unit: microseconds
#                 expr        min         lq        mean      median         uq        max neval cld
#                dplyr 286963.036 292850.136 320345.1137 310159.7380 341654.619 385399.851    10   b
#            rbindlist 285830.750 289935.336 306120.7257 309581.1895 318131.031 324217.413    10   b
#               Reduce   8573.474   9073.649  12114.5559   9632.1120  11153.511  33446.353    10   a
#  transpose_datatable    372.572    424.165    500.8845    479.4990    532.076    701.822    10   a
#      transpose_purrr    539.953    590.365    672.9531    671.1025    718.757    911.343    10   a
#              do.call    452.915    537.591    562.9144    570.0825    592.334    641.958    10   a
# now use bigger list and disregard the slowest
vectorList = lapply(1:100000, function(i) (c("number" = i, "square root" = sqrt(i))))
bench.big = microbenchmark::microbenchmark(
transpose_datatable = as.data.frame(data.table::transpose(vectorList), col.names = names(vectorList[[1]])),
transpose_purrr = data.table::as.data.table(purrr::transpose(vectorList)),
do.call = as.data.frame(do.call(rbind, vectorList)),
times = 10)
bench.big
# Unit: milliseconds
#                 expr       min        lq       mean     median         uq       max neval cld
#  transpose_datatable  3.470901   4.59531   4.551515   4.708932   4.873755   4.91235    10   a
#      transpose_purrr 61.007574  62.06936  68.634732  65.949067  67.477948  97.39748    10   b
#              do.call 97.680252 102.04674 115.669540 104.983596 138.193644 151.30886    10   c
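As an aside on the column-name caveat in this answer, the column-wise idea can also be sketched in base R without any packages. This is only an illustration: it is not how data.table::transpose works internally, it was not part of the benchmark above, and it assumes every vector carries the same names in the same order. The names cols, columns and df are hypothetical.

```r
vectorList <- lapply(1:5, function(i) c("number" = i, "square root" = sqrt(i)))

# Take the column names from the first vector (assumes all vectors share them).
cols <- names(vectorList[[1]])

# Build one numeric column per name by pulling that element out of every vector.
columns <- lapply(cols, function(nm) {
  vapply(vectorList, function(v) v[[nm]], numeric(1))
})
names(columns) <- cols

# check.names = FALSE keeps "square root" from being rewritten as "square.root".
df <- data.frame(columns, check.names = FALSE)
str(df)
```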
Answered by 989
Also Reduce:
Reduce(rbind, vectorList)
#      number square root
# init      1    1.000000
#           2    1.414214
#           3    1.732051
#           4    2.000000
#           5    2.236068
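Note that the result printed above is a numeric matrix with a leftover row label, not a data frame. A minimal sketch of one way to finish the conversion, assuming the row labels are not needed (the names m and df are illustrative):

```r
vectorList <- lapply(1:5, function(i) c("number" = i, "square root" = sqrt(i)))

# Reduce() applies rbind() pairwise, so the result is a 5 x 2 numeric matrix.
m <- Reduce(rbind, vectorList)
is.matrix(m)

# Drop the leftover row labels before converting, so the data frame gets
# default row names.
rownames(m) <- NULL
df <- as.data.frame(m)
str(df)
```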
Answered by Artem Sokolov
An alternative solution using purrr:
purrr::map_dfr( vectorList, as.list )
# # A tibble: 5 x 2
#   number `square root`
#    <dbl>         <dbl>
# 1      1          1
# 2      2          1.41
# 3      3          1.73
# 4      4          2
# 5      5          2.24
The code effectively converts each vector to a list and concatenates the results row-wise into a common data frame.
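For comparison, the same two steps (vector to list, then row-wise concatenation) can be sketched in base R. This only illustrates the description above; it is not how purrr::map_dfr is implemented, and it returns a plain data frame rather than a tibble. The names rows and df are illustrative.

```r
vectorList <- lapply(1:5, function(i) c("number" = i, "square root" = sqrt(i)))

# Turn each named vector into a one-row data frame; optional = TRUE asks
# as.data.frame() not to rewrite "square root" as "square.root".
rows <- lapply(vectorList, function(v) as.data.frame(as.list(v), optional = TRUE))

# Stack the one-row data frames on top of each other.
df <- do.call(rbind, rows)
str(df)
```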