What is the most efficient way to convert a list to a data frame?

Question

Asked by DrewConway

Very often I want to convert a list in which every element has the same structure (identical names and element types) to a data frame. For example, I may have a list:

> my.list
[[1]]
[[1]]$global_stdev_ppb
[1] 24267673

[[1]]$range
[1] 0.03114799

[[1]]$tok
[1] "hello"

[[1]]$global_freq_ppb
[1] 211592.6


[[2]]
[[2]]$global_stdev_ppb
[1] 11561448

[[2]]$range
[1] 0.08870838

[[2]]$tok
[1] "world"

[[2]]$global_freq_ppb
[1] 1002043

I want to convert this list to a data frame in which each list element becomes a row and each named component a column. The natural (to me) thing to do is to use do.call:

> my.matrix<-do.call("rbind", my.list)
> my.matrix
     global_stdev_ppb range      tok     global_freq_ppb
[1,] 24267673         0.03114799 "hello" 211592.6       
[2,] 11561448         0.08870838 "world" 1002043

Straightforward enough, but when I attempt to cast this matrix as a data frame, the columns remain list elements, rather than vectors:

> my.df<-as.data.frame(my.matrix, stringsAsFactors=FALSE)
> my.df[,1]
[[1]]
[1] 24267673

[[2]]
[1] 11561448
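To see why the columns stay as lists, it may help to inspect the intermediate matrix. The following is a minimal sketch; the constructor call for my.list is an assumption reconstructed from the printed output above, since the question only shows the printed list.

```r
# Assumed reconstruction of my.list from the printed output in the question
my.list <- list(
  list(global_stdev_ppb = 24267673, range = 0.03114799,
       tok = "hello", global_freq_ppb = 211592.6),
  list(global_stdev_ppb = 11561448, range = 0.08870838,
       tok = "world", global_freq_ppb = 1002043)
)

my.matrix <- do.call("rbind", my.list)

# rbind() on plain lists builds a matrix whose cells are list elements,
# so the matrix has type "list" rather than "double" or "character"
is.matrix(my.matrix)
typeof(my.matrix)

# as.data.frame() keeps that storage, so every column is a list
my.df <- as.data.frame(my.matrix, stringsAsFactors = FALSE)
sapply(my.df, is.list)
```

Because the matrix has to hold numbers and strings at the same time, R can only store it as a list, and the data frame inherits that.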

Currently, to get the data frame cast properly, I iterate over each column using unlist and as.vector, then recast the data frame as follows:

new.list<-lapply(1:ncol(my.matrix), function(x) as.vector(unlist(my.matrix[,x])))
my.df<-as.data.frame(do.call(cbind, new.list), stringsAsFactors=FALSE)

This, however, seems very inefficient. Is there a better way to do this?

Answer 1

Answered by Joshua Ulrich

I think you want:

> do.call(rbind, lapply(my.list, data.frame, stringsAsFactors=FALSE))
  global_stdev_ppb      range   tok global_freq_ppb
1         24267673 0.03114799 hello        211592.6
2         11561448 0.08870838 world       1002043.0
> str(do.call(rbind, lapply(my.list, data.frame, stringsAsFactors=FALSE)))
'data.frame':   2 obs. of  4 variables:
 $ global_stdev_ppb: num  24267673 11561448
 $ range           : num  0.0311 0.0887
 $ tok             : chr  "hello" "world"
 $ global_freq_ppb : num  211593 1002043
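For readers who want to run this end to end, here is a self-contained sketch of the same approach. The definition of my.list is an assumption rebuilt from the question's printed output, and the intermediate name rows is only for illustration.

```r
# Assumed reconstruction of my.list from the printed output in the question
my.list <- list(
  list(global_stdev_ppb = 24267673, range = 0.03114799,
       tok = "hello", global_freq_ppb = 211592.6),
  list(global_stdev_ppb = 11561448, range = 0.08870838,
       tok = "world", global_freq_ppb = 1002043)
)

# Step 1: turn each record into a one-row data frame
rows <- lapply(my.list, data.frame, stringsAsFactors = FALSE)

# Step 2: stack the one-row data frames
my.df <- do.call(rbind, rows)

# Columns are now atomic vectors with the expected types
sapply(my.df, class)
```

Building one data frame per record is what makes this approach comparatively slow on long lists, since data.frame() does a fair amount of checking on every call.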

Answer 2

Answered by Gavin Simpson

Another option is:

data.frame(t(sapply(mylist, `[`)))

but this simple manipulation results in a data frame of lists:

> str(data.frame(t(sapply(mylist, `[`))))
'data.frame':   2 obs. of  3 variables:
 $ a:List of 2
  ..$ : num 1
  ..$ : num 2
 $ b:List of 2
  ..$ : num 2
  ..$ : num 3
 $ c:List of 2
  ..$ : chr "a"
  ..$ : chr "b"

An alternative along the same lines, which now gives the same result as the other solutions (a data frame of vectors), is:

data.frame(lapply(data.frame(t(sapply(mylist, `[`))), unlist))

[Edit: added timings for @Martin Morgan's two solutions, which have the edge over the other solutions that return a data frame of vectors.] Some representative timings on a very simple problem:

mylist <- list(list(a = 1, b = 2, c = "a"), list(a = 2, b = 3, c = "b"))

> ## @Joshua Ulrich's solution:
> system.time(replicate(1000, do.call(rbind, lapply(mylist, data.frame,
+                                     stringsAsFactors=FALSE))))
   user  system elapsed 
  1.740   0.001   1.750

> ## @JD Long's solution:
> system.time(replicate(1000, do.call(rbind, lapply(mylist, data.frame))))
   user  system elapsed 
  2.308   0.002   2.339

> ## my sapply solution No.1:
> system.time(replicate(1000, data.frame(t(sapply(mylist, `[`)))))
   user  system elapsed 
  0.296   0.000   0.301

> ## my sapply solution No.2:
> system.time(replicate(1000, data.frame(lapply(data.frame(t(sapply(mylist, `[`))), 
+                                               unlist))))
   user  system elapsed 
  1.067   0.001   1.091

> ## @Martin Morgan's Map() sapply() solution:
> f = function(x) function(i) sapply(x, `[[`, i)
> system.time(replicate(1000, as.data.frame(Map(f(mylist), names(mylist[[1]])))))
   user  system elapsed 
  0.775   0.000   0.778

> ## @Martin Morgan's Map() lapply() unlist() solution:
> f = function(x) function(i) unlist(lapply(x, `[[`, i), use.names=FALSE)
> system.time(replicate(1000, as.data.frame(Map(f(mylist), names(mylist[[1]])))))
   user  system elapsed 
  0.653   0.000   0.658
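The timings above use a two-element list, so they mostly measure per-call overhead. A rough sketch for repeating the comparison on a longer list follows; the input big and the helper names by_row and by_col are hypothetical, and the numbers will depend on the machine and R version.

```r
# Hypothetical larger input: 1000 records shaped like mylist
set.seed(1)
big <- replicate(1000, list(a = runif(1), b = runif(1), c = "x"),
                 simplify = FALSE)

# Row-wise: one data frame per record, then rbind
by_row <- function(x) {
  do.call(rbind, lapply(x, data.frame, stringsAsFactors = FALSE))
}

# Column-wise: extract each field across all records, build the data frame once
by_col <- function(x) {
  fields <- names(x[[1]])
  cols <- lapply(fields, function(i) unlist(lapply(x, `[[`, i), use.names = FALSE))
  names(cols) <- fields
  as.data.frame(cols, stringsAsFactors = FALSE)
}

system.time(r1 <- by_row(big))
system.time(r2 <- by_col(big))

# Both approaches should hold the same values
all.equal(r1, r2, check.attributes = FALSE)
```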

Answer 3

Answered by JD Long

I can't tell you this is the "most efficient" in terms of memory or speed, but it's pretty efficient in terms of coding:

my.df <- do.call("rbind", lapply(my.list, data.frame))

The lapply() step with data.frame() turns each list item into a single-row data frame, which then plays nicely with rbind().

Answer 4

Answered by Kevin Ushey

Although this question has long since been answered, it's worth pointing out that the data.table package has rbindlist, which accomplishes this task very quickly:

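library(microbenchmark)
library(data.table)
l <- replicate(1E4, list(a=runif(1), b=runif(1), c=runif(1)), simplify=FALSE)

# f comes from Martin Morgan's answer; the sapply variant is assumed here
f = function(x) function(i) sapply(x, `[[`, i)

microbenchmark( times=5,
  R=as.data.frame(Map(f(l), names(l[[1]]))),
  dt=data.frame(rbindlist(l))
)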

gives me:

Unit: milliseconds
 expr       min        lq    median        uq       max neval
    R 31.060119 31.403943 32.278537 32.370004 33.932700     5
   dt  2.271059  2.273157  2.600976  2.635001  2.729421     5
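Two details of rbindlist may be worth knowing: it returns a data.table rather than a plain data frame, and its fill argument pads missing fields with NA. A small sketch, using a hypothetical list recs whose second record lacks a field:

```r
library(data.table)

# Hypothetical records; the second one has no field b
recs <- list(list(a = 1, b = "x"), list(a = 2))

# fill = TRUE matches fields by name and fills the gaps with NA
dt <- rbindlist(recs, fill = TRUE)

# rbindlist() returns a data.table; convert if a plain data frame is needed
df <- as.data.frame(dt)
class(df)
```

Wrapping the result in data.frame(), as in the benchmark above, serves the same purpose as as.data.frame() here.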

Answer 5

Answered by Martin Morgan

This

f = function(x) function(i) sapply(x, `[[`, i)

is a function that returns a function that extracts the i'th element of each element of x. So

Map(f(mylist), names(mylist[[1]]))

gets a named (thanks Map!) list of vectors that can be made into a data frame:

as.data.frame(Map(f(mylist), names(mylist[[1]])))

For speed, it's usually faster to use unlist(lapply(...), use.names=FALSE), as in

f = function(x) function(i) unlist(lapply(x, `[[`, i), use.names=FALSE)

A more general variant is

f = function(X, FUN) function(...) sapply(X, FUN, ...)
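As a usage sketch for the general variant, passing `[[` as FUN should reproduce the earlier extractor. The list mylist is the small example from the timings in Gavin Simpson's answer; the names cols and df are only for illustration.

```r
mylist <- list(list(a = 1, b = 2, c = "a"), list(a = 2, b = 3, c = "b"))

f <- function(X, FUN) function(...) sapply(X, FUN, ...)

# f(mylist, `[[`) is a function of i that calls `[[`(element, i) on every record
cols <- Map(f(mylist, `[[`), names(mylist[[1]]))

# Map() names the result after the field names, so the columns are labelled
df <- as.data.frame(cols, stringsAsFactors = FALSE)
str(df)
```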

When do the list-of-lists structures come up? Maybe there's an earlier step where an iteration could be replaced by something more vectorized?

Answer 6

Answered by Yi Li

The dplyr package's bind_rows is efficient:

one <- mtcars[1:4, ]
two <- mtcars[11:14, ]
system.time(dplyr::bind_rows(one, two))
   user  system elapsed 
  0.001   0.000   0.001

Answer 7

Answered by sbha

Not sure where they rank in terms of efficiency, but depending on the structure of your lists there are some tidyverse options. A bonus is that they work nicely with lists of unequal length:

l <- list(a = list(var.1 = 1, var.2 = 2, var.3 = 3)
        , b = list(var.1 = 4, var.2 = 5)
        , c = list(var.1 = 7, var.3 = 9)
        , d = list(var.1 = 10, var.2 = 11, var.3 = NA))

df <- dplyr::bind_rows(l)
df <- purrr::map_df(l, dplyr::bind_rows)
df <- purrr::map_df(l, ~.x)

# all create the same data frame:
# A tibble: 4 x 3
  var.1 var.2 var.3
  <dbl> <dbl> <dbl>
1     1     2     3
2     4     5    NA
3     7    NA     9
4    10    11    NA

And you can also mix vectors and data frames:

library(dplyr)
bind_rows(
  list(a = 1, b = 2),
  tibble(a = 3:4, b = 5:6),
  c(a = 7)
)

# A tibble: 4 x 2
      a     b
  <dbl> <dbl>
1     1     2
2     3     5
3     4     6
4     7    NA
