list 将一个对象附加到 R 中的列表中,以摊销的常数时间 O(1)?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2436688/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-11 01:28:12  来源:igfitidea点击:

Append an object to a list in R in amortized constant time, O(1)?

rperformancelistappendbig-o

提问by Nick

If I have some R list mylist, you can append an item objto it like so:

如果我有一些 R list mylist,你可以obj像这样附加一个项目:

mylist[[length(mylist)+1]] <- obj

But surely there is some more compact way. When I was new at R, I tried writing lappend()like so:

但肯定有一些更紧凑的方式。当我刚接触 R 时,我试着这样写lappend()

lappend <- function(lst, obj) {
    lst[[length(lst)+1]] <- obj
    return(lst)
}

but of course that doesn't work due to R's call-by-name semantics (lstis effectively copied upon call, so changes to lstare not visible outside the scope of lappend(). I know you can do environment hacking in an R function to reach outside the scope of your function and mutate the calling environment, but that seems like a large hammer to write a simple append function.

但当然这不起作用,因为 R 的按名称调用语义(lst在调用时有效地复制,因此更改lst在 范围之外不可见lappend()。我知道您可以在 R 函数中进行环境黑客攻击以到达外部函数的范围并改变调用环境,但这似乎是编写一个简单的 append 函数的大锤。

Can anyone suggest a more beautiful way of doing this? Bonus points if it works for both vectors and lists.

谁能建议一种更漂亮的方式来做到这一点?如果它适用于向量和列表,则加分。

回答by Dirk Eddelbuettel

If it's a list of string, just use the c()function :

如果它是一个字符串列表,只需使用c()函数:

R> LL <- list(a="tom", b="dick")
R> c(LL, c="harry")
$a
[1] "tom"

$b
[1] "dick"

$c
[1] "harry"

R> class(LL)
[1] "list"
R> 

That works on vectors too, so do I get the bonus points?

这也适用于向量,所以我可以获得奖励积分吗?

Edit (2015-Feb-01):This post is coming up on its fifth birthday. Some kind readers keep repeating any shortcomings with it, so by all means also see some of the comments below. One suggestion for listtypes:

编辑(2015 年 2 月 1 日):这篇文章即将迎来它的五岁生日。一些好心的读者不断重复它的任何缺点,所以一定要看看下面的一些评论。list类型的一项建议:

newlist <- list(oldlist, list(someobj))

In general, R types can make it hard to have one and just one idiom for all types and uses.

一般来说,R 类型很难为所有类型和用途使用一种且只有一种习语。

回答by phonetagger

The OP (in the April 2012 updated revision of the question) is interested in knowing if there's a way to add to a list in amortized constant time, such as can be done, for example, with a C++ vector<>container. The best answer(s?) here so far only show the relative execution times for various solutions given a fixed-size problem, but do not address any of the various solutions' algorithmic efficiencydirectly. Comments below many of the answers discuss the algorithmic efficiency of some of the solutions, but in every case to date(as of April 2015) they come to the wrong conclusion.

OP(在 2012 年 4 月更新的问题修订版中)有兴趣知道是否有一种方法可以在摊销的常量时间内添加到列表中,例如可以使用 C++vector<>容器来完成。到目前为止,这里的最佳答案仅显示给定固定大小问题的各种解决方案的相对执行时间,但并未直接解决任何各种解决方案的算法效率。许多答案下方的评论讨论了一些解决方案的算法效率,但迄今为止(截至 2015 年 4 月)在每种情况下都得出了错误的结论。

Algorithmic efficiency captures the growth characteristics, either in time (execution time) or space (amount of memory consumed) as a problem size grows. Running a performance test for various solutions given a fixed-size problem does not address the various solutions' growth rate. The OP is interested in knowing if there is a way to append objects to an R list in "amortized constant time". What does that mean? To explain, first let me describe "constant time":

随着问题规模的增长,算法效率在时间(执行时间)或空间(消耗的内存量)上捕捉增长特征。在给定大小固定的问题的情况下,对各种解决方案运行性能测试并不能解决各种解决方案的增长率问题。OP 有兴趣知道是否有办法在“摊销常量时间”中将对象附加到 R 列表。这意味着什么?为了解释,首先让我描述“恒定时间”:

  • Constantor O(1)growth:

    If the time required to perform a given task remains the sameas the size of the problem doubles, then we say the algorithm exhibits constant timegrowth, or stated in "Big O" notation, exhibits O(1) time growth. When the OP says "amortized" constant time, he simply means "in the long run"... i.e., if performing a single operation occasionally takes much longer than normal (e.g. if a preallocated buffer is exhausted and occasionally requires resizing to a larger buffer size), as long as the long-term average performance is constant time, we'll still call it O(1).

    For comparison, I will also describe "linear time" and "quadratic time":

  • Linearor O(n)growth:

    If the time required to perform a given task doublesas the size of the problem doubles, then we say the algorithm exhibits linear time, or O(n)growth.

  • Quadraticor O(n2)growth:

    If the time required to perform a given task increases by the square of the problem size, them we say the algorithm exhibits quadratic time, or O(n2)growth.

  • 恒定O(1)增长:

    如果执行给定任务所需的时间保持不变,而问题的规模翻倍,那么我们说该算法表现出恒定的时间增长,或者用“大 O”表示法表示,表现出 O(1) 时间增长。当 OP 说“摊销”恒定时间时,他的意思只是“从长远来看”……即,如果执行单个操作偶尔比正常情况花费更长的时间(例如,如果预分配的缓冲区已用完并且偶尔需要将大小调整为更大的缓冲区大小),只要长期平均性能是恒定时间,我们仍将其称为 O(1)。

    为了比较,我还将描述“线性时间”和“二次时间”:

  • 线性O(n)增长:

    如果时间需要执行给定任务双打作为问题的大小加倍,那么我们说的算法表现出的线性时间,或者为O(n)的增长。

  • 二次O(n 2)增长:

    如果执行给定任务所需的时间增加了问题大小的平方,我们就说该算法表现出二次时间,或O(n 2)增长。

There are many other efficiency classes of algorithms; I defer to the Wikipedia articlefor further discussion.

还有许多其他效率等级的算法;我将参考维基百科文章进行进一步讨论。

I thank @CronAcronis for his answer, as I am new to R and it was nice to have a fully-constructed block of code for doing a performance analysis of the various solutions presented on this page. I am borrowing his code for my analysis, which I duplicate (wrapped in a function) below:

我感谢 @CronAcronis 的回答,因为我是 R 的新手,很高兴有一个完整构建的代码块来对本页上提供的各种解决方案进行性能分析。我借用他的代码进行分析,我在下面复制(包装在一个函数中):

library(microbenchmark)
### Using environment as a container
lPtrAppend <- function(lstptr, lab, obj) {lstptr[[deparse(substitute(lab))]] <- obj}
### Store list inside new environment
envAppendList <- function(lstptr, obj) {lstptr$list[[length(lstptr$list)+1]] <- obj} 
runBenchmark <- function(n) {
    microbenchmark(times = 5,  
        env_with_list_ = {
            listptr <- new.env(parent=globalenv())
            listptr$list <- NULL
            for(i in 1:n) {envAppendList(listptr, i)}
            listptr$list
        },
        c_ = {
            a <- list(0)
            for(i in 1:n) {a = c(a, list(i))}
        },
        list_ = {
            a <- list(0)
            for(i in 1:n) {a <- list(a, list(i))}
        },
        by_index = {
            a <- list(0)
            for(i in 1:n) {a[length(a) + 1] <- i}
            a
        },
        append_ = { 
            a <- list(0)    
            for(i in 1:n) {a <- append(a, i)} 
            a
        },
        env_as_container_ = {
            listptr <- new.env(parent=globalenv())
            for(i in 1:n) {lPtrAppend(listptr, i, i)} 
            listptr
        }   
    )
}

The results posted by @CronAcronis definitely seem to suggest that the a <- list(a, list(i))method is fastest, at least for a problem size of 10000, but the results for a single problem size do not address the growth of the solution. For that, we need to run a minimum of two profiling tests, with differing problem sizes:

@CronAcronis 发布的结果似乎确实表明该a <- list(a, list(i))方法最快,至少对于 10000 的问题大小,但单个问题大小的结果并不能解决解决方案的增长问题。为此,我们需要运行至少两个分析测试,具有不同的问题规模:

> runBenchmark(2e+3)
Unit: microseconds
              expr       min        lq      mean    median       uq       max neval
    env_with_list_  8712.146  9138.250 10185.533 10257.678 10761.33 12058.264     5
                c_ 13407.657 13413.739 13620.976 13605.696 13790.05 13887.738     5
             list_   854.110   913.407  1064.463   914.167  1301.50  1339.132     5
          by_index 11656.866 11705.140 12182.104 11997.446 12741.70 12809.363     5
           append_ 15986.712 16817.635 17409.391 17458.502 17480.55 19303.560     5
 env_as_container_ 19777.559 20401.702 20589.856 20606.961 20939.56 21223.502     5
> runBenchmark(2e+4)
Unit: milliseconds
              expr         min         lq        mean    median          uq         max neval
    env_with_list_  534.955014  550.57150  550.329366  553.5288  553.955246  558.636313     5
                c_ 1448.014870 1536.78905 1527.104276 1545.6449 1546.462877 1558.609706     5
             list_    8.746356    8.79615    9.162577    8.8315    9.601226    9.837655     5
          by_index  953.989076 1038.47864 1037.859367 1064.3942 1065.291678 1067.143200     5
           append_ 1634.151839 1682.94746 1681.948374 1689.7598 1696.198890 1706.683874     5
 env_as_container_  204.134468  205.35348  208.011525  206.4490  208.279580  215.841129     5
> 

First of all, a word about the min/lq/mean/median/uq/max values: Since we are performing the exact same task for each of 5 runs, in an ideal world, we could expect that it would take exactly the same amount of time for each run. But the first run is normally biased toward longer times due to the fact that the code we are testing is not yet loaded into the CPU's cache. Following the first run, we would expect the times to be fairly consistent, but occasionally our code may be evicted from the cache due to timer tick interrupts or other hardware interrupts that are unrelated to the code we are testing. By testing the code snippets 5 times, we are allowing the code to be loaded into the cache during the first run and then giving each snippet 4 chances to run to completion without interference from outside events. For this reason, and because we are really running the exact same code under the exact same input conditions each time, we will consider only the 'min' times to be sufficient for the best comparison between the various code options.

首先,关于 min/lq/mean/median/uq/max 值:由于我们对 5 次运行中的每一次执行完全相同的任务,在理想的世界中,我们可以预期它会完全相同每次运行的时间。但是由于我们正在测试的代码尚未加载到 CPU 的缓存中,因此第一次运行通常倾向于更长的时间。在第一次运行之后,我们希望时间相当一致,但有时我们的代码可能会由于计时器滴答中断或与我们正在测试的代码无关的其他硬件中断而从缓存中逐出。通过对代码片段进行 5 次测试,我们允许在第一次运行期间将代码加载到缓存中,然后让每个片段有 4 次机会在不受外部事件干扰的情况下运行完成。为此原因,

Note that I chose to first run with a problem size of 2000 and then 20000, so my problem size increased by a factor of 10 from the first run to the second.

请注意,我选择先以 2000 的问题大小运行,然后以 20000 的大小运行,因此从第一次运行到第二次运行,我的问题大小增加了 10 倍。

Performance of the listsolution: O(1) (constant time)

list解决方案的性能:O(1)(恒定时间)

Let's first look at the growth of the listsolution, since we can tell right away that it's the fastest solution in both profiling runs: In the first run, it took 854 microseconds (0.854 milliseconds) to perform 2000 "append" tasks. In the second run, it took 8.746 milliseconds to perform 20000 "append" tasks. A na?ve observer would say, "Ah, the listsolution exhibits O(n) growth, since as the problem size grew by a factor of ten, so did the time required to execute the test."The problem with that analysis is that what the OP wants is the growth rate of a single object insertion, not the growth rate of the overall problem. Knowing that, it's clear then that the listsolution provides exactly what the OP wants: a method of appending objects to a list in O(1) time.

让我们首先看看list解决方案的增长,因为我们可以立即看出它是两次分析运行中最快的解决方案:在第一次运行中,执行 2000 个“追加”任务需要 854微秒(0.854毫秒)。在第二次运行中,执行 20000 个“追加”任务需要 8.746 毫秒。一个天真的观察者会说,“啊,list解决方案表现出 O(n) 增长,因为随着问题规模增长了 10 倍,执行测试所需的时间也增长了。” 该分析的问题在于 OP 想要的是单个对象插入的增长率,而不是整个问题的增长率。知道这一点,很明显,list解决方案正是 OP 想要的:一种在 O(1) 时间内将对象附加到列表的方法。

Performance of the other solutions

其他解决方案的性能

None of the other solutions come even close to the speed of the listsolution, but it is informative to examine them anyway:

其他解决方案都没有接近解决方案的速度list,但无论如何检查它们是有益的:

Most of the other solutions appear to be O(n) in performance. For example, the by_indexsolution, a very popular solution based on the frequency with which I find it in other SO posts, took 11.6 milliseconds to append 2000 objects, and 953 milliseconds to append ten times that many objects. The overall problem's time grew by a factor of 100, so a na?ve observer might say "Ah, the by_indexsolution exhibits O(n2) growth, since as the problem size grew by a factor of ten, the time required to execute the test grew by a factor of 100."As before, this analysis is flawed, since the OP is interested in the growth of a single object insertion. If we divide the overall time growth by the problem's size growth, we find that the time growth of appending objects increased by a factor of only 10, not a factor of 100, which matches the growth of the problem size, so the by_indexsolution is O(n). There are no solutions listed which exhibit O(n2) growth for appending a single object.

大多数其他解决方案的性能似乎都是 O(n)。例如,by_index根据我在其他 SO 帖子中找到它的频率,该解决方案是一种非常流行的解决方案,追加 2000 个对象需要 11.6 毫秒,追加十倍的对象需要 953 毫秒。整个问题的时间增长了 100 倍,所以一个天真的观察者可能会说“啊,by_index解决方案表现出 O(n 2) 增长,因为随着问题规模增长了 10 倍,执行测试增加了 100 倍。”和以前一样,这种分析是有缺陷的,因为 OP 对单个对象插入的增长感兴趣。如果我们将整体时间增长除以问题的规模增长,我们发现追加对象的时间增长只增加了 10 倍,而不是 100 倍,这与问题规模的增长相匹配,所以by_index解是 O (n)。没有列出的解决方案显示 O(n 2) 增长以追加单个对象。

回答by JanKanis

In the other answers, only the listapproach results in O(1) appends, but it results in a deeply nested list structure, and not a plain single list. I have used the below datastructures, they supports O(1) (amortized) appends, and allow the result to be converted back to a plain list.

在其他答案中,只有该list方法会导致 O(1) 追加,但会导致深度嵌套的列表结构,而不是普通的单个列表。我使用了以下数据结构,它们支持 O(1)(摊销)附加,并允许将结果转换回普通列表。

expandingList <- function(capacity = 10) {
    buffer <- vector('list', capacity)
    length <- 0

    methods <- list()

    methods$double.size <- function() {
        buffer <<- c(buffer, vector('list', capacity))
        capacity <<- capacity * 2
    }

    methods$add <- function(val) {
        if(length == capacity) {
            methods$double.size()
        }

        length <<- length + 1
        buffer[[length]] <<- val
    }

    methods$as.list <- function() {
        b <- buffer[0:length]
        return(b)
    }

    methods
}

and

linkedList <- function() {
    head <- list(0)
    length <- 0

    methods <- list()

    methods$add <- function(val) {
        length <<- length + 1
        head <<- list(head, val)
    }

    methods$as.list <- function() {
        b <- vector('list', length)
        h <- head
        for(i in length:1) {
            b[[i]] <- head[[2]]
            head <- head[[1]]
        }
        return(b)
    }
    methods
}

Use them as follows:

按如下方式使用它们:

> l <- expandingList()
> l$add("hello")
> l$add("world")
> l$add(101)
> l$as.list()
[[1]]
[1] "hello"

[[2]]
[1] "world"

[[3]]
[1] 101

These solutions could be expanded into full objects that support al list-related operations by themselves, but that will remain as an exercise for the reader.

这些解决方案可以扩展为自己支持所有与列表相关的操作的完整对象,但这仍将作为读者的练习。

Another variant for a named list:

命名列表的另一个变体:

namedExpandingList <- function(capacity = 10) {
    buffer <- vector('list', capacity)
    names <- character(capacity)
    length <- 0

    methods <- list()

    methods$double.size <- function() {
        buffer <<- c(buffer, vector('list', capacity))
        names <<- c(names, character(capacity))
        capacity <<- capacity * 2
    }

    methods$add <- function(name, val) {
        if(length == capacity) {
            methods$double.size()
        }

        length <<- length + 1
        buffer[[length]] <<- val
        names[length] <<- name
    }

    methods$as.list <- function() {
        b <- buffer[0:length]
        names(b) <- names[0:length]
        return(b)
    }

    methods
}

Benchmarks

基准

Performance comparison using @phonetagger's code (which is based on @Cron Arconis' code). I have also added a better_env_as_containerand changed the env_as_container_a bit. The original env_as_container_was broken and doesn't actually store all the numbers.

使用@phonetagger 的代码(基于@Cron Arconis 的代码)进行性能比较。我还添加了一个better_env_as_container并更改env_as_container_了一点。原件env_as_container_坏了,实际上并没有存储所有数字。

library(microbenchmark)
lPtrAppend <- function(lstptr, lab, obj) {lstptr[[deparse(lab)]] <- obj}
### Store list inside new environment
envAppendList <- function(lstptr, obj) {lstptr$list[[length(lstptr$list)+1]] <- obj} 
env2list <- function(env, len) {
    l <- vector('list', len)
    for (i in 1:len) {
        l[[i]] <- env[[as.character(i)]]
    }
    l
}
envl2list <- function(env, len) {
    l <- vector('list', len)
    for (i in 1:len) {
        l[[i]] <- env[[paste(as.character(i), 'L', sep='')]]
    }
    l
}
runBenchmark <- function(n) {
    microbenchmark(times = 5,  
        env_with_list_ = {
            listptr <- new.env(parent=globalenv())
            listptr$list <- NULL
            for(i in 1:n) {envAppendList(listptr, i)}
            listptr$list
        },
        c_ = {
            a <- list(0)
            for(i in 1:n) {a = c(a, list(i))}
        },
        list_ = {
            a <- list(0)
            for(i in 1:n) {a <- list(a, list(i))}
        },
        by_index = {
            a <- list(0)
            for(i in 1:n) {a[length(a) + 1] <- i}
            a
        },
        append_ = { 
            a <- list(0)    
            for(i in 1:n) {a <- append(a, i)} 
            a
        },
        env_as_container_ = {
            listptr <- new.env(hash=TRUE, parent=globalenv())
            for(i in 1:n) {lPtrAppend(listptr, i, i)} 
            envl2list(listptr, n)
        },
        better_env_as_container = {
            env <- new.env(hash=TRUE, parent=globalenv())
            for(i in 1:n) env[[as.character(i)]] <- i
            env2list(env, n)
        },
        linkedList = {
            a <- linkedList()
            for(i in 1:n) { a$add(i) }
            a$as.list()
        },
        inlineLinkedList = {
            a <- list()
            for(i in 1:n) { a <- list(a, i) }
            b <- vector('list', n)
            head <- a
            for(i in n:1) {
                b[[i]] <- head[[2]]
                head <- head[[1]]
            }                
        },
        expandingList = {
            a <- expandingList()
            for(i in 1:n) { a$add(i) }
            a$as.list()
        },
        inlineExpandingList = {
            l <- vector('list', 10)
            cap <- 10
            len <- 0
            for(i in 1:n) {
                if(len == cap) {
                    l <- c(l, vector('list', cap))
                    cap <- cap*2
                }
                len <- len + 1
                l[[len]] <- i
            }
            l[1:len]
        }
    )
}

# We need to repeatedly add an element to a list. With normal list concatenation
# or element setting this would lead to a large number of memory copies and a
# quadratic runtime. To prevent that, this function implements a bare bones
# expanding array, in which list appends are (amortized) constant time.
    expandingList <- function(capacity = 10) {
        buffer <- vector('list', capacity)
        length <- 0

        methods <- list()

        methods$double.size <- function() {
            buffer <<- c(buffer, vector('list', capacity))
            capacity <<- capacity * 2
        }

        methods$add <- function(val) {
            if(length == capacity) {
                methods$double.size()
            }

            length <<- length + 1
            buffer[[length]] <<- val
        }

        methods$as.list <- function() {
            b <- buffer[0:length]
            return(b)
        }

        methods
    }

    linkedList <- function() {
        head <- list(0)
        length <- 0

        methods <- list()

        methods$add <- function(val) {
            length <<- length + 1
            head <<- list(head, val)
        }

        methods$as.list <- function() {
            b <- vector('list', length)
            h <- head
            for(i in length:1) {
                b[[i]] <- head[[2]]
                head <- head[[1]]
            }
            return(b)
        }

        methods
    }

# We need to repeatedly add an element to a list. With normal list concatenation
# or element setting this would lead to a large number of memory copies and a
# quadratic runtime. To prevent that, this function implements a bare bones
# expanding array, in which list appends are (amortized) constant time.
    namedExpandingList <- function(capacity = 10) {
        buffer <- vector('list', capacity)
        names <- character(capacity)
        length <- 0

        methods <- list()

        methods$double.size <- function() {
            buffer <<- c(buffer, vector('list', capacity))
            names <<- c(names, character(capacity))
            capacity <<- capacity * 2
        }

        methods$add <- function(name, val) {
            if(length == capacity) {
                methods$double.size()
            }

            length <<- length + 1
            buffer[[length]] <<- val
            names[length] <<- name
        }

        methods$as.list <- function() {
            b <- buffer[0:length]
            names(b) <- names[0:length]
            return(b)
        }

        methods
    }

result:

结果:

> runBenchmark(1000)
Unit: microseconds
                    expr       min        lq      mean    median        uq       max neval
          env_with_list_  3128.291  3161.675  4466.726  3361.837  3362.885  9318.943     5
                      c_  3308.130  3465.830  6687.985  8578.913  8627.802  9459.252     5
                   list_   329.508   343.615   389.724   370.504   449.494   455.499     5
                by_index  3076.679  3256.588  5480.571  3395.919  8209.738  9463.931     5
                 append_  4292.321  4562.184  7911.882 10156.957 10202.773 10345.177     5
       env_as_container_ 24471.511 24795.849 25541.103 25486.362 26440.591 26511.200     5
 better_env_as_container  7671.338  7986.597  8118.163  8153.726  8335.659  8443.493     5
              linkedList  1700.754  1755.439  1829.442  1804.746  1898.752  1987.518     5
        inlineLinkedList  1109.764  1115.352  1163.751  1115.631  1206.843  1271.166     5
           expandingList  1422.440  1439.970  1486.288  1519.728  1524.268  1525.036     5
     inlineExpandingList   942.916   973.366  1002.461  1012.197  1017.784  1066.044     5
> runBenchmark(10000)
Unit: milliseconds
                    expr        min         lq       mean     median         uq        max neval
          env_with_list_ 357.760419 360.277117 433.810432 411.144799 479.090688 560.779139     5
                      c_ 685.477809 734.055635 761.689936 745.957553 778.330873 864.627811     5
                   list_   3.257356   3.454166   3.505653   3.524216   3.551454   3.741071     5
                by_index 445.977967 454.321797 515.453906 483.313516 560.374763 633.281485     5
                 append_ 610.777866 629.547539 681.145751 640.936898 760.570326 763.896124     5
       env_as_container_ 281.025606 290.028380 303.885130 308.594676 314.972570 324.804419     5
 better_env_as_container  83.944855  86.927458  90.098644  91.335853  92.459026  95.826030     5
              linkedList  19.612576  24.032285  24.229808  25.461429  25.819151  26.223597     5
        inlineLinkedList  11.126970  11.768524  12.216284  12.063529  12.392199  13.730200     5
           expandingList  14.735483  15.854536  15.764204  16.073485  16.075789  16.081726     5
     inlineExpandingList  10.618393  11.179351  13.275107  12.391780  14.747914  17.438096     5
> runBenchmark(20000)
Unit: milliseconds
                    expr         min          lq       mean      median          uq         max neval
          env_with_list_ 1723.899913 1915.003237 1921.23955 1938.734718 1951.649113 2076.910767     5
                      c_ 2759.769353 2768.992334 2810.40023 2820.129738 2832.350269 2870.759474     5
                   list_    6.112919    6.399964    6.63974    6.453252    6.910916    7.321647     5
                by_index 2163.585192 2194.892470 2292.61011 2209.889015 2436.620081 2458.063801     5
                 append_ 2832.504964 2872.559609 2983.17666 2992.634568 3004.625953 3213.558197     5
       env_as_container_  573.386166  588.448990  602.48829  597.645221  610.048314  642.912752     5
 better_env_as_container  154.180531  175.254307  180.26689  177.027204  188.642219  206.230191     5
              linkedList   38.401105   47.514506   46.61419   47.525192   48.677209   50.952958     5
        inlineLinkedList   25.172429   26.326681   32.33312   34.403442   34.469930   41.293126     5
           expandingList   30.776072   30.970438   34.45491   31.752790   38.062728   40.712542     5
     inlineExpandingList   21.309278   22.709159   24.64656   24.290694   25.764816   29.158849     5

I have added linkedListand expandingListand an inlined version of both. The inlinedLinkedListis basically a copy of list_, but it also converts the nested structure back into a plain list. Beyond that the difference between the inlined and non-inlined versions is due to the overhead of the function calls.

我已经添加了linkedListexpandingList和两者的内联版本。的inlinedLinkedList基本上是一个副本list_,但它也转换嵌套结构回一个普通的列表。除此之外,内联和非内联版本之间的差异是由于函数调用的开销。

All variants of expandingListand linkedListshow O(1) append performance, with the benchmark time scaling linearly with the number of items appended. linkedListis slower than expandingList, and the function call overhead is also visible. So if you really need all the speed you can get (and want to stick to R code), use an inlined version of expandingList.

的所有变体expandingListlinkedList表演O(1)追加的性能,与基准时间与附加的项目数线性扩展。linkedList比 慢expandingList,函数调用开销也可见。因此,如果您真的需要可以获得的所有速度(并希望坚持使用 R 代码),请使用expandingList.

I've also had a look at the C implementation of R, and both approaches should be O(1) append for any size up until you run out of memory.

我还查看了 R 的 C 实现,两种方法都应该是 O(1) 附加任何大小,直到内存不足为止。

I have also changed env_as_container_, the original version would store every item under index "i", overwriting the previously appended item. The better_env_as_containerI have added is very similar to env_as_container_but without the deparsestuff. Both exhibit O(1) performance, but they have an overhead that is quite a bit larger than the linked/expanding lists.

我也改变了env_as_container_,原始版本会将每个项目存储在索引“i”下,覆盖之前附加的项目。在better_env_as_container我加入非常相似,env_as_container_但没有deparse东西。两者都表现出 O(1) 性能,但它们的开销比链接/扩展列表大得多。

Memory overhead

内存开销

In the C R implementation there is an overhead of 4 words and 2 ints per allocated object. The linkedListapproach allocates one list of length two per append, for a total of (4*8+4+4+2*8=) 56 bytes per appended item on 64-bit computers (excluding memory allocation overhead, so probably closer to 64 bytes). The expandingListapproach uses one word per appended item, plus a copy when doubling the vector length, so a total memory usage of up to 16 bytes per item. Since the memory is all in one or two objects the per-object overhead is insignificant. I haven't looked deeply into the envmemory usage, but I think it will be closer to linkedList.

在 CR 实现中,每个分配的对象有 4 个字和 2 个整数的开销。该linkedList方法为每个附加分配一个长度为 2 的列表,对于 64 位计算机上的每个附加项目总共 (4*8+4+4+2*8=) 56 个字节(不包括内存分配开销,因此可能更接近 64字节)。该expandingList方法为每个附加项目使用一个单词,当向量长度加倍时加上一个副本,因此每个项目的总内存使用量最多为 16 字节。由于内存都在一个或两个对象中,因此每个对象的开销是微不足道的。我没有深入研究env内存使用情况,但我认为它会更接近linkedList.

回答by Arseny

In the Lisp we did it this way:

在 Lisp 中,我们是这样做的:

> l <- c(1)
> l <- c(2, l)
> l <- c(3, l)
> l <- rev(l)
> l
[1] 1 2 3

though it was 'cons', not just 'c'. If you need to start with an empy list, use l <- NULL.

尽管它是“缺点”,而不仅仅是“c”。如果您需要从空列表开始,请使用 l <- NULL。

回答by Ken Williams

You want something like this maybe?

也许你想要这样的东西?

> push <- function(l, x) {
   lst <- get(l, parent.frame())
   lst[length(lst)+1] <- x
   assign(l, lst, envir=parent.frame())
 }
> a <- list(1,2)
> push('a', 6)
> a
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 6

It's not a very polite function (assigning to parent.frame()is kind of rude) but IIUYC it's what you're asking for.

这不是一个非常有礼貌的功能(分配给parent.frame()有点粗鲁)但是 IIUYC 这就是你所要求的。

回答by Cron Merdek

I have made a small comparison of methods mentioned here.

我对这里提到的方法做了一个小的比较。

n = 1e+4
library(microbenchmark)
### Using environment as a container
lPtrAppend <- function(lstptr, lab, obj) {lstptr[[deparse(substitute(lab))]] <- obj}
### Store list inside new environment
envAppendList <- function(lstptr, obj) {lstptr$list[[length(lstptr$list)+1]] <- obj} 

microbenchmark(times = 5,  
        env_with_list_ = {
            listptr <- new.env(parent=globalenv())
            listptr$list <- NULL
            for(i in 1:n) {envAppendList(listptr, i)}
            listptr$list
        },
        c_ = {
            a <- list(0)
            for(i in 1:n) {a = c(a, list(i))}
        },
        list_ = {
            a <- list(0)
            for(i in 1:n) {a <- list(a, list(i))}
        },
        by_index = {
            a <- list(0)
            for(i in 1:n) {a[length(a) + 1] <- i}
            a
        },
        append_ = { 
            a <- list(0)    
            for(i in 1:n) {a <- append(a, i)} 
            a
        },
        env_as_container_ = {
            listptr <- new.env(parent=globalenv())
            for(i in 1:n) {lPtrAppend(listptr, i, i)} 
            listptr
        }   
)

Results:

结果:

Unit: milliseconds
              expr       min        lq       mean    median        uq       max neval cld
    env_with_list_  188.9023  198.7560  224.57632  223.2520  229.3854  282.5859     5  a 
                c_ 1275.3424 1869.1064 2022.20984 2191.7745 2283.1199 2491.7060     5   b
             list_   17.4916   18.1142   22.56752   19.8546   20.8191   36.5581     5  a 
          by_index  445.2970  479.9670  540.20398  576.9037  591.2366  607.6156     5  a 
           append_ 1140.8975 1316.3031 1794.10472 1620.1212 1855.3602 3037.8416     5   b
 env_as_container_  355.9655  360.1738  399.69186  376.8588  391.7945  513.6667     5  a 

回答by ayman

If you pass in the list variable as a quoted string, you can reach it from within the function like:

如果将列表变量作为带引号的字符串传入,则可以从函数内部访问它,例如:

push <- function(l, x) {
  assign(l, append(eval(as.name(l)), x), envir=parent.frame())
}

so:

所以:

> a <- list(1,2)
> a
[[1]]
[1] 1

[[2]]
[1] 2

> push("a", 3)
> a
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

> 

or for extra credit:

或额外学分:

> v <- vector()
> push("v", 1)
> v
[1] 1
> push("v", 2)
> v
[1] 1 2
> 

回答by Paul

Not sure why you don't think your first method won't work. You have a bug in the lappend function: length(list) should be length(lst). This works fine and returns a list with the appended obj.

不知道为什么你不认为你的第一种方法行不通。您在 lappend 函数中有一个错误:length(list) 应该是 length(lst)。这工作正常并返回带有附加 obj 的列表。

回答by Michele

try this function lappend

试试这个函数 lappend

lappend <- function (lst, ...){
  lst <- c(lst, list(...))
  return(lst)
}

and other suggestions from this page Add named vector to a list

以及来自此页面的其他建议将命名向量添加到列表中

Bye.

再见。

回答by David Bellot

in fact there is a subtelty with the c()function. If you do:

事实上,该c()功能有一个微妙之处。如果你这样做:

x <- list()
x <- c(x,2)
x = c(x,"foo")

you will obtain as expected:

您将按预期获得:

[[1]]
[1]

[[2]]
[1] "foo"

but if you add a matrix with x <- c(x, matrix(5,2,2), your list will have another 4 elements of value 5! You would better do:

但是如果你添加一个矩阵x <- c(x, matrix(5,2,2),你的列表将有另外 4 个元素的值5!你最好这样做:

x <- c(x, list(matrix(5,2,2))

It works for any other object and you will obtain as expected:

它适用于任何其他对象,您将按预期获得:

[[1]]
[1]

[[2]]
[1] "foo"

[[3]]
     [,1] [,2]
[1,]    5    5
[2,]    5    5

Finally, your function becomes:

最后,您的功能变为:

push <- function(l, ...) c(l, list(...))

and it works for any type of object. You can be smarter and do:

它适用于任何类型的对象。你可以更聪明地做:

push_back <- function(l, ...) c(l, list(...))
push_front <- function(l, ...) c(list(...), l)