Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must likewise follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/11002391/

Date: 2020-09-11 01:51:17  Source: igfitidea

Fast way of getting index of match in list

Tags: r, list, optimization, indexing, vectorization

Question by ThomasP85

Given a list a containing vectors of unequal length and a vector b containing some elements from the vectors in a, I want to get a vector of the same length as b that holds, for each element of b, the index of the vector in a where that element occurs (this is a bad explanation, I know)...

The following code does the job:

a <- list(1:3, 4:5, 6:9)
b <- c(2, 3, 5, 8)

sapply(b, function(x, list) which(unlist(lapply(list, function(y, z) z %in% y, z=x))), list=a)
[1] 1 1 2 3

Replacing the sapply with a for loop achieves the same result, of course.
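
For reference, a sketch of the for-loop variant mentioned above, assuming each element of b occurs in exactly one vector of a (so one index per element is enough); `res` and `hit` are illustrative names. Like the sapply version, it scans every vector of a for every element of b, which fits the observation below that both versions take about the same time.

```r
a <- list(1:3, 4:5, 6:9)
b <- c(2, 3, 5, 8)

# One result slot per element of b
res <- integer(length(b))
for (i in seq_along(b)) {
  # TRUE for each vector in a that contains b[i]
  hit <- vapply(a, function(v) b[i] %in% v, logical(1))
  # Take the first containing vector (NA if b[i] occurs nowhere)
  res[i] <- which(hit)[1]
}
res
```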

The problem is that this code will be used with lists and vectors of length above 1000. On a real-life data set the function takes around 15 seconds (with both the for loop and sapply).

Does anyone have an idea how to speed this up, short of a parallel approach? I have failed to find a vectorized approach (and I cannot program in C, though that would probably be the fastest).

Edit:

I just want to highlight Aaron's elegant solution using match(), which gave a speed-up of roughly 1667 times (from 15 s to 0.009 s).
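
The real data set is not shown, so the following is only a hypothetical harness for comparing the two approaches at roughly the scale described (1000 vectors, 1000 lookups). The data generator, the seed, and the names `lens`, `slow` and `fast` are assumptions; it checks that both versions agree and prints system.time() for each, but actual timings will depend on the machine and the data.

```r
set.seed(1)

# Hypothetical input: 1000 vectors of random length 1..10 that partition 1..N,
# so every value belongs to exactly one vector
lens <- sample(1:10, 1000, replace = TRUE)
a <- unname(split(seq_len(sum(lens)), rep(seq_along(lens), lens)))
b <- sample(unlist(a), 1000)

# Original approach from the question: scan every vector of a for every element of b
slow <- function(a, b) {
  sapply(b, function(x, list) which(unlist(lapply(list, function(y, z) z %in% y, z = x))),
         list = a)
}

# match()-based approach from Aaron's answer: a single vectorized lookup
fast <- function(a, b) {
  g <- rep(seq_along(a), sapply(a, length))
  g[match(b, unlist(a))]
}

print(system.time(r1 <- slow(a, b)))
print(system.time(r2 <- fast(a, b)))
stopifnot(identical(r1, r2))
```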

I expanded on it a bit to allow multiple matches (the result is then a list):

a <- list(1:3, 3:5, 3:7)
b <- c(3, 5)
g <- rep(seq_along(a), sapply(a, length))
sapply(b, function(x) g[which(unlist(a) %in% x)])
[[1]]
[1] 1 2 3

[[2]]
[1] 2 3

The runtime for this was 0.169 s, which is noticeably slower, but on the other hand it is more flexible.
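
One caveat with the sapply() version above: sapply() only returns a list when the elements of b have different numbers of matches. With b <- c(4, 5), for example, both values occur in vectors 2 and 3, so the result is simplified to a 2x2 matrix. A sketch of a variant that always returns a list, using lapply() and computing unlist(a) once instead of on every iteration (`flat` and `res` are illustrative names):

```r
a <- list(1:3, 3:5, 3:7)
b <- c(3, 5)

g <- rep(seq_along(a), sapply(a, length))  # index of the vector each flattened element came from
flat <- unlist(a)                          # flatten once, outside the loop
# lapply() always returns a list, whatever the number of matches per element
res <- lapply(b, function(x) g[flat == x])
res
```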

Answer by Aaron left Stack Overflow

Here's one possibility using match:

> a <- list(1:3, 4:5, 6:9)
> b <- c(2, 3, 5, 8)
> g <- rep(seq_along(a), sapply(a, length))
> g[match(b, unlist(a))]
[1] 1 1 2 3
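
Why the match() version is fast, traced on the example above: unlist(a) flattens the list, g records which vector each flattened element came from, and one vectorized match() call finds every element of b in the flattened vector. Because match() returns the first matching position, a value that appears in several vectors gets the lowest index, and a value absent from a gives NA. `flat` and `pos` are names introduced here for illustration.

```r
a <- list(1:3, 4:5, 6:9)
b <- c(2, 3, 5, 8)

flat <- unlist(a)                          # 1 2 3 4 5 6 7 8 9
g <- rep(seq_along(a), sapply(a, length))  # 1 1 1 2 2 3 3 3 3: owning vector of each element of flat
pos <- match(b, flat)                      # 2 3 5 8: first position of each element of b in flat
g[pos]                                     # 1 1 2 3
```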

findInterval is another option:

> findInterval(match(b, unlist(a)), cumsum(c(0,sapply(a, length)))+1)
[1] 1 1 2 3
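
How the findInterval() version works, traced on the same example: the start positions of the vectors inside unlist(a) act as interval breaks, and findInterval() reports which interval each matched position falls into, so the index vector g never has to be built. `pos` and `starts` are names introduced here for illustration.

```r
a <- list(1:3, 4:5, 6:9)
b <- c(2, 3, 5, 8)

pos <- match(b, unlist(a))                     # 2 3 5 8: position in the flattened list
starts <- cumsum(c(0, sapply(a, length))) + 1  # 1 4 6 10: start of each vector, plus one past the end
findInterval(pos, starts)                      # 1 1 2 3: interval [start, next start) holding each position
```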

For returning a list, try this:

a <- list(1:3, 4:5, 5:9)
b <- c(2,3,5,8,5)
g <- rep(seq_along(a), sapply(a, length))
aa <- unlist(a)
au <- unique(aa)
af <- factor(aa, levels=au)
gg <- split(g, af)
gg[match(b, au)]
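
The same split()/match() idea wrapped in a small function, as a sketch: `index_all_matches` is a hypothetical name, and the final unname() is an addition that drops the value labels split() attaches. Each element of the result holds every index of a whose vector contains the corresponding element of b; an element of b that is absent from a comes back as NULL, because indexing a list with NA gives NULL.

```r
# Hypothetical helper: for each element of b, every index of a whose vector contains it
index_all_matches <- function(a, b) {
  g  <- rep(seq_along(a), sapply(a, length))  # owning vector of each flattened element
  aa <- unlist(a)
  au <- unique(aa)                            # each distinct value once
  # Group owning-vector indices by value (levels keep first-appearance order)
  gg <- split(g, factor(aa, levels = au))
  # Look up each element of b among the distinct values and drop the labels
  unname(gg[match(b, au)])
}

a <- list(1:3, 4:5, 5:9)
b <- c(2, 3, 5, 8, 5)
res <- index_all_matches(a, b)
str(res)
```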

Answer by ALiX

As a comment on your post suggests, it depends on what you want to do if/when the same element appears in multiple vectors in a. Assuming that you want the lowest index, you could do:

apply(sapply(a, function(vec) {b %in% vec}), 1, which.max)
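
This one-liner relies on which.max() returning the position of the first TRUE in each row. Two edge cases are worth knowing. First, for an element of b that occurs in none of the vectors, the row is all FALSE and which.max() returns 1, so the miss is silently reported as index 1. Second, when b has length 1, sapply() returns a plain vector instead of a matrix, so apply() fails. A sketch that guards against both (the names `hits` and `first_hit`, and the test value 42, are illustrative):

```r
a <- list(1:3, 4:5, 6:9)
b <- c(2, 3, 5, 8, 42)  # 42 is deliberately absent from every vector in a

# Logical matrix: one row per element of b, one column per vector in a.
# Wrapping in matrix() keeps the 2-D shape even when length(b) == 1.
hits <- matrix(vapply(a, function(vec) b %in% vec, logical(length(b))),
               nrow = length(b))

# which.max() on an all-FALSE row would return 1, so report "no match" as NA instead
first_hit <- apply(hits, 1, function(row) if (any(row)) which.max(row) else NA_integer_)
first_hit
```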