Fast way of getting index of match in list

Question

Asked by ThomasP85

Given a list a containing vectors of unequal length and a vector b containing some elements from the vectors in a, I want to get a vector of the same length as b that contains, for each element of b, the index of the vector in a where that element occurs (this is a bad explanation, I know)...

The following code does the job:

a <- list(1:3, 4:5, 6:9)
b <- c(2, 3, 5, 8)

sapply(b, function(x, list) which(unlist(lapply(list, function(y, z) z %in% y, z=x))), list=a)
[1] 1 1 2 3

Replacing the sapply with a for loop achieves the same result, of course.

The problem is that this code will be used with lists and vectors of length above 1000. On a real-life data set the function takes around 15 seconds (with both the for loop and the sapply).

Does anyone have an idea how to speed this up, save for a parallel approach? I have failed to find a vectorized approach (and I cannot program in C, though that would probably be the fastest).

Edit:

To emphasize: Aaron's elegant solution using match() gave a speed increase on the order of 1667 times (from 15 to 0.009 seconds).

I expanded a bit on it to allow multiple matches (the return is then a list)

a <- list(1:3, 3:5, 3:7)
b <- c(3, 5)
g <- rep(seq_along(a), sapply(a, length))
sapply(b, function(x) g[which(unlist(a) %in% x)])
[[1]]
[1] 1 2 3

[[2]]
[1] 2 3

The runtime for this was 0.169 seconds, which is considerably slower but, on the other hand, more flexible.
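
The trade-off between the two variants can be checked with a rough timing sketch like the one below. The data here is synthetic (an assumption, not the real data set from the question), the all-matches variant uses lapply() and == rather than the sapply() and %in% shown above, and lengths() assumes R 3.2.0 or later. Actual timings will depend on the machine and the data.

```r
set.seed(1)

# Synthetic data (an assumption; not the asker's real data set):
# 1000 vectors of unequal length, values may repeat across vectors
a <- replicate(1000, sample(5000, sample(2:10, 1)), simplify = FALSE)
b <- sample(unlist(a), 1000)

flat <- unlist(a)                    # all elements of a in one vector
g <- rep(seq_along(a), lengths(a))   # list index of each element of flat

# First match only: one index per element of b
t_first <- system.time(first <- g[match(b, flat)])

# All matches: one list element per element of b
t_all <- system.time(all_idx <- lapply(b, function(x) g[flat == x]))

print(rbind(first_match = t_first, all_matches = t_all))
```

Because g is non-decreasing, the first entry of each all-matches result should agree with the match() result, which gives a cheap consistency check between the two variants.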

Answer 1

Answered by Aaron left Stack Overflow

Here's one possibility using match:

> a <- list(1:3, 4:5, 6:9)
> b <- c(2, 3, 5, 8)
> g <- rep(seq_along(a), sapply(a, length))
> g[match(b, unlist(a))]
[1] 1 1 2 3
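
To see why this works, it can help to look at the intermediate values. The following is a minimal sketch with the same a and b; the names flat, pos and idx are introduced here for illustration only, and lengths() assumes R 3.2.0 or later (sapply(a, length) gives the same counts).

```r
a <- list(1:3, 4:5, 6:9)
b <- c(2, 3, 5, 8)

flat <- unlist(a)                    # 1 2 3 4 5 6 7 8 9
g <- rep(seq_along(a), lengths(a))   # 1 1 1 2 2 3 3 3 3

# match() returns the position of the first occurrence of each
# element of b in flat; indexing g by that position converts it
# into the index of the vector in a that the element came from.
pos <- match(b, flat)                # 2 3 5 8
idx <- g[pos]                        # 1 1 2 3
idx
```

Since match() only reports the first occurrence, an element that appears in several vectors of a is attributed to the first of them, and an element of b that does not occur in a yields NA.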

findInterval is another option:

> findInterval(match(b, unlist(a)), cumsum(c(0,sapply(a, length)))+1)
[1] 1 1 2 3
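
Broken into steps, the findInterval call can be read as follows. This is only a sketch of the same idea; the names starts and pos are introduced here for illustration, and lengths() assumes R 3.2.0 or later.

```r
a <- list(1:3, 4:5, 6:9)
b <- c(2, 3, 5, 8)

# Position in unlist(a) at which each vector of a starts,
# plus one extra break just past the end
starts <- cumsum(c(0, lengths(a))) + 1   # 1 4 6 10

# Position of each element of b in the flattened list
pos <- match(b, unlist(a))               # 2 3 5 8

# findInterval() reports which [start, next start) interval
# each position falls into, i.e. the index of the vector in a
res <- findInterval(pos, starts)         # 1 1 2 3
res
```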

For returning a list, try this:

a <- list(1:3, 4:5, 5:9)
b <- c(2,3,5,8,5)
g <- rep(seq_along(a), sapply(a, length))
aa <- unlist(a)
au <- unique(aa)
af <- factor(aa, levels=au)
gg <- split(g, af)
gg[match(b, au)]
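
The same steps could be wrapped in a small helper, sketched below. The function name all_matches is hypothetical, the unname() call is an addition to drop the factor-level names from the result, and lengths() assumes R 3.2.0 or later.

```r
# Hypothetical helper: for each element of b, return the indices of
# all vectors in a that contain it
all_matches <- function(a, b) {
  g  <- rep(seq_along(a), lengths(a))       # list index of each element
  aa <- unlist(a)
  au <- unique(aa)
  gg <- split(g, factor(aa, levels = au))   # indices grouped by value
  unname(gg[match(b, au)])                  # look up each element of b
}

res <- all_matches(list(1:3, 4:5, 5:9), c(2, 3, 5, 8, 5))
str(res)
```

With this input, 5 occurs in the second and third vectors, so the third and fifth elements of the result are both c(2, 3), while the others hold a single index.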

Answer 2

Answered by ALiX

As a comment to your post suggests, it depends on what you want to do if/when the same element appears in multiple vectors in a. Assuming that you want the lowest index you could do:

apply(sapply(a, function(vec) {b %in% vec}), 1, which.max)
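
One thing to watch for with this approach: which.max on a row that is all FALSE returns 1, so an element of b that occurs nowhere in a would silently be assigned to the first vector. A hedged variant that returns NA in that case might look like the sketch below; the value 42 and the name hits are illustrative additions, not part of the original answer.

```r
a <- list(1:3, 4:5, 6:9)
b <- c(2, 3, 5, 8, 42)   # 42 does not occur anywhere in a

# One row per element of b, one column per vector in a
hits <- vapply(a, function(vec) b %in% vec, logical(length(b)))
hits <- matrix(hits, nrow = length(b))   # stays a matrix even if length(b) == 1

idx <- apply(hits, 1, which.max)         # first TRUE in each row
idx[rowSums(hits) == 0] <- NA            # rows with no TRUE get NA instead of 1
idx
```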
