Disclaimer: this page is a Chinese/English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1355355/

Date: 2020-09-11 01:23:11  Source: igfitidea

How to avoid a loop in R: selecting items from a list

Tags: list, r, vector, strsplit

Asked by JD Long

I could solve this using loops, but I am trying to think in vectors so my code will be more R-esque.

I have a list of names in the format firstname_lastname. From this list I want to get a separate list containing only the first names, but I can't seem to get my head around how to do this. Here's some example data:

t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
tsplit <- strsplit(t,"_")

which looks like this:

> tsplit
[[1]]
[1] "bob"   "smith"

[[2]]
[1] "mary" "jane"

[[3]]
[1] "jose"  "chung"

[[4]]
[1] "michael" "marx"   

[[5]]
[1] "charlie" "ivan"   

I could get out what I want using loops like this:

for (i in 1:length(tsplit)){
    if (i==1) {t_out <- tsplit[[i]][1]} else{t_out <- append(t_out, tsplit[[i]][1])} 
}

which would give me this:

t_out
[1] "bob"     "mary"    "jose"    "michael" "charlie"

So how can I do this without loops?

Accepted answer by liebke

You can use apply (or sapply):

t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
f <- function(s) strsplit(s, "_")[[1]][1]
sapply(t, f)

   bob_smith    mary_jane   jose_chung michael_marx charlie_ivan 
       "bob"       "mary"       "jose"    "michael"    "charlie" 

See: A brief introduction to “apply” in R
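
As the output above shows, sapply() names its result after the input strings. If a plain unnamed vector is wanted, a minimal sketch is to pass USE.NAMES = FALSE, a standard sapply() argument. The helper f is the one from the answer, repeated so the block runs on its own; first_names is an illustrative variable name.

```r
t <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")

# Return the part of a single string before the first underscore
f <- function(s) strsplit(s, "_")[[1]][1]

# USE.NAMES = FALSE stops sapply() from naming the result after the inputs
first_names <- sapply(t, f, USE.NAMES = FALSE)
print(first_names)
```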

Answer by hadley

And one more approach:

t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
pieces <- strsplit(t,"_")
sapply(pieces, "[", 1)

In words, the last line extracts the first element of each component of the list and then simplifies it into a vector.

How does this work? Well, you need to realise that an alternative way of writing x[1] is "["(x, 1), i.e. there is a function called [ that does subsetting. The sapply call calls this function once for each element of the original list, passing in two arguments: the list element and 1.

The advantage of this approach over the others is that you can extract multiple elements from the list without having to recompute the splits. For example, the last name would be sapply(pieces, "[", 2). Once you get used to this idiom, it's pretty easy to read.
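
Both claims are easy to check directly: calling [ as an ordinary function gives the same result as the usual syntax, and one split list serves for both name parts. A minimal sketch; x, first_names and last_names are illustrative names.

```r
t <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")
pieces <- strsplit(t, "_")

# The subsetting operator called as a function matches x[1]
x <- pieces[[1]]
stopifnot(identical(x[1], "["(x, 1)))

# Split once, then pull out different positions without re-splitting
first_names <- sapply(pieces, "[", 1)
last_names  <- sapply(pieces, "[", 2)
print(first_names)
print(last_names)
```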

Answer by William Doane

How about:

tlist <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
fnames <- gsub("(_.*)$", "", tlist)
# _.* matches the underscore followed by a string of characters
# the $ anchors the search at the end of the input string
# so, underscore followed by a string of characters followed by the end of the input string

for the RegEx approach?
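
Because only the first underscore matters here, sub() (which replaces just the first match) is enough, and a complementary pattern recovers the last names. A sketch assuming each name has the firstname_lastname form; lnames is an illustrative name.

```r
tlist <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")

# Remove the first underscore and everything after it -> first names
fnames <- sub("_.*$", "", tlist)

# Remove everything up to and including the first underscore -> last names
lnames <- sub("^[^_]*_", "", tlist)
print(fnames)
print(lnames)
```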

Answer by Karsten

What about:

t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")

sub("_.*", "", t)

Answer by Matt Parker

I doubt this is the most elegant solution, but it beats looping:

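t.df <- data.frame(tsplit, stringsAsFactors = FALSE)  # one column per split name
unlist(t.df[1, ], use.names = FALSE)                   # first row as a character vector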

Converting lists to data frames is about the only way I can get them to do what I want. I'm looking forward to reading answers by people who actually understand how to handle lists.

Answer by Dirk Eddelbuettel

You almost had it. It really is just a matter of

  1. using one of the *apply functions to loop over your existing list (I often start with lapply and sometimes switch to sapply)
  2. adding an anonymous function that operates on one list element at a time
  3. knowing, as you already did, that it is strsplit(string, splitterm) and that you need the odd [[1]][1] to pick off the first term of the answer
  4. putting it all together, starting with a preferred variable name (as we stay clear of t or c and friends)

which gives

> tlist <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan") 
> fnames <- sapply(tlist, function(x) strsplit(x, "_")[[1]][1]) 
> fnames 
  bob_smith    mary_jane   jose_chung michael_marx charlie_ivan   
      "bob"       "mary"       "jose"    "michael"    "charlie" 
>
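
A variant of the same anonymous-function approach, not part of the original answer, uses vapply(). Its FUN.VALUE template (here character(1)) makes it stop with an error if any call returns something other than a single string, so the result type is guaranteed. A minimal sketch:

```r
tlist <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")

# Each call must return exactly one string, otherwise vapply() raises an error;
# USE.NAMES = FALSE returns an unnamed character vector
fnames <- vapply(tlist, function(x) strsplit(x, "_")[[1]][1],
                 FUN.VALUE = character(1), USE.NAMES = FALSE)
print(fnames)
```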

Answer by brentonk

You could use unlist():

> tsplit <- unlist(strsplit(t,"_"))
> tsplit
 [1] "bob"     "smith"   "mary"    "jane"    "jose"    "chung"   "michael"
 [8] "marx"    "charlie" "ivan"   
> t_out <- tsplit[seq(1, length(tsplit), by = 2)]
> t_out
[1] "bob"     "mary"    "jose"    "michael" "charlie"

There might be a better way to pull out only the odd-indexed entries, but in any case you won't have a loop.
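
One way to take only the odd-indexed entries without computing indices is a logical index, which R recycles along the vector: c(TRUE, FALSE) keeps positions 1, 3, 5 and so on. Like the seq() version, this sketch assumes every name splits into exactly two parts; last is an illustrative name.

```r
t <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")
tsplit <- unlist(strsplit(t, "_"))

# The length-2 logical index is recycled: keep 1st, skip 2nd, keep 3rd, ...
t_out <- tsplit[c(TRUE, FALSE)]

# The complementary index selects the even positions (the last names)
last <- tsplit[c(FALSE, TRUE)]
print(t_out)
print(last)
```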

Answer by William Doane

And one more approach, based on brentonk's unlist() example:

tlist <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
tsplit <- unlist(strsplit(tlist,"_"))
fnames <- tsplit[seq_along(tsplit) %% 2 == 1]  # keep odd positions (the first names)

Answer by jmc200

I would use the following unlist()-based method:

> t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
> tsplit <- strsplit(t,"_")
> 
> x <- matrix(unlist(tsplit), 2)
> x[1,]
[1] "bob"     "mary"    "jose"    "michael" "charlie"

The big advantage of this method is that it solves the equivalent problem for surnames at the same time:

> x[2,]
[1] "smith" "jane"  "chung" "marx"  "ivan" 

The downside is that you'll need to be certain that all of the names conform to the firstname_lastname structure; if any don't, this method will break by putting names in the wrong rows, sometimes without so much as a warning.
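
A cheap guard against that failure mode is to check the number of pieces per name before building the matrix. lengths() (available since R 3.2.0) returns the length of each list element. A sketch; the error message and the names firsts and lasts are illustrative choices.

```r
t <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")
tsplit <- strsplit(t, "_")

# Refuse to build the matrix unless every name split into exactly two parts
if (!all(lengths(tsplit) == 2)) {
  stop("every name must have the form firstname_lastname")
}

x <- matrix(unlist(tsplit), nrow = 2)  # column j holds the two parts of name j
firsts <- x[1, ]
lasts  <- x[2, ]
print(firsts)
print(lasts)
```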

Answer by Virginie

Starting from the original tsplit list object given at the beginning, this command will do it:

tsplit开头给出的原始列表对象,此命令将执行以下操作:

unlist(lapply(tsplit,function(x) x[1]))

It extracts the first element of each list element, then transforms the resulting list into a vector. Unlisting into a matrix first and then extracting the first row or column (depending on how the matrix is built) also works, but then you depend on all list elements having the same length. Here is the output:

> tsplit

[[1]]
[1] "bob"   "smith"

[[2]]
[1] "mary" "jane"

[[3]]
[1] "jose"  "chung"

[[4]]
[1] "michael" "marx"   

[[5]]
[1] "charlie" "ivan"   

> lapply(tsplit,function(x) x[1])

[[1]]
[1] "bob"

[[2]]
[1] "mary"

[[3]]
[1] "jose"

[[4]]
[1] "michael"

[[5]]
[1] "charlie"

> unlist(lapply(tsplit,function(x) x[1]))

[1] "bob"     "mary"    "jose"    "michael" "charlie"
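
Because this approach takes the first element of each piece separately, it keeps working when the names do not all split into the same number of parts, which is exactly where the matrix-based version goes wrong. The input below is hypothetical, made up for illustration:

```r
# Hypothetical input: one name without an underscore, one with two
names_in <- c("bob_smith", "cher", "mary_jane_watson")
tsplit2 <- strsplit(names_in, "_")

# Take element 1 of each piece, then flatten the list to a character vector
firsts <- unlist(lapply(tsplit2, function(x) x[1]))
print(firsts)
```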