Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/17024685/

How to use reference variables by character string in a formula?

Tags: r, string, formula, paste, names

Asked by Eric Green

In the minimal example below, I am trying to use the value of a character string, vars, in a regression formula. However, I am only able to pass the string of variable names ("v2+v3+v4") to the formula, not what the string refers to (e.g., "v2" is dat$v2).

I know there are better ways to run the regression (e.g., lm(v1 ~ v2 + v3 + v4, data=dat)). My situation is more complex, and I am trying to figure out how to use a character string in a formula. Any thoughts?

Updated code below:

# minimal example 
# create data frame
v1 <- rnorm(10)
v2 <- sample(c(0,1), 10, replace=TRUE)
v3 <- rnorm(10)
v4 <- rnorm(10)
dat <- cbind(v1, v2, v3, v4)
dat <- as.data.frame(dat)

# create objects of column names
c.2 <- colnames(dat)[2]
c.3 <- colnames(dat)[3]
c.4 <- colnames(dat)[4]

# shortcut to get to the type of object my full code produces
vars <- paste(c.2, c.3, c.4, sep="+")

### TRYING TO SOLVE FROM THIS POINT:
print(vars)
# [1] "v2+v3+v4"

# use vars in regression
regression <- paste0("v1", " ~ ", vars)
m1 <- lm(as.formula(regression), data=dat)

Update: @Arun was correct about the missing "" on v1 in the first example. This fixed my example, but I was still having problems with my real code. In the code chunk below, I adapted my example to better reflect my actual code. I chose to create a simpler example at first, thinking that the problem was the string vars.

Here's an example that does not work :) It uses the same data frame dat created above.

dv <- colnames(dat)[1]
r2 <- colnames(dat)[2]
# the following loop creates objects r3, r4, r5, and r6
# r5 and r6 are interaction terms
for (v in 3:4) {
  r <- colnames(dat)[v]
  assign(paste("r",v,sep=""),r)
  r <- paste(colnames(dat)[2], colnames(dat)[v], sep="*")
  assign(paste("r",v+2,sep=""),r)
}

# combine r3, r4, r5, and r6 then collapse and remove trailing +
vars2 <- sapply(3:6, function(i) { 
                paste0("r", i, "+")
                })
vars2 <- paste(vars2, collapse = '')
vars2 <- substr(vars2, 1, nchar(vars2)-1)

# concatenate dv, r2 (as a factor), and vars into `eq`
eq <- paste0(dv, " ~ factor(",r2,") +", vars2)

Here is the issue:

print(eq)
# [1] "v1 ~ factor(v2) +r3+r4+r5+r6"

Unlike regression in the first example, eq does not bring in the column names (e.g., v3). The object names (e.g., r3) are retained instead. As such, the following lm() command does not work.

m2 <- lm(as.formula(eq), data=dat)

Answered by Aaron left Stack Overflow

I see a couple of issues here. First, and I don't think this is causing any trouble, let's make your data frame in one step so you don't have v1 through v4 floating around in the global environment as well as in the data frame. Second, let's make v2 a factor here so that we won't have to convert it to a factor later.

dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(sample(c(0,1), 10, replace=TRUE)),
                  v3 = rnorm(10),
                  v4 = rnorm(10) )

Part One

Now, for your first part, it looks like this is what you want:

lm(v1 ~ v2 + v3 + v4, data=dat)

Here's a simpler way to do that, though you still have to specify the response variable.

lm(v1 ~ ., data=dat)

Alternatively, you certainly can build up the formula with paste and call lm on it.

f <- paste(names(dat)[1], "~", paste(names(dat)[-1], collapse=" + "))
# "v1 ~ v2 + v3 + v4"
lm(f, data=dat)
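
As a possible alternative to pasting the pieces together by hand, base R's reformulate() builds a formula object directly from a vector of term labels and a response name. The sketch below is self-contained; the fixed seed and the use of rep() for v2 are assumptions made only so that it runs reproducibly.

```r
# Self-contained sketch; `dat` mirrors the data frame defined above.
set.seed(1)  # assumption: fixed seed, only for reproducibility
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),  # assumption: guarantees two levels
                  v3 = rnorm(10),
                  v4 = rnorm(10))

# reformulate() assembles a formula object from term labels and a response name
f2 <- reformulate(names(dat)[-1], response = names(dat)[1])
print(f2)

# The result is already a formula, so no as.formula() call is needed
fit <- lm(f2, data = dat)
```

Because f2 is a formula rather than a string, it can also be inspected with functions such as all.vars() before fitting.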

However, my preference in these situations is to use do.call, which evaluates its arguments before passing them to the function; this makes the resulting object more suitable for calling functions like update on. Compare the call part of the output.

do.call("lm", list(as.formula(f), data=as.name("dat")))
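
To make the difference in the stored call concrete, here is a small self-contained sketch that fits the model both ways and prints each call. The seed, the rep() construction of v2, and the names m_a, m_b, and m_c are illustrative assumptions.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),  # assumption: guarantees two levels
                  v3 = rnorm(10),
                  v4 = rnorm(10))
f <- paste(names(dat)[1], "~", paste(names(dat)[-1], collapse = " + "))

m_a <- lm(f, data = dat)
m_b <- do.call("lm", list(as.formula(f), data = as.name("dat")))

print(m_a$call)  # the formula argument is recorded as the symbol f
print(m_b$call)  # the formula argument is recorded as the formula itself

# update() modifies the stored formula and re-evaluates the stored call
m_c <- update(m_b, . ~ . - v4)
print(names(coef(m_c)))
```

The call stored in m_a only makes sense while f still exists with the same value, whereas the call stored in m_b spells the formula out and keeps dat as a name.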

Part Two

About your second part, it looks like this is what you're going for:

lm(v1 ~ factor(v2) + v3 + v4 + v2*v3 + v2*v4, data=dat)

First, because v2 is already a factor in the data frame, we don't need the factor() call. Second, this can be simplified further by making better use of R's formula operators, which build interactions from arithmetic-style notation, like this.

lm(v1 ~ v2*(v3 + v4), data=dat)
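
One way to check that the compact formula asks for the same model is to compare the term labels that terms() expands each formula into. This is a minimal sketch; the names long_form, short_form, labels_long, and labels_short are illustrative only.

```r
long_form  <- v1 ~ v2 + v3 + v4 + v2*v3 + v2*v4
short_form <- v1 ~ v2*(v3 + v4)

# terms() expands the * operator into main effects plus : interactions
labels_long  <- attr(terms(long_form), "term.labels")
labels_short <- attr(terms(short_form), "term.labels")

print(labels_short)
print(setequal(labels_long, labels_short))
```

Both formulas expand to the main effects v2, v3, and v4 plus the v2:v3 and v2:v4 interactions.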

I'd then simply create the formula using paste; the loop with assign, even in the larger case, is probably not a good idea.

f <- paste(names(dat)[1], "~", names(dat)[2], "* (", 
           paste(names(dat)[-c(1:2)], collapse=" + "), ")")
# "v1 ~ v2 * ( v3 + v4 )"

It can then be fitted either by calling lm directly or with do.call.

lm(f, data=dat)
do.call("lm", list(as.formula(f), data=as.name("dat")))

About your code

The problem you had with trying to use r3 etc. was that you wanted the contents of the variable r3, not the literal name "r3". To get the contents, you need get, like this, and then you'd collapse the values together with paste.

vars <- sapply(paste0("r", 3:6), get)
paste(vars, collapse=" + ")
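
For illustration, the same lookup can be written with mget(), the vectorised counterpart of get(), and carried through to the final string. This is a sketch under assumptions: the four r objects are recreated by hand with the values the question's loop would have assigned, and env, nms, vals, and rhs are illustrative names.

```r
# Recreate the objects that the question's loop produced with assign()
r3 <- "v3"
r4 <- "v4"
r5 <- "v2*v3"
r6 <- "v2*v4"

env <- environment()     # look the names up where they were just defined
nms <- paste0("r", 3:6)  # "r3" "r4" "r5" "r6": these are only names

# mget() returns the values stored under those names
vals <- unlist(mget(nms, envir = env))
rhs <- paste(vals, collapse = " + ")

eq <- paste("v1 ~ factor(v2) +", rhs)
print(eq)
```

The resulting string contains column names rather than object names, so as.formula(eq) can be passed to lm() with data=dat.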

However, a better way would be to avoid assign and just build a vector of the terms you want, like this.

vars <- NULL
for (v in 3:4) {
  vars <- c(vars, colnames(dat)[v], paste(colnames(dat)[2], 
                                          colnames(dat)[v], sep="*"))
}
paste(vars, collapse=" + ")

A more R-like solution would be to use lapply:

vars <- unlist(lapply(colnames(dat)[3:4], 
                      function(x) c(x, paste(colnames(dat)[2], x, sep="*"))))
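
To close the loop, the term vector can be collapsed into a formula and passed to lm(). The sketch below is self-contained; the seed, the rep() construction of v2, and the name fit are assumptions for illustration.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),  # assumption: guarantees two levels
                  v3 = rnorm(10),
                  v4 = rnorm(10))

# Each remaining column contributes itself plus its interaction with v2
vars <- unlist(lapply(colnames(dat)[3:4],
                      function(x) c(x, paste(colnames(dat)[2], x, sep = "*"))))

# Collapse the terms and prepend the response to get a formula
f <- as.formula(paste(colnames(dat)[1], "~", paste(vars, collapse = " + ")))
print(f)

fit <- lm(f, data = dat)
print(names(coef(fit)))
```

Because v2*v3 already implies the main effect of v2, the fitted model contains v2, v3, v4 and both interactions even though v2 never appears on its own in vars.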

Answered by Travis Heeter

TL;DR: use paste.

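# assumption: ctree() comes from a package such as party or partykit,
# which must be loaded first
create_ctree <- function(col, data){
    # build e.g. "class ~ ." and convert the string to a formula
    myFormula <- as.formula(paste(col, "~ ."))
    ctree(myFormula, data = data)
}
# data: a data frame that contains a column named "class"
create_ctree("class", data)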