Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/33949945/

Time: 2020-09-08 16:27:58  Source: igfitidea

Replace multiple strings in one gsub() or chartr() statement in R?

Tags: r, string, gsub

Asked by Eric Chang

I have a string variable containing letters [a-z], spaces [ ], and apostrophes ['], e.g. x <- "a'b c". I want to replace each apostrophe ['] with an empty string, and each space [ ] with an underscore [_].

x <- gsub("'", "", x)
x <- gsub(" ", "_", x)

This works, but when I have many conditions the code becomes ugly. Therefore I want to use chartr(), but chartr() can't replace a character with an empty string, e.g.

x <- chartr("' ", "_", x) 
#Error in chartr("' ", "_", "a'b c") : 'old' is longer than 'new'

Is there any way to solve this problem? Thanks!

Answered by Peter

I am a fan of the syntax that the %<>% and %>% operators from the magrittr package provide.

library(magrittr)

x <- "a'b c"

x %<>%
  gsub("'", "", .) %>%
  gsub(" ", "_", .) 
x
##[1] "ab_c"

gsubfn is wonderful, but I like the chaining that %>% allows.

Answered by Ronak Shah

You can use gsubfn:

library(gsubfn)
gsubfn(".", list("'" = "", " " = "_"), x)
# [1] "ab_c"

Similarly, we can use mgsub, which takes multiple search patterns and a matching set of replacements:

mgsub::mgsub(x, c("'", " "), c("", "_"))
#[1] "ab_c"

Answered by ismirsehregal

I'd go with the quite fast function stri_replace_all_fixed from library(stringi):

library(stringi)    
stri_replace_all_fixed("a'b c", pattern = c("'", " "), replacement = c("", "_"), vectorize_all = FALSE)

Here is a benchmark taking into account most of the other suggested solutions:

library(stringi)
library(microbenchmark)
library(gsubfn)
library(mgsub)
library(magrittr)
library(dplyr)

x_gsubfn <-
x_mgsub <-
x_nested_gsub <-
x_magrittr <-
x_stringi <- "a'b c"

microbenchmark("gsubfn" = { gsubfn(".", list("'" = "", " " = "_"), x_gsubfn) },
               "mgsub" = { mgsub::mgsub(x_mgsub, c("'", " "), c("", "_")) },
               "nested_gsub" = { gsub("Find", "Replace", gsub("Find","Replace", x_nested_gsub)) },
               "magrittr" = { x_magrittr %<>% gsub("'", "", .) %>% gsub(" ", "_", .) },
               "stringi" = { stri_replace_all_fixed(x_stringi, pattern = c("'", " "), replacement = c("", "_"), vectorize_all = FALSE) }
               )


Unit: microseconds
        expr     min       lq      mean   median       uq     max neval
      gsubfn 458.217 482.3130 519.12820 513.3215 538.0100 715.371   100
       mgsub 180.521 200.8650 221.20423 216.0730 231.6755 460.587   100
 nested_gsub  14.615  15.9980  17.92178  17.7760  18.7630  40.687   100
    magrittr 113.765 133.7125 148.48202 142.9950 153.0680 296.261   100
     stringi   3.950   7.7030   8.41780   8.2960   9.0860  26.071   100

Answered by d8aninja

I would opt for a magrittr and/or dplyr solution as well. However, I prefer not to make a new copy of the object, especially if it is inside a function and can be returned cheaply.

i.e.

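return(
  # Keep each replacement string on one line; a line break inside the
  # quotes would become part of the replacement text.
  catInTheHat %>% gsub('Thing1', 'Thing2', .) %>% gsub('Red Fish', 'Blue Fish', .)
)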

...and so on.

Answered by Atul

I think nested gsub will do the job.

gsub("Find","Replace",gsub("Find","Replace",X))
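When there are many find/replace pairs, the nesting above can get hard to read. A minimal base R sketch of the same idea folds gsub() over a named vector with Reduce(). The helper name replace_all and its replacements argument are hypothetical and not part of the original answers, and fixed = TRUE is an assumption that the patterns are literal strings rather than regular expressions.

```r
# Hypothetical helper (not from the original answers): apply several
# literal replacements one after another using base R only.
replace_all <- function(text, replacements) {
  # replacements: named character vector; the names are the strings to
  # find and the values are what to put in their place.
  Reduce(
    function(acc, pattern) gsub(pattern, replacements[[pattern]], acc, fixed = TRUE),
    names(replacements),
    init = text
  )
}

x <- "a'b c"
replace_all(x, c("'" = "", " " = "_"))
# [1] "ab_c"
```

The replacements are applied in the order given, so an earlier replacement can affect what a later pattern sees, exactly as with nested gsub() calls.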

Answered by zhan2383

gsub("\\s", "", chartr("' ", " _", x)) # Map ' to a space and space to _, then remove the remaining whitespace
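chartr() translates characters one for one, so its old and new arguments must have the same number of characters, and the trick above works by translating the apostrophe to a throwaway character that is deleted afterwards. A variation on that idea, sketched below, uses a placeholder other than whitespace so that tabs or newlines already present in the input are left alone. The choice of "\001" is an assumption: it must be a character that never occurs in the data.

```r
x <- "a'b c"

# Assumed placeholder: a control character that does not appear in x.
placeholder <- "\001"

# Step 1: translate ' to the placeholder and space to underscore in one pass.
step1 <- chartr("' ", paste0(placeholder, "_"), x)

# Step 2: delete the placeholder, treating it as a literal string.
result <- gsub(placeholder, "", step1, fixed = TRUE)
result
# [1] "ab_c"
```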