Original source: http://stackoverflow.com/questions/9416089/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to search for multiple strings and replace them with nothing within a list of strings
Asked by userJT
I have a column in a dataframe like this:
npt2$name
# [1] "Andreas Groll, M.D."
# [2] ""
# [3] "Pan-Chyr Yang, PHD"
# [4] "Suh-Fang Jeng, Sc.D"
# [5] "Mostafa K Mohamed Fontanet Arnaud"
# [6] "Thomas Jozefiak, M.D."
# [7] "Medical Monitor"
# [8] "Qi Zhu, MD"
# [9] "Holly Posner"
# [10] "Peter S Sebel, MB BS, PhD Chantal Kerssens, PhD"
# [11] "Lance A Mynderse, M.D."
# [12] "Lawrence Currie, MD"
I tried gsub but with no luck. After doing toupper(x), I need to replace all instances of 'MD' or 'M.D.' or 'PHD' with nothing.
Is there a nice short trick to do it?
In fact, I would be interested to see it done on a single string, and how differently it is done in one command on the whole list.
Answered by IRTFM
Either of these:
gsub("MD|M\\.D\\.|PHD", "", test) # target specific strings
gsub("\\,.+$", "", test) # target all characters after the first comma
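On the single-string versus whole-list part of the question, here is a minimal sketch. gsub() is vectorized over its third argument, so the call is identical in both cases. The names `one`, `many` and `pattern` are illustrative and not from the question.

```r
# gsub() is vectorized over x, so one call handles a single string
# or an entire character vector (such as a data frame column).
one  <- "Andreas Groll, M.D."
many <- c("Andreas Groll, M.D.", "Pan-Chyr Yang, PHD", "Lawrence Currie, MD")

# The periods are escaped with a double backslash inside the R string.
pattern <- "MD|M\\.D\\.|PHD"

res_one  <- gsub(pattern, "", one)
res_many <- gsub(pattern, "", many)

print(res_one)
print(res_many)
```

Note that this pattern removes only the degree itself, so each result still ends with the ", " that preceded it.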
Both Matt Parker above and Tommy below have raised the question whether 'M.R.C.P.', 'PhD', 'D.Phil.' and 'Ph.D.' or other British or Continental designations of doctorate level degrees should be sought out and removed. Perhaps @user56 can advise what the intent was.
Answered by Justin
With a single ugly regex:
gsub('[MP].?D.?','',npt2$name)
This says: find the character M or P, followed by zero or one character of any kind, followed by a D and zero or one additional character. More explicitly, you could do this in three steps:
npt2$name <- gsub('MD','',npt2$name)
npt2$name <- gsub('M\\.D\\.','',npt2$name)
npt2$name <- gsub('PhD','',npt2$name)
In those three, what's happening should be more straightforward. In the second replacement you need to "escape" the periods, since '.' is a special character in regular expressions.
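To illustrate why the escape matters, here is a small sketch comparing an unescaped pattern, an escaped one, and a literal match with fixed = TRUE. The vector `x` is made up for the demonstration.

```r
# Illustrative input: one real degree and one word that merely resembles it.
x <- c("M.D.", "MADE")

# Unescaped: "." matches any single character, so "MADE" is matched as well.
unescaped <- gsub("M.D.", "", x)

# Escaped: "\\." in an R string is the regex "\.", a literal period.
escaped <- gsub("M\\.D\\.", "", x)

# fixed = TRUE treats the whole pattern as a literal string, no escaping needed.
literal <- gsub("M.D.", "", x, fixed = TRUE)

print(unescaped)
print(escaped)
print(literal)
```

With the unescaped pattern both elements become empty strings, while the escaped and fixed versions leave "MADE" untouched.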
Answered by Tommy
Here's a variant that removes the extra ", " too. It does not require toupper either, but if you want case-insensitive matching, just pass ignore.case=TRUE to gsub.
test <- c("Andreas Groll, M.D.",
"",
"Pan-Chyr Yang, PHD",
"Suh-Fang Jeng, Sc.D",
"Peter S Sebel, MB BS, PhD Chantal Kerssens, PhD",
"Lawrence Currie, MD")
gsub(",? *(MD|M\\.D\\.|P[hH]D)", "", test)
#[1] "Andreas Groll" ""
#[3] "Pan-Chyr Yang" "Suh-Fang Jeng, Sc.D"
#[5] "Peter S Sebel, MB BS Chantal Kerssens" "Lawrence Currie"
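The ignore.case=TRUE option mentioned above could be applied to a data frame column roughly as follows. This is a sketch under assumptions: the small `npt2` data frame here is a stand-in for the asker's data, and the `clean` column name is hypothetical.

```r
# Stand-in data frame; the real npt2 from the question has more rows.
npt2 <- data.frame(
  name = c("Andreas Groll, M.D.", "Pan-Chyr Yang, phd", "Qi Zhu, md", "Holly Posner"),
  stringsAsFactors = FALSE
)

# Optional comma, optional spaces, then one of the degree spellings.
pattern <- ",? *(MD|M\\.D\\.|PHD)"

# ignore.case = TRUE makes toupper() unnecessary; original casing is kept.
npt2$clean <- gsub(pattern, "", npt2$name, ignore.case = TRUE)

print(npt2$clean)
```

One caveat: with case-insensitive matching, the letters "md" or "phd" inside an ordinary name would also be removed, so it may be worth checking the results against the original column.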