string How to search for multiple strings and replace them with nothing in a list of strings

Question

Asked by userJT

I have a column in a dataframe like this:

npt2$name
#  [1] "Andreas Groll, M.D."
#  [2] ""
#  [3] "Pan-Chyr Yang, PHD"
#  [4] "Suh-Fang Jeng, Sc.D"
#  [5] "Mostafa K Mohamed Fontanet Arnaud"
#  [6] "Thomas Jozefiak, M.D."
#  [7] "Medical Monitor"
#  [8] "Qi Zhu, MD"
#  [9] "Holly Posner"
# [10] "Peter S Sebel, MB BS, PhD Chantal Kerssens, PhD"
# [11] "Lance A Mynderse, M.D."
# [12] "Lawrence Currie, MD"

I tried gsub, but with no luck. After doing toupper(x), I need to replace all instances of 'MD', 'M.D.' or 'PHD' with nothing.

Is there a nice short trick to do it?

In fact I would be interested to see it done on a single string and how differently it is done in one command on the whole list.

Answer 1

Answered by IRTFM

Either of these:

gsub("MD|M\\.D\\.|PHD", "", test)  # target specific strings; "\\." is a literal period
gsub(",.+$", "", test)             # target all characters from the first comma onward
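On the question of a single string versus the whole list: gsub is vectorized over its input, so the call has the same shape in both cases. A minimal sketch, assuming the names have already been passed through toupper; the variables pattern, one, many and the small data frame demo are made up for illustration:

```r
# Pattern listing the degree strings to drop; "\\." is a literal period.
pattern <- "MD|M\\.D\\.|PHD"

# A single string (a character vector of length one).
one <- "QI ZHU, MD"
gsub(pattern, "", one)
# [1] "QI ZHU, "

# The same call on a whole vector: gsub() processes every element.
many <- toupper(c("Andreas Groll, M.D.", "Pan-Chyr Yang, PHD", "Qi Zhu, MD"))
gsub(pattern, "", many)
# [1] "ANDREAS GROLL, " "PAN-CHYR YANG, " "QI ZHU, "

# Assigning the result back cleans a data frame column (demo is hypothetical).
demo <- data.frame(name = c("Qi Zhu, MD", "Holly Posner"), stringsAsFactors = FALSE)
demo$name <- gsub(pattern, "", toupper(demo$name))
```

Note that this pattern leaves the trailing ", " in place; the second call above (everything from the comma onward) is one way to drop it.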

Both Matt Parker above and Tommy below have raised the question of whether 'M.R.C.P.', 'PhD', 'D.Phil.', 'Ph.D.' or other British or Continental designations of doctorate-level degrees should also be sought out and removed. Perhaps @user56 can advise what the intent was.

Answer 2

Answered by Justin

With a single ugly regex:

gsub('[MP].?D.?', '', npt2$name)  # [MP], not [M,P]: a comma inside the brackets would match a literal comma

Which says, find characters M or P followed by zero or one character of any kind, followed by a D and zero or one additional character. More explicitly, you could do this in three steps:

npt2$name <- gsub('MD', '', npt2$name)
npt2$name <- gsub('M\\.D\\.', '', npt2$name)  # "\\." matches a literal period
npt2$name <- gsub('PhD', '', npt2$name)

In those three, what's happening should be more straightforward. In the second replacement you need to "escape" the periods, since the period is a special character in a regular expression. In an R string the escape is written with a doubled backslash.
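To see why the escape matters, here is a small sketch comparing an unescaped pattern, an escaped one, and gsub's fixed = TRUE argument, which treats the pattern as a plain string. The second name in x is made up to show a false match; it is not from the original data:

```r
x <- c("Thomas Jozefiak, M.D.", "Ian McDonald")  # "Ian McDonald" is a made-up name

# Unescaped: "." matches any character, so "M.D." also matches "McDo".
gsub("M.D.", "", x)
# [1] "Thomas Jozefiak, " "Ian nald"

# Escaped: "\\." matches only a literal period.
gsub("M\\.D\\.", "", x)
# [1] "Thomas Jozefiak, " "Ian McDonald"

# fixed = TRUE: the whole pattern is taken literally, no escaping needed.
gsub("M.D.", "", x, fixed = TRUE)
# [1] "Thomas Jozefiak, " "Ian McDonald"
```

fixed = TRUE is convenient for a single literal string, but it rules out alternation with "|", so the combined patterns in the other answers still need the escaped form.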

Answer 3

Answered by Tommy

Here's a variant that removes the extra ", " too. It does not require toupper either, but if you want case-insensitive matching, just pass ignore.case=TRUE to gsub.

test <- c("Andreas Groll, M.D.", 
  "",
  "Pan-Chyr Yang, PHD",
  "Suh-Fang Jeng, Sc.D",
  "Peter S Sebel, MB BS, PhD Chantal Kerssens, PhD",
  "Lawrence Currie, MD")

gsub(",? *(MD|M\\.D\\.|P[hH]D)", "", test)
#[1] "Andreas Groll"                         ""                                     
#[3] "Pan-Chyr Yang"                         "Suh-Fang Jeng, Sc.D"                  
#[5] "Peter S Sebel, MB BS Chantal Kerssens" "Lawrence Currie"
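The ignore.case=TRUE option mentioned above could look like the following sketch. The vector mixed is made up for illustration, including the lowercase "md" entry, which does not appear in the original data:

```r
# Mixed-case degree suffixes; "Qi Zhu, md" is a made-up variation.
mixed <- c("Pan-Chyr Yang, PHD", "Chantal Kerssens, PhD",
           "Qi Zhu, md", "Suh-Fang Jeng, Sc.D")

# With ignore.case = TRUE a single spelling of each degree is enough.
cleaned <- gsub(",? *(MD|M\\.D\\.|PHD)", "", mixed, ignore.case = TRUE)
cleaned
# [1] "Pan-Chyr Yang"       "Chantal Kerssens"    "Qi Zhu"
# [4] "Suh-Fang Jeng, Sc.D"
```

One caveat: case-insensitive matching will also hit the letters "md" or "phd" wherever they occur inside a name, so it is worth checking the result against the data before overwriting the column.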
