Python: search and replace with a "whole word only" option
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/17730788/
Asked by Renan Cidale
I have a script that runs over my text and searches for and replaces all the sentences I wrote, based on a database.
The script:
with open('C:/Users/User/Desktop/Portuguesetranslator.txt') as f:
    for l in f:
        s = l.split('*')
        editor.replace(s[0],s[1])
And a sample of the database:
Event*Evento*
result*resultado*
And so on...
The problem is that I need a "whole word only" option in that script, because I keep running into trouble.
For example, with Result and Event: after I replace them with Resultado and Evento and then run the script on the text one more time, the script replaces the Resultado and Evento again.
After the second run, the result looks like this: Resultadoado and Eventoo.
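The doubling described here can be reproduced with a minimal sketch. It assumes the text is held in an ordinary Python string and uses str.replace as a stand-in for the editor's replace call; the names plain_replace and pairs are hypothetical.

```python
# Plain substring replacement, applied twice to the same text.
text = "Event result"
pairs = [("Event", "Evento"), ("result", "resultado")]

def plain_replace(text, pairs):
    # Replace every occurrence of each search string, even inside longer words.
    for old, new in pairs:
        text = text.replace(old, new)
    return text

once = plain_replace(text, pairs)
twice = plain_replace(once, pairs)
print(once)   # Evento resultado
print(twice)  # Eventoo resultadoado
```

The second pass finds "Event" inside "Evento" and "result" inside "resultado", which is why the endings pile up.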
Just so you know, it's not only Event and Result: there are more than 1000 sentences that I have already set up for the search and replace.
I don't need a simple search and replace for two words, because I'm going to be editing the database over and over for different sentences.
Answered by DhruvPathak
Use re.sub instead of a normal string replace to replace only whole words. That way your script will not replace the already-replaced words, even if it runs again.
>>> import re
>>> editor = "This is result of the match"
>>> new_editor = re.sub(r"\bresult\b","resultado",editor)
>>> new_editor
'This is resultado of the match'
>>> newest_editor = re.sub(r"\bresult\b","resultado",new_editor)
>>> newest_editor
'This is resultado of the match'
Answered by kindall
You want a regular expression. You can use the token \b to match a word boundary: for example, \bresult\b would match only the exact word "result".
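import re
with open('C:/Users/User/Desktop/Portuguesetranslator.txt') as f:
    for l in f:
        s = l.split('*')
        # editor is assumed to hold the text as a string here;
        # re.escape keeps special characters in the search text literal
        editor = re.sub(r"\b%s\b" % re.escape(s[0]), s[1], editor)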
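To see that this approach is safe to run repeatedly, here is a self-contained sketch. It assumes the database lines have the `search*replacement*` shape shown in the question and that the text is a plain string; load_pairs and replace_whole_words are hypothetical helper names, not part of any library.

```python
import re

def load_pairs(lines):
    """Parse lines such as 'Event*Evento*' into (search, replacement) pairs."""
    pairs = []
    for line in lines:
        parts = line.strip().split('*')
        # Skip blank or malformed lines that lack a search/replacement pair.
        if len(parts) >= 2 and parts[0]:
            pairs.append((parts[0], parts[1]))
    return pairs

def replace_whole_words(text, pairs):
    for old, new in pairs:
        pattern = r"\b%s\b" % re.escape(old)
        # A function replacement keeps any backslashes in `new` literal.
        text = re.sub(pattern, lambda m, new=new: new, text)
    return text

database = ["Event*Evento*", "result*resultado*"]
pairs = load_pairs(database)
once = replace_whole_words("Event result", pairs)
twice = replace_whole_words(once, pairs)
print(once)
print(twice)
```

Running the replacement a second time leaves the text unchanged, because "Event" inside "Evento" is no longer followed by a word boundary.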
Answered by Steven Rumbalski
Use re.sub:
import re
replacements = {'the':'a',
                'this':'that'}
def replace(match):
    return replacements[match.group(0)]
# notice that the 'this' in 'thistle' is not matched
print(re.sub('|'.join(r'\b%s\b' % re.escape(s) for s in replacements),
             replace, 'the cat has this thistle.'))
Prints:
a cat has that thistle.
Notes:
All the strings to be replaced are joined into a single pattern so that the string needs to be looped over just once.
The source strings are passed to re.escape to avoid interpreting them as regular expressions.
The words are surrounded by r'\b' to make sure matches are for whole words only.
A replacement function is used so that any match can be replaced.
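When the database contains multi-word entries, the order of the alternatives in the joined pattern starts to matter. The sketch below is one possible way to handle that, assuming a non-empty dictionary of replacements; build_replacer and the sample entries are hypothetical and only illustrate the idea of sorting the keys longest first.

```python
import re

def build_replacer(replacements):
    """Return a function that applies all whole-word replacements in one pass."""
    # Longest keys first, so 'result set' is tried before 'result'.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile('|'.join(r'\b%s\b' % re.escape(k) for k in keys))
    # The replacement function looks up whichever key actually matched.
    return lambda text: pattern.sub(lambda m: replacements[m.group(0)], text)

replacements = {'result': 'resultado',
                'result set': 'conjunto de resultados'}
translate = build_replacer(replacements)
output = translate('the result set and the result')
print(output)  # the conjunto de resultados and the resultado
```

Because the text is scanned only once, a replacement that was just inserted is not examined again during the same pass.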
Answered by Sudharsan
It is very simple: use re.sub, don't use replace.
import re
replacements = {r'\bthe\b':'a',
                r'\bthis\b':'that'}
def replace_all(text, dic):
    # dic.items() works on Python 3 (iteritems() was Python 2 only)
    for i, j in dic.items():
        text = re.sub(i,j,text)
    return text
print(replace_all("the cat has this thistle.", replacements))
It will print:
a cat has that thistle.
Answered by Chris Zhu
import re
match = {}  # create a dictionary of words-to-replace and words-to-replace-with
with open("filename", "r") as f:
    data = f.read()  # string of all file content
def replace_all(text, dic):
    for i, j in dic.items():
        # r"\b%s\b" enables replacing by whole word matches only;
        # re.escape keeps special characters in the key literal
        text = re.sub(r"\b%s\b" % re.escape(i), j, text)
    return text
data = replace_all(data, match)
print(data)  # you can copy and paste the result to whatever file you like
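One caveat with the \b pattern used in the answers above: \b only matches between a word character and a non-word character, so a search string that begins or ends with punctuation may never match. The sketch below shows the effect and one possible workaround using lookarounds; the sample text, the key "Done." and its replacement "Feito." are made up for illustration.

```python
import re

text = "Done. Next result"
key = "Done."  # hypothetical database entry ending in punctuation

# \b after the final '.' needs a word character next, but a space follows.
with_b = re.sub(r"\b%s\b" % re.escape(key), "Feito.", text)

# Lookarounds only require that no word character touches the match.
with_lookaround = re.sub(r"(?<!\w)%s(?!\w)" % re.escape(key), "Feito.", text)

print(with_b)           # Done. Next result
print(with_lookaround)  # Feito. Next result
```

For entries made only of letters, both patterns behave the same, so the lookaround form is worth considering mainly when the database holds full sentences.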