Python: search and replace with a "whole word only" option

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/17730788/

Time: 2020-08-19 09:01:46  Source: igfitidea

Search and replace with "whole word only" option

python

Asked by Renan Cidale

I have a script that runs through my text and searches for and replaces every sentence listed in a database.

The script:

with open('C:/Users/User/Desktop/Portuguesetranslator.txt') as f:
    for l in f:
        s = l.split('*')
        editor.replace(s[0],s[1])

And an example of the database:

Event*Evento*
result*resultado*

And so on...

What I need now is a "whole word only" option in that script, because I'm running into problems.

For example, with Result and Event: after I replace them with Resultado and Evento and run the script on the text one more time, the script replaces inside Resultado and Evento again.

After that second run, the result looks like this: Resultadoado and Eventoo.

Just so you know, it's not only Event and Result: there are more than 1000 sentences that I have already set up for the search and replace.

I don't need a simple search and replace for two words, because I'm going to be editing the database over and over for different sentences.

Answered by DhruvPathak

Use re.sub instead of the normal string replace to replace only whole words. That way your script, even if it runs again, will not replace the already replaced words.

>>> import re
>>> editor = "This is result of the match"
>>> new_editor = re.sub(r"\bresult\b","resultado",editor)
>>> new_editor
'This is resultado of the match'
>>> newest_editor = re.sub(r"\bresult\b","resultado",new_editor)
>>> newest_editor
'This is resultado of the match'
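
To see why the plain string replace from the question compounds on a second run while the `\b` pattern does not, here is a minimal sketch. The sample sentence and the helper names `naive_pass` and `whole_word_pass` are made up for illustration; they are not from the question or the answer.

```python
import re

def naive_pass(text):
    # Plain substring replacement: also matches inside longer words.
    return text.replace("result", "resultado")

def whole_word_pass(text):
    # \b anchors the match at a word boundary on both sides.
    return re.sub(r"\bresult\b", "resultado", text)

text = "The result of the event"  # hypothetical sample text

# Run each variant twice, as happens when the script is re-run.
naive_twice = naive_pass(naive_pass(text))
whole_twice = whole_word_pass(whole_word_pass(text))

print(naive_twice)  # The resultadoado of the event
print(whole_twice)  # The resultado of the event
```

The second plain replace finds "result" inside "resultado", while `\bresult\b` does not match there because the "a" that follows is a word character.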

Answered by kindall

You want a regular expression. You can use the token \b to match a word boundary: i.e., \bresult\b would match only the exact word "result".

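import re

# 'editor' is assumed to hold the text to edit as a plain string.
with open('C:/Users/User/Desktop/Portuguesetranslator.txt') as f:
    for l in f:
        s = l.split('*')
        # re.escape keeps regex metacharacters in the search text literal.
        editor = re.sub(r"\b%s\b" % re.escape(s[0]), s[1], editor)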
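
As a self-contained variation on the loop above, the following sketch applies database lines in the question's `search*replacement*` format to an in-memory string. The function name `apply_database`, the sample lines, and the choice to skip malformed lines are assumptions for illustration, not part of the original answer.

```python
import re

def apply_database(text, lines):
    """Apply 'search*replacement*' lines to text, whole words only."""
    for line in lines:
        parts = line.rstrip("\n").split("*")
        if len(parts) < 2 or not parts[0]:
            continue  # skip blank or malformed lines (an assumption)
        search, replacement = parts[0], parts[1]
        pattern = r"\b%s\b" % re.escape(search)
        # A function replacement inserts the text literally, so backslashes
        # in the replacement are not treated as group references.
        text = re.sub(pattern, lambda m: replacement, text)
    return text

# Hypothetical database content, in the format shown in the question.
database = ["Event*Evento*\n", "result*resultado*\n", "\n"]

once = apply_database("Event result", database)
twice = apply_database(once, database)

print(once)   # Evento resultado
print(twice)  # Evento resultado
```

Note that `\b` only marks a boundary next to a letter, digit, or underscore, so a search string that starts or ends with punctuation would need a different boundary test.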

Answered by Steven Rumbalski

Use re.sub:

import re

replacements = {'the': 'a',
                'this': 'that'}

def replace(match):
    # look up the replacement for the exact text that matched
    return replacements[match.group(0)]

# notice that the 'this' in 'thistle' is not matched
print(re.sub('|'.join(r'\b%s\b' % re.escape(s) for s in replacements),
             replace, 'the cat has this thistle.'))

Prints

a cat has that thistle.

Notes:

  • All the strings to be replaced are joined into a single pattern so that the string needs to be looped over just once.

  • The source strings are passed to re.escape to avoid interpreting them as regular expressions.

  • The words are surrounded by r'\b' to make sure matches are for whole words only.

  • A replacement function is used so that any match can be replaced.
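
With a database of 1000+ entries it may be worth compiling the joined pattern once and reusing it. The sketch below follows the notes above; the helper name `make_replacer`, the extra phrase entry, and the longest-first ordering are assumptions added for illustration (the ordering matters only when one key is a prefix of a longer phrase).

```python
import re

def make_replacer(replacements):
    # Longest keys first, so a phrase wins over a shorter key it starts with.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(re.escape(k) for k in keys))
    # The returned function replaces every match in a single pass.
    return lambda text: pattern.sub(lambda m: replacements[m.group(0)], text)

# Hypothetical dictionary: the answer's two entries plus one phrase.
replacer = make_replacer({"the": "a", "this": "that", "the cat": "one cat"})

result = replacer("the cat has this thistle near the door.")
print(result)  # one cat has that thistle near a door.
```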

Answered by Sudharsan

It is very simple: use re.sub, don't use replace.

import re

# each key is already a whole-word pattern
replacements = {r'\bthe\b': 'a',
                r'\bthis\b': 'that'}

def replace_all(text, dic):
    # apply each pattern in turn to the running text
    for i, j in dic.items():
        text = re.sub(i, j, text)
    return text

print(replace_all("the cat has this thistle.", replacements))

It will print

a cat has that thistle.
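
One thing to be aware of with a loop of separate re.sub calls: each pass sees the output of the previous one, so a replacement that is itself a later key gets replaced again. The sketch below uses a made-up dictionary (not from the answer) and compares the loop with a single-pass pattern.

```python
import re

# Hypothetical entries where one replacement is also a key.
replacements = {"cat": "dog", "dog": "wolf"}
text = "the cat saw a dog"

# Sequential passes (dicts keep insertion order in Python 3.7+):
# "cat" becomes "dog", which the next pass turns into "wolf".
sequential = text
for word, new in replacements.items():
    sequential = re.sub(r"\b%s\b" % re.escape(word), new, sequential)

# Single pass: every match is looked up once against the original text.
pattern = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, replacements)))
single_pass = pattern.sub(lambda m: replacements[m.group(0)], text)

print(sequential)   # the wolf saw a wolf
print(single_pass)  # the dog saw a wolf
```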

Answered by Chris Zhu

import re

match = {}  # create a dictionary of words-to-replace and words-to-replace-with

with open("filename", "r") as f:
    data = f.read()  # string of all file content


def replace_all(text, dic):
    for i, j in dic.items():
        # r"\b%s\b" restricts replacement to whole-word matches;
        # re.escape keeps regex metacharacters in the key literal
        text = re.sub(r"\b%s\b" % re.escape(i), j, text)
    return text


data = replace_all(data, match)
print(data)  # you can copy and paste the result to whatever file you like