Python: find a substring in a string, but only if it is a whole word?

Disclaimer: this page is a translation of a popular StackOverflow question, published under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/4154961/

Date: 2020-08-18 14:30:39  Source: igfitidea

Tags: python, search, string, substring

Asked by AP257

What is an elegant way to look for a string within another string in Python, but only if the substring is within whole words, not part of a word?

Perhaps an example will demonstrate what I mean:

string1 = "ADDLESHAW GODDARD"
string2 = "ADDLESHAW GODDARD LLP"
assert string_found(string1, string2)  # this is True
string1 = "ADVANCE"
string2 = "ADVANCED BUSINESS EQUIPMENT LTD"
assert not string_found(string1, string2)  # this should be False

How can I best write a function called string_found that will do what I need? I thought perhaps I could fudge it with something like this:

def string_found(string1, string2):
    # find() returns -1 (which is truthy) when there is no match, so compare explicitly
    if string2.find(string1 + " ") != -1:
        return True
    return False

But that doesn't feel very elegant, and also wouldn't match string1 if it was at the end of string2. Maybe I need a regex? (argh regex fear)

Answer by aaronasterling

Here's a way to do it without a regex (as requested) assuming that you want any whitespace to serve as a word separator.

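import string

def find_substring(needle, haystack):
    # Only the first occurrence of needle is examined.
    index = haystack.find(needle)
    if index == -1:
        return False
    # The character before the match (if any) must be whitespace...
    if index != 0 and haystack[index-1] not in string.whitespace:
        return False
    # ...and so must the character right after it (if any).
    L = index + len(needle)
    if L < len(haystack) and haystack[L] not in string.whitespace:
        return False
    return True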

And here's some demo code (codepad is a great idea; thanks to Felix Kling for reminding me).
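A minimal stand-in for such a demo (not the original codepad snippet) could run the function on the question's two test cases. The function is repeated so the snippet runs on its own, and the third case, with a comma after the word, is an illustrative assumption showing that only whitespace counts as a separator here:

```python
import string

def find_substring(needle, haystack):
    # Same logic as the answer above: only the first occurrence is checked.
    index = haystack.find(needle)
    if index == -1:
        return False
    if index != 0 and haystack[index - 1] not in string.whitespace:
        return False
    end = index + len(needle)
    if end < len(haystack) and haystack[end] not in string.whitespace:
        return False
    return True

# The two cases from the question.
print(find_substring("ADDLESHAW GODDARD", "ADDLESHAW GODDARD LLP"))   # True
print(find_substring("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))  # False
# Punctuation is not whitespace, so it blocks the match.
print(find_substring("ADVANCE", "ADVANCE, LTD"))                     # False
```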

Answer by Felix Kling

You can use regular expressions and the word-boundary special character \b (emphasis mine):

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that \b is defined as the boundary between \w and \W, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. Inside a character range, \b represents the backspace character, for compatibility with Python's string literals.

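import re

def string_found(string1, string2):
    # \b anchors both ends at a word boundary; re.escape() neutralises
    # any regex metacharacters in the search term.
    return re.search(r"\b" + re.escape(string1) + r"\b", string2) is not None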
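One consequence of the definition quoted above: \b only exists between a word character and a non-word character, so when the search term itself begins or ends with a non-word character (for example "C++"), the \b pattern can fail to match. A sketch using lookaround assertions instead, assuming "whole word" means "not touching another word character"; the names string_found_lookaround and string_found_boundary are hypothetical:

```python
import re

def string_found_lookaround(string1, string2):
    # (?<!\w) / (?!\w) assert "no word character immediately before/after",
    # which still works when string1 starts or ends with punctuation.
    pattern = r"(?<!\w)" + re.escape(string1) + r"(?!\w)"
    return re.search(pattern, string2) is not None

def string_found_boundary(string1, string2):
    # The \b version from the answer, for comparison.
    return re.search(r"\b" + re.escape(string1) + r"\b", string2) is not None

print(string_found_boundary("C++", "I write C++ code"))    # False
print(string_found_lookaround("C++", "I write C++ code"))  # True
print(string_found_lookaround("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))  # False
```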


If only whitespace counts as a word boundary for you, you could also get away with prepending and appending a space to both strings:

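def string_found(string1, string2):
    # Pad both strings with a space so a match at the very start or end
    # of string2 is still delimited by spaces on both sides.
    string1 = " " + string1.strip() + " "
    string2 = " " + string2.strip() + " "
    # find() returns -1 for "not found" and 0 is a valid index, so compare explicitly.
    return string2.find(string1) != -1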
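The padding trick only recognises the plain space character, so a tab or a run of several spaces inside string2 defeats it. A variant, sketched under the assumption that any run of whitespace should count as a single separator, first collapses whitespace with str.split() and " ".join(); string_found_ws is a hypothetical name:

```python
def string_found_ws(string1, string2):
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and drops leading/trailing whitespace.
    needle = " " + " ".join(string1.split()) + " "
    haystack = " " + " ".join(string2.split()) + " "
    return needle in haystack

print(string_found_ws("ADDLESHAW GODDARD", "ADDLESHAW\tGODDARD  LLP"))  # True
print(string_found_ws("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))    # False
```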

Answer by Chris Larson

One approach that should accomplish this task uses the re (regular expression) module:

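import re

string1 = "pizza pony"
string2 = "who knows what a pizza pony is?"

# re.escape() protects against regex metacharacters in string1, and a
# trailing \b (unlike \W) also matches when string1 ends string2.
search_result = re.search(r'\b' + re.escape(string1) + r'\b', string2)

if search_result:  # re.search() returns None when there is no match
    print(search_result.group())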
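re.search() stops at the first match. To list every whole-word occurrence together with its position, re.finditer() can be used with the same kind of pattern. This sketch assumes \b boundaries are acceptable for the search term, and the sample sentence is made up for illustration:

```python
import re

string1 = "pizza pony"
string2 = "A pizza pony? Nobody knows what a pizza pony is, or a pizza ponytail."

pattern = r"\b" + re.escape(string1) + r"\b"
# Collect (start offset, matched text) for every whole-word occurrence;
# "pizza ponytail" is skipped because no boundary follows "pony" there.
matches = [(m.start(), m.group()) for m in re.finditer(pattern, string2)]
for start, text in matches:
    print(start, text)
```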

Answer by Chris Larson

The simplest and most pythonic way, I believe, is to break the strings down into individual words and scan for a match:

string = "My Name Is Josh"
substring = "Name"

for word in string.split():
    if substring == word:
        print("Match Found")

As a bonus, here's a one-liner:

any(substring == word for word in string.split())
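Splitting into single words cannot match a multi-word search term such as the question's "ADDLESHAW GODDARD", because no single word equals it. One way to extend the idea, sketched here as a hypothetical helper contains_words, is to split both strings and compare windows of words as long as the needle:

```python
def contains_words(needle, haystack):
    # Compare lists of words, so whitespace differences do not matter.
    needle_words = needle.split()
    haystack_words = haystack.split()
    n = len(needle_words)
    if n == 0:
        return False
    # Slide a window of n consecutive words across the haystack.
    return any(
        haystack_words[i:i + n] == needle_words
        for i in range(len(haystack_words) - n + 1)
    )

print(contains_words("ADDLESHAW GODDARD", "ADDLESHAW GODDARD LLP"))  # True
print(contains_words("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))  # False
```

Like the answer's version, this treats punctuation as part of a word, so "Name," would not equal "Name".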

Answer by TCSGrad

I'm building off aaronasterling's answer above.

The problem with the above code is that it returns False when needle occurs more than once in haystack and the first occurrence does not satisfy the search criteria but a later one does.

Here's my version:

import string

def find_substring(needle, haystack):
    search_start = 0
    while search_start < len(haystack):
        index = haystack.find(needle, search_start)
        if index == -1:
            return False
        end = index + len(needle)
        # The match must be preceded by whitespace (or the start of haystack)...
        is_prefix_whitespace = index == 0 or haystack[index - 1] in string.whitespace
        # ...and followed by whitespace (or the end of haystack).
        is_suffix_whitespace = end == len(haystack) or haystack[end] in string.whitespace
        if is_prefix_whitespace and is_suffix_whitespace:
            return True
        # Resume one character later so overlapping occurrences are not skipped.
        search_start = index + 1
    return False

Hope that helps!

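The same rule (the match must be bounded by whitespace or the ends of the string, and every occurrence is considered, not just the first) can also be written as a single regular expression: (?<!\S) means "not preceded by a non-whitespace character" and (?!\S) means "not followed by one". A sketch, with find_substring_re as a hypothetical name:

```python
import re

def find_substring_re(needle, haystack):
    # re.search() tries every position, so a later valid occurrence is found
    # even when an earlier one is glued to other characters.
    pattern = r"(?<!\S)" + re.escape(needle) + r"(?!\S)"
    return re.search(pattern, haystack) is not None

print(find_substring_re("ADVANCE", "ADVANCED ADVANCE LTD"))  # True (second occurrence)
print(find_substring_re("ADVANCE", "ADVANCED BUSINESS"))     # False
```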

Answer by SOLOSNAKE231

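def string_found(string1, string2):
    # string1 is the word/phrase to look for, string2 the text to search.
    if string1 not in string2:
        return False
    start = string2.index(string1)
    end = start + len(string1)
    # Preceded by a space, or at the very start of string2.
    before_ok = start == 0 or string2[start - 1] == " "
    # Followed by a space, or at the very end of string2.
    after_ok = end == len(string2) or string2[end] == " "
    return before_ok and after_ok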

Answer by Danilo Castro

Excuse me, regex fellows, but the simpler answer is:

text = "this is the esquisidiest piece never ever writen"
word = "is"
" {0} ".format(text).lower().count(" {0} ".format(word).lower())

The trick here is to add a space on both sides of the text and of the word being searched. That guarantees only whole-word occurrences are counted, and a word at the very beginning or end of the text is still found.

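One caveat: str.count() counts non-overlapping matches, and two adjacent copies of the word share the space between them, so "is is" is counted only once. Lookaround assertions do not consume the surrounding spaces, so a regex sketch such as the following counts both. count_whole_word is a hypothetical name, and whitespace is assumed to be the only separator:

```python
import re

def count_whole_word(word, text):
    # (?<!\S) / (?!\S) check for whitespace (or the string edges) without
    # consuming it, so neighbouring occurrences can share a separator.
    pattern = r"(?<!\S)" + re.escape(word) + r"(?!\S)"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

text = "this is is fine"
print(" {0} ".format(text).lower().count(" is "))  # 1
print(count_whole_word("is", text))                # 2
```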