Python: How to find all occurrences of a substring?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/4664850/

Time: 2020-08-18 16:49:37  Source: igfitidea

How to find all occurrences of a substring?

python regex string

Asked by nukl

Python has string.find() and string.rfind() to get the index of a substring in a string.

I'm wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).

For example:

string = "test test test test"

print(string.find('test')) # 0
print(string.rfind('test')) # 15

# this is the goal (str has no such method)
print(string.find_all('test')) # [0,5,10,15]

Accepted answer by moinudin

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]
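
The patterns above work as written because 'test' and 'tt' contain no regex metacharacters. As a minimal sketch for arbitrary literal substrings, the lookahead can be wrapped in a small helper that escapes the substring first. The helper name find_overlapping is hypothetical and not part of the original answer.

```python
import re

def find_overlapping(text, sub):
    """Return start indices of every (possibly overlapping) occurrence of sub."""
    # re.escape makes regex metacharacters in sub match literally.
    pattern = '(?=' + re.escape(sub) + ')'
    return [m.start() for m in re.finditer(pattern, text)]

print(find_overlapping('ttt', 'tt'))     # [0, 1]
print(find_overlapping('a.b.a.b', '.'))  # [1, 3, 5]
```

Without re.escape, the '.' in the second call would match any character rather than a literal dot.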

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]

re.finditer returns an iterator, so you could change the [] in the above to () to get a generator instead of a list, which will be more efficient if you're only iterating through the results once.
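
A short sketch of the generator-expression form described above. The variable names (text, starts, first, rest) are chosen here for illustration only.

```python
import re

text = 'test test test test'

# Parentheses instead of brackets build a lazy generator expression.
starts = (m.start() for m in re.finditer('test', text))

first = next(starts)  # only the first match has been consumed so far
rest = list(starts)   # consumes the remaining matches
print(first, rest)    # 0 [5, 10, 15]
```

Once the generator is exhausted it cannot be iterated again, which is why this form only suits a single pass over the results.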

Answer by thkala

Here's a (very inefficient) way to get all (i.e. even overlapping) matches:

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]

Answer by Chinmay Kanchi

You can use re.finditer() for non-overlapping matches.

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print([(a.start(), a.end()) for a in re.finditer('is', aString)])
[(2, 4), (5, 7), (38, 40), (42, 44)]

but it won't work for overlapping matches:

In [1]: aString="ababa"

In [2]: print([(a.start(), a.end()) for a in re.finditer('aba', aString)])
Output: [(0, 3)]
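
For the overlapping case, one possible workaround is a lookahead pattern. This is only a sketch: because a lookahead matches an empty string, a.end() would equal a.start(), so the end of each span is rebuilt from the substring's length. The names sub, pattern and spans are introduced here for illustration.

```python
import re

aString = "ababa"
sub = "aba"

# The lookahead matches an empty string at each position where sub begins,
# so the span is rebuilt from the start index and the substring's length.
pattern = '(?=' + re.escape(sub) + ')'
spans = [(m.start(), m.start() + len(sub)) for m in re.finditer(pattern, aString)]
print(spans)  # [(0, 3), (2, 5)]
```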

Answer by Karl Knechtel

>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

Thus, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]

No temporary strings or regexes required.
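
As a hedged variation on the generator above, the overlapping behaviour mentioned in the comment can be exposed as a parameter. The overlapping keyword and the empty-substring check are assumptions added for this sketch; with an empty sub the step would be zero and the search would never advance.

```python
def find_all(a_str, sub, overlapping=False):
    """Yield the start index of each occurrence of sub in a_str."""
    if not sub:
        # An empty substring would give a step of 0 and loop forever; reject it.
        raise ValueError("sub must be a non-empty string")
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)

print(list(find_all('aaaa', 'aa')))                    # [0, 2]
print(list(find_all('aaaa', 'aa', overlapping=True)))  # [0, 1, 2]
```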

Answer by Cody Piersall

Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)    
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

No need for regular expressions this way.
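
One caveat with the recursive form: each match adds a stack frame, and CPython's default recursion limit is 1000, so a string with very many matches can raise RecursionError. A loop-based sketch of the same search is shown below; the name locations_of_substring_iter is hypothetical, and it assumes a non-empty substring.

```python
def locations_of_substring_iter(string, substring):
    """Return a list of non-overlapping locations of substring, without recursion."""
    # Assumes substring is non-empty; an empty one would never advance.
    locations = []
    start = string.find(substring)
    while start != -1:
        locations.append(start)
        start = string.find(substring, start + len(substring))
    return locations

print(locations_of_substring_iter('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

# 5000 matches, which is more than the default recursion limit allows.
print(len(locations_of_substring_iter('ab' * 5000, 'ab')))  # prints 5000
```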

Answer by Andrew H

This thread is a little old but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

marker = 0
while marker < len(numberString):
    try:
        # Look up the next occurrence once; index() raises ValueError when none is left
        position = numberString.index(testString, marker)
        print(position)
        marker = position + 1
    except ValueError:
        print("String not found")
        marker = len(numberString)

Answer by jstaab

If you only need to count the occurrences of a single character, this would work:

from functools import reduce  # reduce is not a builtin in Python 3
string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7

Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4

My hunch is that neither of these (especially #2) is terribly performant.
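
If only the count is needed, the built-in str.count method is probably a simpler option than either snippet. A minimal sketch follows; note that str.count counts non-overlapping occurrences, and the variable name text is introduced here for illustration.

```python
string = "dooobiedoobiedoobie"
print(string.count('o'))   # 7

text = "test test test test"
print(text.count("test"))  # 4

# str.count only counts non-overlapping occurrences.
print("ttt".count("tt"))   # 1
```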

Answer by Thurines

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result

It should return a list of positions where the substring was found. Please comment if you see an error or room for improvement.

Answer by AkiRoss

Again, old thread, but here's my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)

Example

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]

returns

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]

Answer by u7713015

Please look at the code below:

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''


def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result


if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))