Python: how to find all occurrences of a substring?

Question

Asked by nukl

Python has string.find() and string.rfind() to get the index of a substring in a string.

I'm wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).

For example:

string = "test test test test"

print(string.find('test'))   # 0
print(string.rfind('test'))  # 15

# this is the goal (str has no find_all method)
print(string.find_all('test'))  # [0, 5, 10, 15]

Answer 1

Accepted answer by moinudin

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, a lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]

re.finditer returns an iterator, so you could change the [] in the above to () to get a generator expression instead of a list, which will be more efficient if you're only iterating through the results once.
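One caveat with the patterns above: the search string is placed into the regex as-is, so characters such as . or ( would be read as regex syntax. Here is a minimal sketch that escapes the substring with re.escape and optionally wraps it in a lookahead for overlapping matches. The helper name find_all_re and its overlapping parameter are made up for illustration and are not part of the re module.

```python
import re

def find_all_re(text, sub, overlapping=False):
    """Return start indexes of sub in text (hypothetical helper, not a stdlib function)."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    pattern = re.escape(sub)  # treat sub literally, not as regex syntax
    if overlapping:
        pattern = "(?=%s)" % pattern  # zero-width lookahead lets matches overlap
    return [m.start() for m in re.finditer(pattern, text)]

print(find_all_re("test test test test", "test"))  # [0, 5, 10, 15]
print(find_all_re("ttt", "tt", overlapping=True))  # [0, 1]
print(find_all_re("a.b.c", "."))                   # [1, 3]
```

Without the re.escape call, the last example would match every character, because an unescaped . matches any character.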

Answer 2

Answered by thkala

Here's a (very inefficient) way to get all (i.e. even overlapping) matches:

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]

Answer 3

Answered by Chinmay Kanchi

You can use re.finditer() for non-overlapping matches.

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print([(a.start(), a.end()) for a in re.finditer('is', aString)])
[(2, 4), (5, 7), (38, 40), (42, 44)]

but this won't find overlapping matches, for example:

In [1]: aString = "ababa"

In [2]: print([(a.start(), a.end()) for a in re.finditer('aba', aString)])
Output: [(0, 3)]
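To also catch the overlapping occurrence in that case, one option is the lookahead trick from the accepted answer. This is a sketch rather than part of the original answer: because a lookahead match is zero-width, m.end() equals m.start(), so the end of each span is computed from the length of the substring instead.

```python
import re

a_string = "ababa"
sub = "aba"
# The lookahead matches are zero-width, so build each span from the start index.
spans = [(m.start(), m.start() + len(sub))
         for m in re.finditer("(?=%s)" % re.escape(sub), a_string)]
print(spans)  # [(0, 3), (2, 5)]
```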

Answer 4

Answered by Karl Knechtel

>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

Thus, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]

No temporary strings or regexes required.
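The comment in the code above mentions switching the step to find overlapping matches. A possible variant makes that a parameter and returns early for an empty substring (with a step of len(sub) == 0 the loop would never advance). The name find_all_iter and the overlapping flag are assumptions for this sketch, not part of the original answer.

```python
def find_all_iter(a_str, sub, overlapping=False):
    """Yield start indexes of sub in a_str (illustrative variant of find_all above)."""
    if not sub:
        return  # an empty substring would otherwise never advance
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)

print(list(find_all_iter("spam spam spam spam", "spam")))     # [0, 5, 10, 15]
print(list(find_all_iter("ababa", "aba")))                    # [0]
print(list(find_all_iter("ababa", "aba", overlapping=True)))  # [0, 2]
```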

Answer 5

Answered by Cody Piersall

Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)    
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

No need for regular expressions this way.

Answer 6

Answered by Andrew H

This thread is a little old but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

marker = 0
while marker < len(numberString):
    try:
        # index() raises ValueError once no further match exists
        position = numberString.index(testString, marker)
    except ValueError:
        print("No more matches")
        break
    print(position)
    marker = position + 1  # step by 1 so overlapping matches are found too

Answer 7

Answered by jstaab

If you're just looking for a single character and only need the number of occurrences, this would work:

from functools import reduce  # reduce is no longer a builtin in Python 3

string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7

Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4

My hunch is that neither of these (especially #2) is terribly performant.
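If only the number of occurrences is needed, the built-in str.count method gives the same results as both snippets above. Note that it counts non-overlapping occurrences only. A short sketch:

```python
string = "test test test test"
print(string.count("test"))              # 4
print("dooobiedoobiedoobie".count("o"))  # 7
print("ttt".count("tt"))                 # 1 (overlapping matches are not counted)
```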

Answer 8

Answered by Thurines

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result

It should return a list of positions where the substring was found. Please comment if you see an error or room for improvement.

Answer 9

Answered by AkiRoss

Again, old thread, but here's my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)

Example

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]

returns

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]

Answer 10

Answered by u7713015

Please look at the code below:

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''


def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result


if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))
