Python: How to find all occurrences of a substring?
Notice: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/4664850/
Asked by nukl
Python has string.find() and string.rfind() to get the index of a substring in a string.
I'm wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).
For example:
string = "test test test test"
print(string.find('test'))  # 0
print(string.rfind('test'))  # 15
# this is the goal
print(string.find_all('test'))  # [0, 5, 10, 15]
Accepted answer by moinudin
There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:
import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]
If you want to find overlapping matches, a lookahead will do that:
[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]
If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:
search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]
re.finditer returns an iterator, so you could change the [] in the above to () to get a generator instead of a list, which will be more efficient if you're only iterating through the results once.
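One caveat with the regex approach is that the search string is treated as a pattern, so characters like '.' or '(' have special meaning. The sketch below is one way to guard against that with re.escape; the helper name find_all_literal is made up for illustration and is not part of the original answer.

```python
import re

def find_all_literal(text, sub):
    """Hypothetical helper: start indexes of non-overlapping literal matches."""
    # re.escape makes characters such as '.' or '(' match themselves.
    return [m.start() for m in re.finditer(re.escape(sub), text)]

print(find_all_literal("a.b.c", "."))                  # [1, 3]
# Without escaping, '.' is a wildcard and matches every character.
print([m.start() for m in re.finditer(".", "a.b.c")])  # [0, 1, 2, 3, 4]
```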
Answer by thkala
Here's a (very inefficient) way to get all (i.e. even overlapping) matches:
>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]
Answer by Chinmay Kanchi
You can use re.finditer() for non-overlapping matches.
>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print([(a.start(), a.end()) for a in re.finditer('is', aString)])
[(2, 4), (5, 7), (38, 40), (42, 44)]
but it won't work for overlapping matches, for example:
In [1]: aString = "ababa"
In [2]: print([(a.start(), a.end()) for a in re.finditer('aba', aString)])
Output: [(0, 3)]
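For the overlapping case, a zero-width lookahead can still report spans if the end is computed from the substring length. This is only a sketch of that idea, reusing the "ababa" example; the variable names are chosen for illustration.

```python
import re

a_string = "ababa"
sub = "aba"
# The lookahead match itself is empty, so the end comes from len(sub).
spans = [(m.start(), m.start() + len(sub))
         for m in re.finditer("(?=%s)" % re.escape(sub), a_string)]
print(spans)  # [(0, 3), (2, 5)]
```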
Answer by Karl Knechtel
>>> help(str.find)
Help on method_descriptor:
find(...)
    S.find(sub [,start [,end]]) -> int
Thus, we can build it ourselves:
def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches
list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]
No temporary strings or regexes required.
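A possible variation on the same str.find loop, assuming you want the overlap choice as a parameter rather than a code edit. The overlapping flag and the empty-string check are additions for illustration, not part of the answer above.

```python
def find_all(a_str, sub, overlapping=False):
    """Yield each index at which sub occurs in a_str."""
    if not sub:
        # With an empty sub the non-overlapping step would be zero,
        # so the loop would never advance.
        raise ValueError("sub must be a non-empty string")
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)

print(list(find_all("spam spam spam spam", "spam")))     # [0, 5, 10, 15]
print(list(find_all("ababa", "aba")))                    # [0]
print(list(find_all("ababa", "aba", overlapping=True)))  # [0, 2]
```

Because this is a generator, the ValueError is raised on the first iteration rather than at call time.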
Answer by Cody Piersall
Come, let us recurse together.
def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""
    substring_length = len(substring)
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found
    return recurse([], 0)
print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]
No need for regular expressions this way.
Answer by Andrew H
This thread is a little old, but this worked for me:
numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"
marker = 0
while marker < len(numberString):
    try:
        # index() raises ValueError once no further match exists
        position = numberString.index(testString, marker)
        print(position)
        marker = position + 1
    except ValueError:
        print("String not found")
        marker = len(numberString)
Answer by jstaab
If you just want to count the occurrences of a single character, this would work:
from functools import reduce  # reduce is not a builtin in Python 3
string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7
Also,
string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4
My hunch is that neither of these (especially #2) is terribly performant.
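If only the count is needed, the built-in str.count does the same job as both snippets above. A short sketch, with the caveat that it counts non-overlapping occurrences only:

```python
string = "test test test test"
print(string.count("test"))              # 4
print("dooobiedoobiedoobie".count("o"))  # 7
# count() skips overlapping matches: 'ttt' contains 'tt' at 0 and 1.
print("ttt".count("tt"))                 # 1
```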
Answer by Thurines
This is an old thread, but I got interested and wanted to share my solution.
def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1  # change to k += len(sub) to not search overlapping results
    return result
It should return a list of positions where the substring was found. Please comment if you see an error or room for improvement.
Answer by AkiRoss
Again, old thread, but here's my solution using a generator and plain str.find.
def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)
Example
x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]
returns
[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]
Answer by u7713015
Please look at the code below:
#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''
def get_substring_indices(text, s):
    # startswith(s, i) checks whether s occurs at index i
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result
if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))

