Finding multiple occurrences of a string within a string in Python
Original URL: http://stackoverflow.com/questions/3873361/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by user225312
How do I find multiple occurrences of a string within a string in Python? Consider this:
>>> text = "Allowed Hello Hollow"
>>> text.find("ll")
1
>>>
So the first occurrence of ll is at 1, as expected. How do I find the next occurrence of it?
The same question applies to a list. Consider:
>>> x = ['ll', 'ok', 'll']
How do I find all the ll with their indexes?
Accepted answer by poke
Using regular expressions, you can use re.finditer to find all (non-overlapping) occurrences:
>>> import re
>>> text = 'Allowed Hello Hollow'
>>> for m in re.finditer('ll', text):
...     print('ll found', m.start(), m.end())
...
ll found 1 3
ll found 10 12
ll found 16 18
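If overlapping matches are needed as well, one possible approach (a sketch, not part of the answer above) is a zero-width lookahead, because re.finditer otherwise resumes scanning after the end of each match. The helper name find_overlapping is hypothetical; re.escape is used so the search string is treated literally rather than as a pattern.

```python
import re

def find_overlapping(sub, text):
    """Return start indexes of all matches of sub in text, overlaps included."""
    # (?=...) matches without consuming characters, so the scan advances
    # one position at a time and overlapping occurrences are reported.
    pattern = '(?=' + re.escape(sub) + ')'
    return [m.start() for m in re.finditer(pattern, text)]

print(find_overlapping('ll', 'Allowed Hello Hollow'))  # [1, 10, 16]
print(find_overlapping('ll', 'Alllowed'))              # [1, 2]
```

A plain re.finditer('ll', 'Alllowed') would report only index 1 here, since the second match overlaps the first.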
Alternatively, if you don't want the overhead of regular expressions, you can also repeatedly use str.find to get the next index:
>>> text = 'Allowed Hello Hollow'
>>> index = 0
>>> while index < len(text):
...     index = text.find('ll', index)
...     if index == -1:
...         break
...     print('ll found at', index)
...     index += 2  # +2 because len('ll') == 2
...
ll found at 1
ll found at 10
ll found at 16
The same looping approach also works for lists and other sequences, with list.index in place of str.find.
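For a list, the loop can be sketched with list.index, which takes a start position like str.find but raises ValueError instead of returning -1. The helper name list_indexes is made up for illustration and is not from the answer.

```python
def list_indexes(items, value):
    """Return every index at which value occurs in the list items."""
    found = []
    start = 0
    while True:
        try:
            # list.index accepts a start position, much like str.find
            start = items.index(value, start)
        except ValueError:
            # no further occurrence from this position onward
            return found
        found.append(start)
        start += 1

x = ['ll', 'ok', 'll']
print(list_indexes(x, 'll'))  # [0, 2]
```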
Answer by inspectorG4dget
I think what you are looking for is str.count:
>>> "Allowed Hello Hollow".count('ll')
3
Hope this helps.
NOTE: this only counts non-overlapping occurrences, and it returns the number of matches rather than their positions.
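To make the note above concrete, here is a small sketch comparing str.count with a count that includes overlaps. The helper count_overlapping is hypothetical, not a built-in.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, including overlapping ones."""
    count = 0
    index = text.find(sub)
    while index != -1:
        count += 1
        # Advance by one character only, so overlapping matches are seen.
        index = text.find(sub, index + 1)
    return count

print("Allowed Hello Hollow".count('ll'))  # 3
print("aaaa".count('aa'))                  # 2 (non-overlapping)
print(count_overlapping("aaaa", 'aa'))     # 3
```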
Answer by ghostdog74
>>> for n, c in enumerate(text):
...     try:
...         if c + text[n+1] == "ll": print(n)
...     except IndexError: pass  # the last character has no successor
...
1
10
16
Answer by chauncey
For your list example:
In [1]: x = ['ll','ok','ll']
In [2]: for idx, value in enumerate(x):
   ...:     if value == 'll':
   ...:         print(idx, value)
   ...:
0 ll
2 ll
If you wanted all the items in a list that contain 'll', you could do that too:
In [3]: x = ['Allowed','Hello','World','Hollow']
In [4]: for idx, value in enumerate(x):
   ...:     if 'll' in value:
   ...:         print(idx, value)
   ...:
0 Allowed
1 Hello
3 Hollow
Answer by bstpierre
For the list example, use a list comprehension:
>>> l = ['ll', 'xx', 'll']
>>> print([n for (n, e) in enumerate(l) if e == 'll'])
[0, 2]
Similarly for strings:
>>> text = "Allowed Hello Hollow"
>>> print([n for n in range(len(text)) if text.find('ll', n) == n])
[1, 10, 16]
Note that this also lists overlapping matches inside adjacent runs of 'll', which may or may not be what you want:
>>> text = 'Alllowed Hello Holllow'
>>> print([n for n in range(len(text)) if text.find('ll', n) == n])
[1, 2, 11, 17, 18]
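A variation on the same comprehension, offered as a sketch rather than as part of the answer: str.startswith accepts a start offset, so it tests only position n instead of searching the remainder of the string on every iteration.

```python
text = 'Alllowed Hello Holllow'

# text.startswith('ll', n) is True only when 'll' begins exactly at index n.
overlapping = [n for n in range(len(text)) if text.startswith('ll', n)]
print(overlapping)  # [1, 2, 11, 17, 18]
```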
Answer by intuited
FWIW, here are a couple of non-RE alternatives that I think are neater than poke's solution.
The first uses str.index and checks for ValueError:
def findall(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall('ll', text))
    (1, 10, 16)
    """
    index = 0 - len(sub)
    try:
        while True:
            index = string.index(sub, index + len(sub))
            yield index
    except ValueError:
        pass
The second uses str.find and checks for the sentinel value -1 by using iter:
def findall_iter(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall_iter('ll', text))
    (1, 10, 16)
    """
    def next_index(length):
        index = 0 - length
        while True:
            index = string.find(sub, index + length)
            yield index
    # iter(callable, sentinel) stops as soon as the callable returns -1
    return iter(next_index(len(sub)).__next__, -1)
To apply any of these functions to a list, tuple or other iterable of strings, you can use a higher-order function (one that takes a function as one of its arguments) like this one:
def findall_each(findall, sub, strings):
    """
    >>> texts = ("fail", "dolly the llama", "Hello", "Hollow", "not ok")
    >>> list(findall_each(findall, 'll', texts))
    [(), (2, 10), (2,), (2,), ()]
    >>> texts = ("parallellized", "illegally", "dillydallying", "hillbillies")
    >>> list(findall_each(findall_iter, 'll', texts))
    [(4, 7), (1, 6), (2, 7), (2, 6)]
    """
    return (tuple(findall(sub, string)) for string in strings)
Answer by Aaron Semeniuk
I'm brand new to programming in general and working through an online tutorial. I was asked to do this as well, but only using the methods I had learned so far (basically strings and loops). I'm not sure if this adds any value here, and I know this isn't how you would normally do it, but I got it to work with this:
needle = input()
haystack = input()
counter = 0
n = -1
# Compare every slice haystack[i:j] with needle; overlapping matches are counted too.
for i in range(n + 1, len(haystack) + 1):
    for j in range(n + 1, len(haystack) + 1):
        n = -1
        if needle != haystack[i:j]:
            n = n + 1
            continue
        if needle == haystack[i:j]:
            counter = counter + 1
print(counter)
Answer by beardc
This version should be roughly linear in the length of the string, and should be fine as long as there aren't too many matches: it recurses once per match, so a highly repetitive input can hit Python's recursion limit (in which case you can replace the recursion with a while loop).
def find_all(st, substr, start_pos=0, accum=[]):
    ix = st.find(substr, start_pos)
    if ix == -1:
        return accum
    return find_all(st, substr, start_pos=ix + 1, accum=accum + [ix])
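The while-loop replacement mentioned above could look roughly like the following. This is a sketch under the assumption that the same overlapping behaviour (stepping one character past each match) is wanted; the name find_all_iterative is hypothetical.

```python
def find_all_iterative(st, substr, start_pos=0):
    """Return the start indexes of substr in st, overlaps included."""
    found = []
    ix = st.find(substr, start_pos)
    while ix != -1:
        found.append(ix)
        # Step one character forward, matching the recursive version.
        ix = st.find(substr, ix + 1)
    return found

print(find_all_iterative('Allowed Hello Hollow', 'll'))  # [1, 10, 16]
print(len(find_all_iterative('a' * 5000, 'a')))          # 5000
```

The second call has 5000 matches, which is more than the recursive version can handle under CPython's default recursion limit (typically 1000).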
bstpierre's list comprehension is a good solution for short sequences, but it looks to have quadratic complexity and never finished on a long text I was using.
findall_lc = lambda txt, substr: [n for n in xrange(len(txt))  # xrange is Python 2; use range on Python 3
                                  if txt.find(substr, n) == n]
For a random string of non-trivial length, the two functions give the same result:
import random, string; random.seed(0)
s = ''.join([random.choice(string.ascii_lowercase) for _ in range(100000)])
>>> find_all(s, 'th') == findall_lc(s, 'th')
True
>>> findall_lc(s, 'th')[:4]
[564, 818, 1872, 2470]
But the quadratic version is about 300 times slower:
%timeit find_all(s, 'th')
1000 loops, best of 3: 282 μs per loop
%timeit findall_lc(s, 'th')
10 loops, best of 3: 92.3 ms per loop
Answer by pmsh.93
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
main_string = input()
sub_string = input()
count = counter = 0
for i in range(len(main_string)):
    if main_string[i] == sub_string[0]:
        k = i + 1
        # count how many of the remaining characters of sub_string match
        for j in range(1, len(sub_string)):
            if k != len(main_string) and main_string[k] == sub_string[j]:
                count += 1
                k += 1
        if count == (len(sub_string) - 1):
            counter += 1
        count = 0
print(counter)
This program counts all occurrences of the substring, including overlapping ones, without using regex. It is a naive implementation, though; for better worst-case performance, consider string-matching algorithms and data structures such as KMP or a suffix tree.
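As a rough illustration of the KMP approach mentioned above (a sketch, not the answerer's code; kmp_find_all is a hypothetical name), the following returns the positions of all matches, overlapping ones included, in time proportional to the combined length of the text and the pattern.

```python
def kmp_find_all(text, pattern):
    """Return start indexes of pattern in text (overlaps included) via KMP."""
    if not pattern:
        return []
    # failure[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # fall back so overlapping matches are found
    return matches

print(kmp_find_all('Allowed Hello Hollow', 'll'))  # [1, 10, 16]
print(kmp_find_all('abababa', 'aba'))              # [0, 2, 4]
```

In everyday code, repeated str.find calls are usually fast enough; a hand-written KMP mainly helps to show where the worst-case guarantee comes from.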
Answer by Elias Zamaria
Here is my function for finding multiple occurrences. Unlike the other solutions here, it supports the optional start and end parameters for slicing, just like str.index:
def all_substring_indexes(string, substring, start=0, end=None):
    result = []
    new_start = start
    while True:
        try:
            index = string.index(substring, new_start, end)
        except ValueError:
            return result
        else:
            result.append(index)
            new_start = index + len(substring)
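To show how the start and end bounds behave, here is a lazy variant sketched with str.find. The name iter_substring_indexes is hypothetical, and the max(..., 1) guard is an addition of this sketch so that an empty substring cannot stall the loop.

```python
def iter_substring_indexes(string, substring, start=0, end=None):
    """Lazily yield non-overlapping indexes of substring in string[start:end]."""
    if end is None:
        end = len(string)
    # Always advance by at least one character, even for an empty substring.
    step = max(len(substring), 1)
    index = string.find(substring, start, end)
    while index != -1:
        yield index
        index = string.find(substring, index + step, end)

text = 'Allowed Hello Hollow'
print(list(iter_substring_indexes(text, 'll')))           # [1, 10, 16]
print(list(iter_substring_indexes(text, 'll', start=2)))  # [10, 16]
print(list(iter_substring_indexes(text, 'll', end=12)))   # [1, 10]
```

As with str.index, a match is reported only if it fits entirely before end, so end=11 would yield just index 1.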

