Finding multiple occurrences of a string within a string in Python

Question

Asked by user225312

How do I find multiple occurrences of a string within a string in Python? Consider this:

>>> text = "Allowed Hello Hollow"
>>> text.find("ll")
1
>>>

So the first occurrence of ll is at index 1, as expected. How do I find the next occurrence?

The same question applies to a list. Consider:

>>> x = ['ll', 'ok', 'll']

How do I find all occurrences of ll along with their indexes?

Answer 1

Accepted answer by poke

Using regular expressions, you can use re.finditer to find all (non-overlapping) occurrences:

>>> import re
>>> text = 'Allowed Hello Hollow'
>>> for m in re.finditer('ll', text):
         print('ll found', m.start(), m.end())

ll found 1 3
ll found 10 12
ll found 16 18

Alternatively, if you don't want the overhead of regular expressions, you can repeatedly call str.find to get the next index:

>>> text = 'Allowed Hello Hollow'
>>> index = 0
>>> while index < len(text):
        index = text.find('ll', index)
        if index == -1:
            break
        print('ll found at', index)
        index += 2 # +2 because len('ll') == 2

ll found at 1
ll found at 10
ll found at 16

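The same approach also works for lists and other sequences, using their index method in place of str.find. Note that list.index raises ValueError when nothing is found, rather than returning -1.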
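As a rough sketch of how the repeated-search loop might carry over to the list from the question: the helper name find_all_in_list is made up for illustration, and it relies on list.index accepting an optional start position.

```python
def find_all_in_list(items, target):
    """Return every index at which `target` occurs in `items`."""
    positions = []
    start = 0
    while True:
        try:
            # list.index accepts an optional start position, like str.find
            start = items.index(target, start)
        except ValueError:
            # list.index raises instead of returning -1 when nothing is left
            break
        positions.append(start)
        start += 1  # continue searching just past this match
    return positions


x = ['ll', 'ok', 'll']
print(find_all_in_list(x, 'll'))  # [0, 2]
```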

Answer 2

Answered by inspectorG4dget

I think what you are looking for is str.count:

>>> "Allowed Hello Hollow".count('ll')
3

Hope this helps.
Note: this only counts non-overlapping occurrences, and it returns the number of matches rather than their positions.
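To illustrate the non-overlapping caveat, here is a small sketch that contrasts str.count with a hand-rolled overlapping count. The function name count_overlapping is hypothetical, and a non-empty substring is assumed.

```python
def count_overlapping(text, sub):
    """Count occurrences of `sub` in `text`, allowing matches to overlap."""
    count = 0
    index = text.find(sub)
    while index != -1:
        count += 1
        # Advance by one character only, so overlapping matches are seen
        index = text.find(sub, index + 1)
    return count


print("Alllowed".count("ll"))                # 1 (non-overlapping)
print(count_overlapping("Alllowed", "ll"))   # 2 (overlapping)
```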

Answer 3

Answered by ghostdog74

>>> for n, c in enumerate(text):
...   try:
...     if c + text[n+1] == "ll": print(n)
...   except IndexError: pass  # the last character has no successor
...
1
10
16

Answer 4

Answered by chauncey

For your list example:

In [1]: x = ['ll','ok','ll']

In [2]: for idx, value in enumerate(x):
   ...:     if value == 'll':
   ...:         print(idx, value)
0 ll
2 ll

If you wanted all the items in a list that contain 'll', you could do that as well:

In [3]: x = ['Allowed','Hello','World','Hollow']

In [4]: for idx, value in enumerate(x):
   ...:     if 'll' in value:
   ...:         print(idx, value)
   ...:
0 Allowed
1 Hello
3 Hollow

Answer 5

Answered by bstpierre

For the list example, use a list comprehension:

>>> l = ['ll', 'xx', 'll']
>>> print([n for (n, e) in enumerate(l) if e == 'll'])
[0, 2]

Similarly for strings:

>>> text = "Allowed Hello Hollow"
>>> print([n for n in range(len(text)) if text.find('ll', n) == n])
[1, 10, 16]

Note that this also reports overlapping matches within runs such as 'lll', which may or may not be what you want:

>>> text = 'Alllowed Hello Holllow'
>>> print([n for n in range(len(text)) if text.find('ll', n) == n])
[1, 2, 11, 17, 18]
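If the overlapping behaviour is what you want, one alternative worth considering is a zero-width lookahead with re.finditer. This is only a sketch, not part of the answer above. It shows both variants side by side, using re.escape so the search string is treated literally.

```python
import re

text = 'Alllowed Hello Holllow'

# Plain pattern: matches consume characters, so a run like 'lll' yields one hit
non_overlapping = [m.start() for m in re.finditer(re.escape('ll'), text)]

# Zero-width lookahead: nothing is consumed, so overlapping hits are reported
overlapping = [m.start()
               for m in re.finditer('(?=' + re.escape('ll') + ')', text)]

print(non_overlapping)  # [1, 11, 17]
print(overlapping)      # [1, 2, 11, 17, 18]
```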

Answer 6

Answered by intuited

FWIW, here are a couple of non-RE alternatives that I think are neater than poke's solution.

The first uses str.index and checks for ValueError:

def findall(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall('ll', text))
    (1, 10, 16)
    """
    index = 0 - len(sub)
    try:
        while True:
            index = string.index(sub, index + len(sub))
            yield index
    except ValueError:
        pass

The second uses str.find and checks for the sentinel value -1 by using iter:

def findall_iter(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall_iter('ll', text))
    (1, 10, 16)
    """
    def next_index(length):
        index = 0 - length
        while True:
            index = string.find(sub, index + length)
            yield index
    return iter(next_index(len(sub)).__next__, -1)  # .next in Python 2

To apply either of these functions to a list, tuple, or other iterable of strings, you can use a higher-order function (one that takes a function as one of its arguments), like this one:

def findall_each(findall, sub, strings):
    """
    >>> texts = ("fail", "dolly the llama", "Hello", "Hollow", "not ok")
    >>> list(findall_each(findall, 'll', texts))
    [(), (2, 10), (2,), (2,), ()]
    >>> texts = ("parallellized", "illegally", "dillydallying", "hillbillies")
    >>> list(findall_each(findall_iter, 'll', texts))
    [(4, 7), (1, 6), (2, 7), (2, 6)]
    """
    return (tuple(findall(sub, string)) for string in strings)

Answer 7

Answered by Aaron Semeniuk

I'm brand new to programming and working through an online tutorial. I was asked to do this as well, but using only the methods I had learned so far (basically strings and loops). I'm not sure this adds any value here, and I know this isn't how you would normally do it, but I got it to work with this:

needle = input()
haystack = input()
counter = 0
n=-1
for i in range (n+1,len(haystack)+1):
   for j in range(n+1,len(haystack)+1):
      n=-1
      if needle != haystack[i:j]:
         n = n+1
         continue
      if needle == haystack[i:j]:
         counter = counter + 1
print (counter)

Answer 8

Answered by beardc

This version should be linear in the length of the string, and should be fine as long as the sequences aren't too repetitive. Each match adds one level of recursion, so if there are very many matches you can replace the recursion with a while loop.

def find_all(st, substr, start_pos=0, accum=[]):
    ix = st.find(substr, start_pos)
    if ix == -1:
        return accum
    return find_all(st, substr, start_pos=ix + 1, accum=accum + [ix])
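The while-loop replacement could look something like the sketch below. The name find_all_loop is made up here. It keeps the same step of one character past each match start, so it reports the same (possibly overlapping) indexes without being bound by the recursion limit.

```python
def find_all_loop(st, substr, start_pos=0):
    """Iterative sketch: collect every (possibly overlapping) match index."""
    found = []
    ix = st.find(substr, start_pos)
    while ix != -1:
        found.append(ix)
        # Step one character past the match start, as the recursive version does
        ix = st.find(substr, ix + 1)
    return found


print(find_all_loop("Allowed Hello Hollow", "ll"))  # [1, 10, 16]
```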

bstpierre's list comprehension is a good solution for short sequences, but it appears to have quadratic complexity and never finished on a long text I was using.

findall_lc = lambda txt, substr: [n for n in range(len(txt))
                                   if txt.find(substr, n) == n]

For a random string of non-trivial length, the two functions give the same result:

import random, string; random.seed(0)
s = ''.join([random.choice(string.ascii_lowercase) for _ in range(100000)])

>>> find_all(s, 'th') == findall_lc(s, 'th')
True
>>> findall_lc(s, 'th')[:4]
[564, 818, 1872, 2470]

But the quadratic version is about 300 times slower:

%timeit find_all(s, 'th')
1000 loops, best of 3: 282 μs per loop

%timeit findall_lc(s, 'th')    
10 loops, best of 3: 92.3 ms per loop

Answer 9

Answered by pmsh.93

#!/usr/bin/env python3
#-*- coding: utf-8 -*-

main_string = input()
sub_string = input()

count = counter = 0

for i in range(len(main_string)):
    if main_string[i] == sub_string[0]:
        k = i + 1
        for j in range(1, len(sub_string)):
            if k != len(main_string) and main_string[k] == sub_string[j]:
                count += 1
                k += 1
        if count == (len(sub_string) - 1):
            counter += 1
        count = 0

print(counter)

This program counts all occurrences of the substring, including overlapping ones, without using regex. It is a naive implementation, though; for better worst-case performance, consider a suffix tree, KMP, or other string-matching data structures and algorithms.
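For reference, a KMP (Knuth-Morris-Pratt) search might be sketched as follows. This is an illustrative version, not code from the answer above. The name kmp_find_all is hypothetical, and an empty pattern is treated as having no matches.

```python
def kmp_find_all(text, pattern):
    """Return start indexes of all (overlapping) matches of `pattern` in `text`."""
    if not pattern:
        return []
    # failure[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # fall back so overlapping matches are found
    return matches


print(kmp_find_all("Alllowed Hello Holllow", "ll"))  # [1, 2, 11, 17, 18]
```

Because the text index never moves backwards, the search takes time proportional to len(text) + len(pattern), even on highly repetitive input.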

Answer 10

Answered by Elias Zamaria

Here is my function for finding multiple occurrences. Unlike the other solutions here, it supports the optional start and end parameters for slicing, just like str.index:

def all_substring_indexes(string, substring, start=0, end=None):
    result = []
    new_start = start
    while True:
        try:
            index = string.index(substring, new_start, end)
        except ValueError:
            return result
        else:
            result.append(index)
            new_start = index + len(substring)
