Python: Find a string between two substrings

Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3368969/

Time: 2020-08-18 10:41:19  Source: igfitidea

Find string between two substrings

string python substring

Asked by John Howard

How do I find a string between two substrings ('123STRINGabc' -> 'STRING')?


My current method is like this:


>>> start = 'asdf=5;'
>>> end = '123jasd'
>>> s = 'asdf=5;iwantthis123jasd'
>>> print((s.split(start))[1].split(end)[0])
iwantthis

However, this seems very inefficient and un-pythonic. What is a better way to do something like this?


Forgot to mention: the string might not start with start and end with end. There may be more characters before and after them.

Accepted answer by Nikolaus Gradwohl

import re

s = 'asdf=5;iwantthis123jasd'
result = re.search('asdf=5;(.*)123jasd', s)
print(result.group(1))

Answer by Tim McNamara

s[len(start):-len(end)]
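
The slice above only gives the right result when s begins with start and ends with end. A minimal sketch that checks this assumption before slicing (strip_delimiters is a hypothetical helper name, not part of the answer) might look like this:

```python
def strip_delimiters(s, start, end):
    """Return s without its leading start and trailing end, or None if they are absent."""
    # Guard against overlapping delimiters, e.g. s == "aba", start == "ab", end == "ba".
    if len(s) < len(start) + len(end):
        return None
    if s.startswith(start) and s.endswith(end):
        # len(s) - len(end) also works when end is the empty string,
        # whereas s[len(start):-0] would return an empty slice.
        return s[len(start):len(s) - len(end)]
    return None


print(strip_delimiters('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))      # iwantthis
print(strip_delimiters('xxasdf=5;iwantthis123jasdyy', 'asdf=5;', '123jasd'))  # None
```

When extra characters surround the delimiters, as in the question's follow-up note, one of the search-based answers is the better fit.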

Answer by josh

My approach would be something like this:

i = s.index(start)                 # index of the start string in s
j = s.index(end, i + len(start))   # index of the end string, searched after start
substring = s[i + len(start):j]    # the slice end is exclusive, so it stops at j-1

Answer by cji

s = "123123STRINGabcabc"

def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end]
    except ValueError:
        return ""

def find_between_r( s, first, last ):
    try:
        start = s.rindex( first ) + len( first )
        end = s.rindex( last, start )
        return s[start:end]
    except ValueError:
        return ""


print(find_between( s, "123", "abc" ))
print(find_between_r( s, "123", "abc" ))

gives:


123STRING
STRINGabc

It should be noted that, depending on the behavior you need, you can mix index and rindex calls or go with one of the versions above (this is the equivalent of the regex (.*) and (.*?) groups).
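
As a rough illustration of mixing the two calls, the sketch below takes the first occurrence of the start delimiter and the last occurrence of the end delimiter, and compares the result with the greedy and non-greedy regex groups. The name find_between_outer is made up for this example and is not part of the answer.

```python
import re


def find_between_outer(s, first, last):
    """Text from the first occurrence of first up to the last occurrence of last."""
    try:
        start = s.index(first) + len(first)  # leftmost start delimiter
        end = s.rindex(last, start)          # rightmost end delimiter after it
        return s[start:end]
    except ValueError:
        return ""


s = "123123STRINGabcabc"
print(find_between_outer(s, "123", "abc"))    # 123STRINGabc
print(re.search("123(.*)abc", s).group(1))    # 123STRINGabc (greedy)
print(re.search("123(.*?)abc", s).group(1))   # 123STRING (non-greedy)
```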

Answer by John La Rooy

Here is one way to do it:

_,_,rest = s.partition(start)
result,_,_ = rest.partition(end)
print(result)

Another way, using a regular expression:

import re
print(re.findall(re.escape(start)+"(.*)"+re.escape(end),s)[0])

or


print(re.search(re.escape(start)+"(.*)"+re.escape(end),s).group(1))

Answer by Tony Veijalainen

I posted this earlier as a code snippet on Daniweb:

# picking up piece of string between separators
# function using partition, like partition, but drops the separators
def between(left,right,s):
    before,_,a = s.partition(left)
    a,_,after = a.partition(right)
    return before,a,after

s = "bla bla blaa <a>data</a> lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa"
print(between('<a>','</a>',s))
print(between('(',')',s))
print(between("'","'",s))

""" Output:
('bla bla blaa ', 'data', " lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa <a>data</a> lsdjfasdjöf ', 'important notice', " 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa <a>data</a> lsdjfasdjöf (important notice) ', 'Daniweb forum', ' tcha tcha tchaa')
"""

Answer by Tim McNamara

String formatting adds some flexibility to what Nikolaus Gradwohl suggested. start and end can now be changed as desired.

import re

s = 'asdf=5;iwantthis123jasd'
start = 'asdf=5;'
end = '123jasd'

result = re.search('%s(.*)%s' % (start, end), s).group(1)
print(result)
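
One caveat with building the pattern by string formatting: if start or end contain regex metacharacters such as ( or [, they are interpreted as pattern syntax, and re.search returns None when nothing matches, so .group(1) raises AttributeError. A minimal sketch that escapes the delimiters and handles the no-match case follows. The function name between_regex is an assumption for this example, and the non-greedy (.*?) group is a choice made here rather than something the answer prescribes.

```python
import re


def between_regex(s, start, end):
    """Return the text between start and end, or None if there is no match."""
    # re.escape makes characters such as '(', '[' or '.' in the delimiters literal.
    pattern = re.escape(start) + "(.*?)" + re.escape(end)
    match = re.search(pattern, s)
    return match.group(1) if match else None


print(between_regex("asdf=5;iwantthis123jasd", "asdf=5;", "123jasd"))  # iwantthis
print(between_regex("price (USD) 42 [end]", "(USD) ", " [end]"))       # 42
print(between_regex("no delimiters here", "<", ">"))                   # None
```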

Answer by Reinstate Monica - Goodbye SE

To extract STRING, try:


myString = '123STRINGabc'
startString = '123'
endString = 'abc'

mySubString=myString[myString.find(startString)+len(startString):myString.find(endString)]

Answer by ansetou

start = 'asdf=5;'
end = '123jasd'
s = 'asdf=5;iwantthis123jasd'
print(s[s.find(start)+len(start):s.rfind(end)])

gives

iwantthis
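
Note that str.find and str.rfind return -1 when the substring is missing, so a slice built directly from them can silently return the wrong text instead of failing. A small sketch that checks for -1 first (between_or_none is a hypothetical name used only for this example):

```python
def between_or_none(s, start, end):
    """Return the text between start and end, or None if either one is missing."""
    i = s.find(start)
    if i == -1:
        return None
    i += len(start)
    j = s.find(end, i)  # only look for end after the start delimiter
    if j == -1:
        return None
    return s[i:j]


s = 'asdf=5;iwantthis123jasd'
print(between_or_none(s, 'asdf=5;', '123jasd'))  # iwantthis
print(between_or_none(s, 'missing', '123jasd'))  # None

# Without the check, the -1 from find() shifts the slice instead of raising:
print(s[s.find('missing') + len('missing'):s.rfind('123jasd')])  # ;iwantthis
```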

Answer by tstoev

source='your token _here0@df and maybe _here1@df or maybe _here2@df'
start_sep='_'
end_sep='@df'
result=[]
tmp=source.split(start_sep)
for par in tmp:
  if end_sep in par:
    result.append(par.split(end_sep)[0])

print(result)

This should show: here0, here1, here2

The regex approach is better, but it requires an additional library, and you may prefer to stick with plain Python.
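
For comparison, a regex version of the same extraction could look roughly like the sketch below. It uses the re module from the standard library with a non-greedy group so that each token is matched separately. The variable names mirror the snippet above, and the pattern is one possible choice rather than the answer's own code.

```python
import re

source = 'your token _here0@df and maybe _here1@df or maybe _here2@df'
start_sep = '_'
end_sep = '@df'

# (.*?) is non-greedy, so each match stops at the nearest end_sep.
pattern = re.escape(start_sep) + '(.*?)' + re.escape(end_sep)
result = re.findall(pattern, source)

print(result)  # ['here0', 'here1', 'here2']
```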