Python: How to extract a substring between two markers?

Question

Asked by miernik

Let's say I have a string 'gfgfdAAA1234ZZZuijjk' and I want to extract just the '1234' part.

I only know the few characters directly before (AAA) and directly after (ZZZ) the part I am interested in (1234).

With sed it is possible to do something like this with a string:

echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"

And this will give me 1234 as a result.

How to do the same thing in Python?

Answer 1

Accepted answer by eumiro

Using regular expressions (see the documentation of the re module for further reference):

import re

text = 'gfgfdAAA1234ZZZuijjk'

m = re.search('AAA(.+?)ZZZ', text)
if m:
    found = m.group(1)

# found: 1234

or:

import re

text = 'gfgfdAAA1234ZZZuijjk'

try:
    found = re.search('AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    # AAA, ZZZ not found in the original string
    found = '' # apply your error handling

# found: 1234
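When the markers are not fixed, or contain characters that have a special meaning in regular expressions, the same pattern can be built from escaped markers. The following is a minimal sketch, not part of the original answer: the helper name between and its default parameter are assumptions made for illustration. Like the pattern above, it uses (.+?), so it needs at least one character between the markers and does not match across newlines.

```python
import re

def between(text, start, end, default=None):
    """Return the shortest text found between start and end, or default."""
    # re.escape makes the markers literal, so characters such as '(' are safe.
    pattern = re.escape(start) + r"(.+?)" + re.escape(end)
    m = re.search(pattern, text)
    return m.group(1) if m else default

print(between('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))          # 1234
print(between('US president (Barack Obama) met', '(', ')'))   # Barack Obama
print(between('no markers here', 'AAA', 'ZZZ', default=''))   # prints an empty line
```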

Answer 2

Answered by Lennart Regebro

>>> s = 'gfgfdAAA1234ZZZuijjk'
>>> start = s.find('AAA') + 3
>>> end = s.find('ZZZ', start)
>>> s[start:end]
'1234'

Then you can use regexps with the re module as well, if you want, but that's not necessary in your case.
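One caveat with the find approach: str.find returns -1 when the marker is missing, so the unguarded slice can silently return the wrong text (for 'gfgfd1234ZZZuijjk' it would give 'gfd1234'). A minimal sketch that checks for this, where between_find is a hypothetical helper name and returning None on failure is an assumption:

```python
def between_find(s, start_marker, end_marker):
    """Return the text between the markers, or None if either is missing."""
    i = s.find(start_marker)
    if i == -1:
        return None
    start = i + len(start_marker)
    # Search for the end marker only after the start marker.
    end = s.find(end_marker, start)
    if end == -1:
        return None
    return s[start:end]

print(between_find('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))  # 1234
print(between_find('gfgfd1234ZZZuijjk', 'AAA', 'ZZZ'))     # None
```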

Answer 3

Answered by infrared

import re
print(re.search('AAA(.*?)ZZZ', 'gfgfdAAA1234ZZZuijjk').group(1))

Answer 4

Answered by andreypopp

You can use the re module for that:

>>> import re
>>> re.compile(".*AAA(.*)ZZZ.*").match("gfgfdAAA1234ZZZuijjk").groups()
('1234',)
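Note that this pattern uses greedy .* while other answers use the non-greedy .*?. The results agree for the example string, but they can differ when the markers occur more than once. A small sketch on a made-up input (the string text is an assumption chosen to show the difference):

```python
import re

text = 'xAAA12ZZZyAAA34ZZZz'

# Greedy with a leading .*: the leading .* consumes as much as possible,
# so the capture comes from the last AAA...ZZZ pair.
print(re.match(r'.*AAA(.*)ZZZ.*', text).group(1))   # 34

# Greedy without the leading .*: spans from the first AAA to the last ZZZ.
print(re.search(r'AAA(.*)ZZZ', text).group(1))      # 12ZZZyAAA34

# Non-greedy: stops at the first ZZZ after the first AAA.
print(re.search(r'AAA(.*?)ZZZ', text).group(1))     # 12
```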

Answer 5

Answered by tzot

regular expression

import re

re.search(r"(?<=AAA).*?(?=ZZZ)", your_text).group(0)

The above as-is will fail with an AttributeError if there are no "AAA" and "ZZZ" in your_text.

string methods

your_text.partition("AAA")[2].partition("ZZZ")[0]

The above will return an empty string if "AAA" does not exist in your_text. If only "ZZZ" is missing, it returns everything after "AAA".
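str.partition signals a missing separator by returning an empty string as the middle element of its result, which makes it possible to tell a missing marker apart from an empty match. A minimal sketch, where between_partition is a hypothetical helper name and returning None for a missing marker is an assumption:

```python
def between_partition(text, start, end):
    """Return the text between start and end, or None if either is missing."""
    # partition returns (head, separator, tail); separator is '' when not found.
    _, sep1, rest = text.partition(start)
    if not sep1:
        return None
    middle, sep2, _ = rest.partition(end)
    if not sep2:
        return None
    return middle

print(between_partition('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))  # 1234
print(repr(between_partition('AAAZZZ', 'AAA', 'ZZZ')))          # ''
print(between_partition('gfgfdAAA1234', 'AAA', 'ZZZ'))          # None
```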

PS Python Challenge?

Answer 6

Answered by Denis Kutlubaev

Just in case somebody has to do the same thing that I did: I had to extract everything inside parentheses in a line. For example, if I have a line like 'US president (Barack Obama) met with ...' and I want to get only 'Barack Obama', this is the solution:

import re

line = 'US president (Barack Obama) met with ...'
regex = r'.*\((.*?)\).*'
matches = re.search(regex, line)
line = matches.group(1) + '\n'

I.e., you need to escape the parentheses with a backslash (\). This is more a question about regular expressions than about Python, though.

Also, in some cases you may see an 'r' prefix before the regex definition. If there is no r prefix, you need to escape backslashes as you would in C. Here is more discussion on that.
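To illustrate the r prefix, the following sketch writes the same pattern both ways and checks that the two strings are identical. The variable names raw_pattern and plain_pattern are only for illustration:

```python
import re

line = 'US president (Barack Obama) met with ...'

# With a raw string, the backslashes reach the regex engine unchanged.
raw_pattern = r'\((.*?)\)'
# Without the r prefix, each backslash has to be doubled.
plain_pattern = '\\((.*?)\\)'

print(raw_pattern == plain_pattern)             # True
print(re.search(raw_pattern, line).group(1))    # Barack Obama
```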

Answer 7

Answered by user1810100

>>> s = '/tmp/10508.constantstring'
>>> s.split('/tmp/')[1].split('constantstring')[0].strip('.')
'10508'

Answer 8

Answered by Avinash Raj

With sed it is possible to do something like this with a string:

echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"

And this will give me 1234 as a result.

You could do the same with the re.sub function, using the same regex.

>>> re.sub(r'.*AAA(.*)ZZZ.*', r'\1', 'gfgfdAAA1234ZZZuijjk')
'1234'

In basic sed, capturing groups are written as \(..\), but in Python they are written as (..).
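One behaviour of re.sub worth keeping in mind: when the pattern does not match, nothing is substituted and the input string is returned unchanged, so a failed extraction is not signalled by an exception. A short sketch, where the second input string is made up for illustration:

```python
import re

pattern = r'.*AAA(.*)ZZZ.*'

print(re.sub(pattern, r'\1', 'gfgfdAAA1234ZZZuijjk'))  # 1234
# No match: nothing is substituted, so the input comes back unchanged.
print(re.sub(pattern, r'\1', 'no markers here'))       # no markers here
```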

Answer 9

Answered by Saeed Zahedian Abroodi

You can use this function in your code to find the first occurrence of a substring (by character index). You can also use it to get what comes after the substring.

def FindSubString(strText, strSubString, Offset=None):
    try:
        Start = strText.find(strSubString)
        if Start == -1:
            return -1 # Not Found
        if Offset is None:
            # No offset: return everything after the substring
            return strText[Start+len(strSubString):]
        if Offset == 0:
            # Offset 0: return the index where the substring starts
            return Start
        # Otherwise: return Offset characters following the substring
        AfterSubString = Start+len(strSubString)
        return strText[AfterSubString:AfterSubString + int(Offset)]
    except Exception:
        return -1

# Example:

Text = "Thanks for contributing an answer to Stack Overflow!"
subText = "to"

print("Start of first substring in a text:")
start = FindSubString(Text, subText, 0)
print(start); print("")

print("Exact substring in a text:")
print(Text[start:start+len(subText)]); print("")

print("What is after substring \"%s\"?" %(subText))
print(FindSubString(Text, subText))

# Your answer:

Text = "gfgfdAAA1234ZZZuijjk"
subText1 = "AAA"
subText2 = "ZZZ"

AfterText1 = FindSubString(Text, subText1, 0) + len(subText1)
BeforText2 = FindSubString(Text, subText2, 0) 

print("\nYour answer:\n%s" %(Text[AfterText1:BeforText2]))

Answer 10

Answered by MaxLZ

One-liners that return another string if there is no match. Edit: the improved version uses the next function; replace "not-found" with something else if needed:

import re
res = next( (m.group(1) for m in [re.search("AAA(.*?)ZZZ", "gfgfdAAA1234ZZZuijjk" ),] if m), "not-found" )

My other method, which is less optimal, runs the regex a second time; I still haven't found a shorter way:

import re
res = ( ( re.search("AAA(.*?)ZZZ", "gfgfdAAA1234ZZZuijjk") or re.search("()","") ).group(1) )
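On Python 3.8 or later, an assignment expression gives another compact form of the same idea. This is a sketch under that version assumption, not part of the original answer; "not-found" is the same placeholder as above:

```python
import re

text = "gfgfdAAA1234ZZZuijjk"

# The walrus operator binds the match object inside the expression (Python 3.8+).
res = m.group(1) if (m := re.search("AAA(.*?)ZZZ", text)) else "not-found"
print(res)  # 1234
```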
