Python: extract pattern matches

Question

Asked by Kannan Ekanath

Python 2.7.1: I am trying to use a Python regular expression to extract words inside a pattern.

I have a string that looks like this:

someline abc
someother line
name my_user_name is valid
some more lines

I want to extract the word "my_user_name". I am doing something like this:

import re
s = """someline abc
someother line
name my_user_name is valid
some more lines"""  # that big string
p = re.compile("name .* is valid")
p.search(s)  # this gives me <_sre.SRE_Match object at 0x026B6838>

How do I extract my_user_name now?

Answer 1

Accepted answer by UltraInstinct

You need a capture group in the regex. search for the pattern and, if it is found, retrieve the captured string using group(index). Assuming the necessary validity checks are performed:

>>> p = re.compile("name (.*) is valid")
>>> result = p.search(s)
>>> result
<_sre.SRE_Match object at 0x10555e738>
>>> result.group(1)     # group(1) will return the 1st capture.
'my_user_name'
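
To make the "assuming validity checks" part concrete, here is a minimal sketch that wraps search() and group(1) in a helper which returns None when the pattern is absent, instead of failing with AttributeError on None.group(1). The helper name extract_user_name and the module-level USER_RE are hypothetical, chosen only for illustration, and the code assumes Python 3.

```python
import re

# Hypothetical compiled pattern; group 1 captures the text between 'name ' and ' is valid'
USER_RE = re.compile(r"name (.*) is valid")


def extract_user_name(text):
    """Return the first capture group, or None if the pattern does not occur."""
    result = USER_RE.search(text)  # search() scans the whole string, not only its start
    if result is None:
        return None
    return result.group(1)  # group(1) is the first parenthesized group


s = """someline abc
someother line
name my_user_name is valid
some more lines"""

print(extract_user_name(s))           # my_user_name
print(extract_user_name("no match"))  # None
```

Checking for None before calling group() is the usual pattern, because search() returns None rather than raising when nothing matches.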

Answer 2

Answer by mgilson

You can use matching groups:

p = re.compile('name (.*) is valid')

For example:

>>> import re
>>> p = re.compile('name (.*) is valid')
>>> s = """
... someline abc
... someother line
... name my_user_name is valid
... some more lines"""
>>> p.findall(s)
['my_user_name']

Here I use re.findall rather than re.search to get all instances of my_user_name. Using re.search, you'd need to get the data from the group on the match object:

>>> p.search(s)   #gives a match object or None if no match is found
<_sre.SRE_Match object at 0xf5c60>
>>> p.search(s).group() #entire string that matched
'name my_user_name is valid'
>>> p.search(s).group(1) # first capture group within the match
'my_user_name'
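
When the text contains several matching lines, findall returns every captured name, while finditer yields match objects so group() and span() remain available. A small sketch, assuming a made-up input with two users, alice and bob:

```python
import re

s = """name alice is valid
some other line
name bob is valid"""

p = re.compile(r"name (.*) is valid")

# '.' does not match a newline by default, so each match stays on its own line
print(p.findall(s))  # ['alice', 'bob']

# finditer yields match objects, so group() and span() stay available
names = [m.group(1) for m in p.finditer(s)]
print(names)  # ['alice', 'bob']
```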

As mentioned in the comments, you might want to make your regex non-greedy:

p = re.compile('name (.*?) is valid')

to pick up only the text between 'name ' and the next ' is valid' (rather than allowing your regex to pick up other ' is valid' occurrences in your group).
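
The difference only shows up when ' is valid' appears more than once in the text that '.' can reach. A sketch on a single made-up line that contains it twice:

```python
import re

# Made-up line that happens to contain ' is valid' twice
line = "name alice is valid and bob is valid"

# Greedy: .* runs to the end, then backtracks to the LAST ' is valid'
greedy = re.search(r"name (.*) is valid", line).group(1)
# Non-greedy: .*? stops at the FIRST ' is valid'
lazy = re.search(r"name (.*?) is valid", line).group(1)

print(greedy)  # alice is valid and bob
print(lazy)    # alice
```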

Answer 3

Answer by Henry Keiter

You want a capture group.

p = re.compile("name (.*) is valid")  # parentheses create a capture group
print(p.search(s).groups())  # This gives you a tuple of the captured groups.
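
Because match() only tries to match at the very start of the string, it returns None on the multi-line text from the question (so calling .groups() on it would raise AttributeError), whereas search() scans forward. The sketch below shows that, and how groups() returns one tuple entry per capture group; the second group around valid is added only to make the tuple longer than one element.

```python
import re

big = "someline abc\nname my_user_name is valid\nsome more lines"
pattern = re.compile(r"name (\w+) is (\w+)")

# match() only tries at position 0, so it finds nothing here
print(pattern.match(big))  # None

# search() scans forward and finds the line in the middle
m = pattern.search(big)
print(m.groups())  # ('my_user_name', 'valid')
```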

Answer 4

Answer by Apalala

You could use something like this:

import re
s = """someline abc
someother line
name my_user_name is valid
some more lines"""  # that big string
# the parentheses create a group with what was matched,
# and '\w' matches only word characters (letters, digits and underscore)
p = re.compile(r"name +(\w+) +is valid")
# use search(), so the match doesn't have to happen
# at the beginning of "big string"
m = p.search(s)
# search() returns a Match object with information about what was matched,
# or None if nothing matched
if m:
    name = m.group(1)
else:
    raise Exception('name not found')
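
The \w+ in this answer is stricter than the .* used elsewhere in this thread: it rejects names containing spaces or punctuation such as . or -. Whether that is desirable depends on what counts as a valid user name in your data. A quick comparison on made-up sample lines:

```python
import re

strict = re.compile(r"name +(\w+) +is valid")  # the name must be one run of word characters
loose = re.compile(r"name (.*?) is valid")     # anything (non-greedy) up to ' is valid'

samples = ["name my_user_name is valid", "name john smith is valid", "name a.b-c is valid"]

results = []
for text in samples:
    m1 = strict.search(text)
    m2 = loose.search(text)
    # Record what each pattern captured, or None when it did not match
    results.append((m1.group(1) if m1 else None, m2.group(1) if m2 else None))

for text, pair in zip(samples, results):
    print(text, "->", pair)
```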

Answer 5

Answer by John

Maybe that's a bit shorter and easier to understand:

>>> import re
>>> text = '... someline abc... someother line... name my_user_name is valid.. some more lines'
>>> re.search('name (.*) is valid', text).group(1)
'my_user_name'

Answer 6

Answer by Eugene Yarmash

You can use groups (indicated with '(' and ')') to capture parts of the string. The match object's group() method then gives you the group's contents:

>>> import re
>>> s = 'name my_user_name is valid'
>>> match = re.search('name (.*) is valid', s)
>>> match.group(0)  # the entire match
'name my_user_name is valid'
>>> match.group(1)  # the first parenthesized subgroup
'my_user_name'

In Python 3.6+ you can also index into a match object instead of using group():

>>> match[0]  # the entire match 
'name my_user_name is valid'
>>> match[1]  # the first parenthesized subgroup
'my_user_name'
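
group() also accepts several indices at once and then returns a tuple, which can be handy when you need both the whole match and a subgroup. A short sketch (the indexing line assumes Python 3.6+):

```python
import re

match = re.search(r"name (.*) is valid", "name my_user_name is valid")

# group() with several indices returns a tuple of those groups
both = match.group(0, 1)
print(both)  # ('name my_user_name is valid', 'my_user_name')

# Indexing (Python 3.6+) is shorthand for group() with a single index
print(match[1] == match.group(1))  # True
```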

Answer 7

Answer by wolfovercats

Here's a way to do it without using groups (Python 3.6 or above):

>>> import re
>>> re.search(r'2\d\d\d[01]\d[0-3]\d', 'report_20191207.xml')[0]
'20191207'
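
The same group-free idea can be applied to the original question with a lookbehind and a lookahead, which constrain the surrounding context without becoming part of the match. This sketch assumes the user name contains no whitespace (hence \S+) and Python 3.6+ for match indexing; note that the re module requires lookbehinds to be fixed-width, which 'name ' is.

```python
import re

s = """someline abc
someother line
name my_user_name is valid
some more lines"""

# (?<=name ) and (?= is valid) are zero-width, so match[0] is just the name
m = re.search(r"(?<=name )\S+(?= is valid)", s)
print(m[0])  # my_user_name
```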

Answer 8

Answer by Ryan Stefan

You can also use a named capture group (?P<user>pattern) and access the group like a dictionary: match['user'].

import re

string = '''someline abc
someother line
name my_user_name is valid
some more lines
'''

pattern = r'name (?P<user>.*) is valid'
matches = re.search(pattern, string)
print(matches['user'])

# my_user_name
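
Named groups also make groupdict() useful: it returns all named groups as a dict. The second group name, status, is hypothetical and only there to show more than one entry:

```python
import re

line = "name my_user_name is valid"
m = re.search(r"name (?P<user>\S+) is (?P<status>\w+)", line)

print(m.groupdict())    # {'user': 'my_user_name', 'status': 'valid'}
print(m.group("user"))  # my_user_name
```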

Answer 9

Answer by chiceman

It seems like you're actually trying to extract a name rather than simply find a match. If that is the case, having span indexes for your match is helpful, and I'd recommend using re.finditer. As a shortcut, you know the 'name ' part of your regex has length 5 and the ' is valid' part has length 9, so you can slice the matching text to extract the name.

Note: in your example, s appears to be a string with line breaks, so that is what's assumed below.

import re

s = """someline abc
someother line
name my_user_name is valid
some more lines"""

## convert s to a list of strings, one per line:
s2 = s.splitlines()

## find matches line by line:
for i, j in enumerate(s2):
    ## finditer returns an iterator; lines without a match simply yield nothing
    for k in re.finditer("name (.*) is valid", j):
        ## get the matched text
        match_txt = k.group(0)
        ## get the span of the match within the line
        match_span = k.span(0)
        ## extract the username by slicing off 'name ' (5 chars) and ' is valid' (9 chars)
        my_user_name = match_txt[5:-9]
        ## compare with the original text
        print(f'Extracted Username: {my_user_name} - found on line {i}')
        print('Match Text:', match_txt)
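
Hard-coding the slice [5:-9] ties the code to the exact literal text of the pattern. An alternative sketch is to ask the match object for the span of the capture group itself with span(1), which stays correct if the surrounding words change. The input is the example string from the question; found is a hypothetical name for the collected results.

```python
import re

s = """someline abc
someother line
name my_user_name is valid
some more lines"""

found = []
for line_no, line in enumerate(s.splitlines()):
    for m in re.finditer(r"name (.*?) is valid", line):
        start, end = m.span(1)  # start/end offsets of group 1 within this line
        found.append((line_no, start, end, line[start:end]))

print(found)  # [(2, 5, 17, 'my_user_name')]
```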
