How do I return a string from a regex match in Python?
Source: http://stackoverflow.com/questions/18493677/
License: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Hyman Dalton
I am running through lines in a text file using a Python script. I want to search for an img tag within the text document and return the tag as text.
When I run the regex re.match(line) it returns a _sre.SRE_MATCH object. How do I get it to return a string?
    import sys
    import string
    import re

    f = open("sample.txt", 'r')
    l = open('writetest.txt', 'w')
    count = 1

    for line in f:
        line = line.rstrip()
        imgtag = re.match(r'<img.*?>', line)
        print("yo it's a {}".format(imgtag))
When run it prints:
yo it's a None
yo it's a None
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e5e0>
yo it's a None
yo it's a None
Accepted answer by wflynny
You should use re.MatchObject.group(0), that is, call group(0) on the match object that re.match returns. For example:

    imgtag = re.match(r'<img.*?>', line).group(0)
Edit:
You also might be better off doing something like

    imgtag = re.match(r'<img.*?>', line)
    if imgtag:
        print("yo it's a {}".format(imgtag.group(0)))

to eliminate all the None values.
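
The guarded form above can be wrapped in a small function so it is easy to try without an input file. This is only a sketch: the helper name first_img_tag and the sample lines are made up for illustration and do not come from the question.

```python
import re

def first_img_tag(line):
    """Return the img tag at the start of `line` as a string, or None."""
    m = re.match(r'<img.*?>', line)
    # re.match returns None when the line does not start with the pattern,
    # so check the result before calling group(0).
    return m.group(0) if m else None

# Hypothetical sample lines standing in for the contents of sample.txt
lines = ['<img src="a.png"> caption', 'plain text', '<img src="b.png">']
tags = [first_img_tag(line) for line in lines]
print(tags)
```

Lines without a tag at the start yield None instead of raising AttributeError, which is what an unguarded .group(0) call would do.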
Answer by Explosion Pills
Use imgtag.group(0) or imgtag.group(). Either call returns the entire match as a string. Your pattern has no capture groups, so there is nothing else to retrieve.
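
To illustrate the point about group() and capturing, here is a short sketch. The sample line and the second pattern (which adds a capture group around the src value) are assumptions made for the example; the question's own pattern has no groups.

```python
import re

line = '<img src="cat.png" alt="cat">'  # hypothetical sample input

# The question's pattern: no capture groups
m = re.match(r'<img.*?>', line)
whole = m.group()      # equivalent to m.group(0)
captured = m.groups()  # empty tuple, because nothing is captured

# Illustrative variant with one capture group around the src value
m2 = re.match(r'<img\s+src="([^"]*)".*?>', line)
src = m2.group(1)

print(whole)
print(captured)
print(src)
```

group(0) is always the whole match; group(1) and higher exist only when the pattern contains parenthesized groups.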
Answer by newtover
Considering there might be several img tags on a line, I would recommend re.findall:

    import re

    with open("sample.txt", 'r') as f_in, open('writetest.txt', 'w') as f_out:
        for line in f_in:
            # findall returns every non-overlapping match in the line as a string
            for img in re.findall(r'<img[^>]+>', line):
                # write each tag to the output file (Python 3 print syntax)
                print("yo it's a {}".format(img), file=f_out)
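
A self-contained sketch of what re.findall returns, using made-up in-memory strings instead of sample.txt:

```python
import re

# Hypothetical line containing two tags, neither at the start of the string
line = 'text <img src="a.png"> more <img src="b.png"> end'

tags = re.findall(r'<img[^>]+>', line)
no_tags = re.findall(r'<img[^>]+>', 'no tags here')

print(tags)
print(no_tags)
```

Because this pattern has no capture groups, findall returns the matched text directly as a list of strings, and an empty list when nothing matches, so no .group() call or None check is needed.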
Answer by Sergii Shcherbak
Note that re.match(pattern, string, flags=0) only returns matches at the beginning of the string. If you want to locate a match anywhere in the string, use re.search(pattern, string, flags=0) instead (https://docs.python.org/3/library/re.html). This will scan the string and return the first match object. Then you can extract the matching string with match_object.group(0), as the other answers suggested.
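
The difference between the two functions can be seen on a line where the tag is not the first thing in the string. The sample line below is invented for the example.

```python
import re

line = 'Logo: <img src="logo.png"> (header)'  # hypothetical sample input

at_start = re.match(r'<img.*?>', line)   # only tries at position 0
anywhere = re.search(r'<img.*?>', line)  # scans the whole string

print(at_start)
print(anywhere.group(0))
```

Here re.match returns None because the line starts with "Logo:", while re.search finds the tag and group(0) returns it as a string.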