Match multiline regex in file object (Python)
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/2433648/
Asked by user265978
How can I extract the groups from this regex from a file object (data.txt)?
import numpy as np
import re
import os
ifile = open("data.txt",'r')
# Regex pattern
pattern = re.compile(r"""
    ^Time:(\d{2}:\d{2}:\d{2})   # Time: 12:34:56 at beginning of line
    \r{2}                       # Two carriage returns
    \D+                         # 1 or more non-digits
    storeU=(\d+\.\d+)
    \s
    uIx=(\d+)
    \s
    storeI=(-?\d+\.\d+)
    \s
    iIx=(\d+)
    \s
    avgCI=(-?\d+\.\d+)
    """, re.VERBOSE | re.MULTILINE)
time = []
for line in ifile:
    match = re.search(pattern, line)
    if match:
        time.append(match.group(1))
The problem in the last part of the code is that I iterate line by line, which obviously doesn't work with a multiline regex. I have tried to use pattern.finditer(ifile) like this:
for match in pattern.finditer(ifile):
    print(match)
... just to see if it works, but the finditer method requires a string or buffer.
I have also tried this method, but can't get it to work:
matches = [m.groups() for m in pattern.finditer(ifile)]
Any idea?
After comments from Mike and Tuomas, I was told to use .read(), something like this:
ifile = open("data.txt",'r').read()
This works fine, but is the following the correct way to search through the file? I can't get it to work:
for i in pattern.finditer(ifile):
    match = re.search(pattern, i)
    if match:
        time.append(match.group(1))
Solution
# Open the file as a file object
ifile = open("data.txt",'r')
# Read the file object into a string
text = ifile.read()
# Close the file object
ifile.close()
# Regex pattern
pattern_meas = re.compile(r"""
    ^Time:(\d{2}:\d{2}:\d{2})   # Time: 12:34:56 at beginning of line
    \n{2}                       # Two newlines
    \D+                         # 1 or more non-digits
    storeU=(\d+\.\d+)           # Decimal number
    \s
    uIx=(\d+)                   # Fetch uIx-variable
    \s
    storeI=(-?\d+\.\d+)         # Fetch storeI-variable
    \s
    iIx=(\d+)                   # Fetch iIx-variable
    \s
    avgCI=(-?\d+\.\d+)          # Fetch avgCI-variable
    """, re.VERBOSE | re.MULTILINE)
file_times = open("output_times.txt","w")
for match in pattern_meas.finditer(text):
    output = "%s,\t%s,\t\t%s,\t%s,\t\t%s,\t%s\n" % (match.group(1), match.group(2), match.group(3), match.group(4), match.group(5), match.group(6))
    file_times.write(output)
file_times.close()
Maybe it can be written more compactly and in a more Pythonic way, though.
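One possible more compact form of the solution is sketched below. It is only an illustration, not the asker's code: the function name extract_measurements and the uniform ",\t" separator are assumptions, and the with-blocks replace the explicit close() calls. It assumes the same data.txt layout as the solution.

```python
import re

# Pattern from the solution; decimal points are written as \. so that
# they match only a literal dot.
PATTERN_MEAS = re.compile(r"""
    ^Time:(\d{2}:\d{2}:\d{2})   # time stamp at beginning of line
    \n{2}                       # two newlines
    \D+                         # 1 or more non-digits
    storeU=(\d+\.\d+)
    \s
    uIx=(\d+)
    \s
    storeI=(-?\d+\.\d+)
    \s
    iIx=(\d+)
    \s
    avgCI=(-?\d+\.\d+)
    """, re.VERBOSE | re.MULTILINE)


def extract_measurements(in_path, out_path):
    """Write one line per match to out_path and return the number of matches."""
    # The with-block closes the input file automatically.
    with open(in_path, "r") as ifile:
        text = ifile.read()
    count = 0
    with open(out_path, "w") as ofile:
        for match in PATTERN_MEAS.finditer(text):
            # match.groups() holds all six captured strings in order.
            ofile.write(",\t".join(match.groups()) + "\n")
            count += 1
    return count
```

Calling extract_measurements("data.txt", "output_times.txt") would then do the same job as the script above, with one separator style instead of the mixed single and double tabs.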
Accepted answer by Mike
You can read the data from the file object into a string with ifile.read().
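A minimal sketch of why the read() call matters: the methods of a compiled pattern accept a string, not a file object. Here io.StringIO stands in for the opened file so the example is self-contained, and the sample text and the shortened pattern are assumptions made for the illustration.

```python
import io
import re

pattern = re.compile(r"^Time:(\d{2}:\d{2}:\d{2})", re.MULTILINE)

# io.StringIO stands in for open("data.txt") (hypothetical sample content).
ifile = io.StringIO("Time:12:34:56\n\nfoo\nTime:12:35:01\n\nbar\n")

try:
    list(pattern.finditer(ifile))      # a file-like object is rejected
    file_object_accepted = True
except TypeError:
    file_object_accepted = False

text = ifile.read()                    # the whole content as one string
times = [m.group(1) for m in pattern.finditer(text)]

print(file_object_accepted)            # False
print(times)                           # ['12:34:56', '12:35:01']
```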
Answer by Tuomas Pelkonen
Why don't you read the whole file into a buffer using
buffer = open("data.txt").read()
and then do a search with that?
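The difference between searching line by line and searching the whole buffer can be sketched as follows. The sample text and the shortened pattern are assumptions chosen for the illustration. Each line seen by a loop holds only part of a record, so a pattern that spans a line break can only match against the complete buffer.

```python
import re

# Hypothetical sample: a time stamp, a blank line, then a value.
buffer = "Time:12:34:56\n\nstoreU=1.23\nTime:12:35:01\n\nstoreU=4.56\n"

# The pattern spans three lines: time line, blank line, value line.
pattern = re.compile(r"^Time:(\d{2}:\d{2}:\d{2})\n{2}storeU=(\d+\.\d+)",
                     re.MULTILINE)

# Line by line: no single line contains a whole record, so nothing matches.
per_line = [m.groups()
            for line in buffer.splitlines(True)
            for m in pattern.finditer(line)]

# Whole buffer: every record is found.
whole = [m.groups() for m in pattern.finditer(buffer)]

print(per_line)   # []
print(whole)      # [('12:34:56', '1.23'), ('12:35:01', '4.56')]
```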
Answer by SilentGhost
times = [match.group(1) for match in pattern.finditer(ifile.read())]
finditer yields MatchObjects. If the regex doesn't match anything, times will be an empty list.
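As a small sketch of that behaviour (the two sample strings and the shortened pattern are assumptions for the illustration):

```python
import re

pattern = re.compile(r"^Time:(\d{2}:\d{2}:\d{2})", re.MULTILINE)

matching_text = "Time:12:34:56\nfoo\nTime:12:35:01\n"
other_text = "no time stamps here\n"

# Each item yielded by finditer is a match object; group(1) is the time.
times = [match.group(1) for match in pattern.finditer(matching_text)]

# With no matches, finditer yields nothing and the list stays empty.
no_times = [match.group(1) for match in pattern.finditer(other_text)]

print(times)      # ['12:34:56', '12:35:01']
print(no_times)   # []
```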
You can also modify your regex to use non-capturing groups for storeU, storeI, iIx and avgCI; then pattern.findall will return only the matched times.
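A sketch of the non-capturing variant, under two assumptions: the sample text is made up, and uIx is made non-capturing as well (the answer does not mention it). That last step is needed because findall returns tuples as soon as more than one capturing group remains.

```python
import re

# Hypothetical sample record in the layout the pattern expects.
text = ("Time:12:34:56\n\n"
        "some text storeU=1.23 uIx=4 storeI=-5.6 iIx=7 avgCI=-8.9\n")

# Only the time is a capturing group; the other values use (?:...).
pattern = re.compile(r"""
    ^Time:(\d{2}:\d{2}:\d{2})
    \n{2}
    \D+
    storeU=(?:\d+\.\d+)
    \s
    uIx=(?:\d+)
    \s
    storeI=(?:-?\d+\.\d+)
    \s
    iIx=(?:\d+)
    \s
    avgCI=(?:-?\d+\.\d+)
    """, re.VERBOSE | re.MULTILINE)

# With exactly one capturing group, findall returns a list of strings.
times = pattern.findall(text)
print(times)   # ['12:34:56']
```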
Note: naming a variable time might shadow the standard library module of the same name. times would be a better option.
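The shadowing only causes trouble if the script also imports the time module. The asker's snippet does not, so the following is a hypothetical situation sketched for illustration:

```python
import time

start = time.time()        # the module works as expected here
time = []                  # rebinding the name hides the module from now on

try:
    time.time()            # a list has no attribute "time"
    module_still_usable = True
except AttributeError:
    module_still_usable = False

print(module_still_usable)   # False

times = []                 # a distinct name avoids the clash
```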