Python: matching a multiline regular expression against a file object

Question

Asked by user265978

How can I extract the groups from this regex from a file object (data.txt)?

import numpy as np
import re
import os
ifile = open("data.txt",'r')

# Regex pattern
pattern = re.compile(r"""
                ^Time:(\d{2}:\d{2}:\d{2})   # Time: 12:34:56 at beginning of line
                \r{2}                       # Two carriage return
                \D+                         # 1 or more non-digits
                storeU=(\d+\.\d+)
                \s
                uIx=(\d+)
                \s
                storeI=(-?\d+.\d+)
                \s
                iIx=(\d+)
                \s
                avgCI=(-?\d+.\d+)
                """, re.VERBOSE | re.MULTILINE)

time = [];

for line in ifile:
    match = re.search(pattern, line)
    if match:
        time.append(match.group(1))

The problem with the last part of the code is that I iterate line by line, which obviously doesn't work with a multiline regex. I have tried to use pattern.finditer(ifile) like this:

for match in pattern.finditer(ifile):
    print match

... just to see if it works, but the finditer method requires a string or buffer.

I have also tried this method, but can't get it to work:

matches = [m.groups() for m in pattern.finditer(ifile)]

Any idea?

After the comments from Mike and Tuomas, I was told to use .read(), something like this:

ifile = open("data.txt",'r').read()

Reading the file this way works fine, but is the following the correct way to search through it? I can't get it to work:

for i in pattern.finditer(ifile):
    match = re.search(pattern, i)
    if match:
        time.append(match.group(1))

Solution

import re

# Read the whole file into a string; the with block closes the file afterwards
with open("data.txt", "r") as ifile:
    text = ifile.read()

# Regex pattern
pattern_meas = re.compile(r"""
                ^Time:(\d{2}:\d{2}:\d{2})   # Time:12:34:56 at beginning of line
                \n{2}                       # Two newlines
                \D+                         # 1 or more non-digits
                storeU=(\d+\.\d+)           # Decimal number
                \s
                uIx=(\d+)                   # Fetch uIx variable
                \s
                storeI=(-?\d+\.\d+)         # Fetch storeI variable (literal dot escaped)
                \s
                iIx=(\d+)                   # Fetch iIx variable
                \s
                avgCI=(-?\d+\.\d+)          # Fetch avgCI variable (literal dot escaped)
                """, re.VERBOSE | re.MULTILINE)

# Write one output line per match, filled from the six captured groups
with open("output_times.txt", "w") as file_times:
    for match in pattern_meas.finditer(text):
        file_times.write("%s,\t%s,\t\t%s,\t%s,\t\t%s,\t%s\n" % match.groups())

It could probably be written in a more compact and Pythonic way, though.

Answer 1

Accepted answer by Mike

You can read the data from the file object into a string with ifile.read().
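
A minimal sketch of this approach follows. The sample text, the temporary data.txt, and the shortened pattern (only the time and storeU groups) are assumptions made for illustration; the real file layout is not shown in the question.

```python
import os
import re
import tempfile

# Hypothetical sample data; the real data.txt layout is an assumption.
sample = (
    "Time:12:34:56\n"
    "\n"
    "Measurement storeU=1.25 uIx=3\n"
    "Time:12:35:10\n"
    "\n"
    "Measurement storeU=2.50 uIx=4\n"
)

# Shortened version of the question's pattern, for illustration only.
pattern = re.compile(r"""
    ^Time:(\d{2}:\d{2}:\d{2})   # time stamp at the beginning of a line
    \n{2}                       # two newlines
    \D+                         # one or more non-digits
    storeU=(\d+\.\d+)           # decimal number
    """, re.VERBOSE | re.MULTILINE)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.txt")
    with open(path, "w") as f:
        f.write(sample)

    # Read the whole file into one string so the regex can span lines.
    with open(path) as ifile:
        text = ifile.read()

# finditer now receives a string instead of a file object.
results = [m.groups() for m in pattern.finditer(text)]
print(results)
```

Reading everything at once keeps the code simple, at the cost of holding the whole file in memory.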

Answer 2

Answered by Tuomas Pelkonen

Why don't you read the whole file into a buffer using

buffer = open("data.txt").read()

and then do a search with that?

Answer 3

Answered by SilentGhost

times = [match.group(1) for match in pattern.finditer(ifile.read())]

finditer yields MatchObjects. If the regex doesn't match anything, times will be an empty list.
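
A small sketch of both cases, using a simplified time-only pattern and made-up input strings (both are assumptions, not the asker's data):

```python
import re

# Simplified pattern: only the time stamp at the start of a line.
pattern = re.compile(r"^Time:(\d{2}:\d{2}:\d{2})", re.MULTILINE)

text = "Time:12:34:56\nnoise\nTime:12:35:10\n"

# Each item produced by finditer is a match object; group(1) is the time.
times = [m.group(1) for m in pattern.finditer(text)]

# No matches means the comprehension produces an empty list, not an error.
no_times = [m.group(1) for m in pattern.finditer("no time stamps here\n")]

print(times)
print(no_times)
```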

You can also modify your regex to use non-capturing groups for storeU, storeI, iIx and avgCI; pattern.findall will then return only the matched times.
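
A sketch of that variant, assuming a made-up measurement line. Here uIx is made non-capturing as well, so that the time is the only capturing group and findall returns plain strings:

```python
import re

# Hypothetical sample record; the real file contents are an assumption.
sample = (
    "Time:12:34:56\n"
    "\n"
    "Measurement storeU=1.25 uIx=3 storeI=-0.5 iIx=7 avgCI=0.75\n"
)

pattern = re.compile(r"""
    ^Time:(\d{2}:\d{2}:\d{2})   # the only capturing group
    \n{2}                       # two newlines
    \D+                         # one or more non-digits
    storeU=(?:\d+\.\d+)         # (?:...) groups without capturing
    \s
    uIx=(?:\d+)
    \s
    storeI=(?:-?\d+\.\d+)
    \s
    iIx=(?:\d+)
    \s
    avgCI=(?:-?\d+\.\d+)
    """, re.VERBOSE | re.MULTILINE)

# With exactly one capturing group, findall returns a list of strings.
times = pattern.findall(sample)
print(times)
```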

Note: naming a variable time can shadow the standard library module of the same name. times would be a better option.
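
A short illustration of the shadowing problem, assuming the script also imports the time module (the question's code does not, so this only shows what could go wrong):

```python
import time

start = time.time()   # works: `time` still refers to the module

time = []             # rebinding the name hides the module from here on

try:
    time.time()       # a list has no attribute `time`
except AttributeError as exc:
    error = str(exc)

print(error)
```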
