Python RegEx - Getting multiple pieces of information out of a string

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/924127/

Time: 2020-11-03 21:07:46  Source: igfitidea

python regex

Asked by Joshua

I'm trying to use Python to parse a log file and match 4 pieces of information in one regex (epoch time, SERVICE NOTIFICATION, hostname and CRITICAL), but I can't seem to get this to work. So far I've only been able to match two of the four. Is it possible to do this? Below is an example of a string from the log file and the code I've gotten to work thus far. Any help would make me a happy noob.

[1242248375] SERVICE ALERT: myhostname.com;DNS: Recursive;CRITICAL;SOFT;1;CRITICAL - Plugin timed out while executing system call

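import re

hostname = options.hostname  # 'options' is defined elsewhere in the asker's script (not shown)

n = open('/var/tmp/nagios.log', 'r')
n.readline()  # skip the first line of the log
l = [str(x) for x in n]
for line in l:
    # Only the epoch time inside the square brackets is captured here.
    match = re.match(r'^\[(\d+)\] SERVICE NOTIFICATION: ', line)
    if match:
        timestamp = int(match.groups()[0])
        print(timestamp)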

Answered by Alex Martelli

You can use | to match any one of various possible things, and re.findall to get all non-overlapping matches to some RE.
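
As a minimal sketch of that idea, alternation plus re.findall can pull the pieces out of the sample log line from the question. The pattern and the variable names below are illustrative assumptions, not part of the answer:

```python
import re

line = ('[1242248375] SERVICE ALERT: myhostname.com;DNS: Recursive;'
        'CRITICAL;SOFT;1;CRITICAL - Plugin timed out while executing system call')

# Four alternatives, tried left to right at each position:
# a 10-digit epoch time, the literal alert type, a lowercase dotted
# hostname, or the state word. All of these are assumptions about
# what the log looks like, chosen to fit the sample line.
pattern = re.compile(
    r'\d{10}|SERVICE ALERT|[a-z][a-z0-9-]*(?:\.[a-z0-9-]+)+|CRITICAL'
)

matches = pattern.findall(line)
print(matches)
# ['1242248375', 'SERVICE ALERT', 'myhostname.com', 'CRITICAL', 'CRITICAL']
```

Note that findall returns a flat list, so nothing records which alternative produced which item, and CRITICAL shows up twice because it occurs twice in the line.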

Answered by Dietrich Epp

The question is a bit confusing. But you don't need to do everything with regular expressions; there are some good plain old string functions you might want to try, like 'split'.

This version will also refrain from loading the entire file in memory at once, and it will close the file even when an exception is thrown.

import re

regexp = re.compile(r'\[(\d+)\] SERVICE NOTIFICATION: (.+)')
with open('/var/tmp/nagios.log', 'r') as log_file:
    for line in log_file:
        # The first ';'-separated field holds the timestamp, alert type and hostname.
        fields = line.split(';')
        match = regexp.match(fields[0])
        if match:
            timestamp = int(match.group(1))
            hostname = match.group(2)
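
To show how split and a small regex can share the work, here is a rough sketch that recovers all four pieces from the sample line in the question. The helper name parse_line, the HEAD pattern, and the assumption that the state is always the third ';'-separated field are mine, not the answer's:

```python
import re

# Matches only the first ';'-separated field: "[epoch] ALERT TYPE: hostname".
HEAD = re.compile(r'\[(\d+)\] ([A-Z ]+): (.+)')

def parse_line(line):
    """Return (timestamp, alert_type, hostname, state), or None if the line does not fit."""
    fields = line.rstrip('\n').split(';')
    if len(fields) < 3:
        return None
    head = HEAD.match(fields[0])
    if head is None:
        return None
    # Assumption: the state (e.g. CRITICAL) is the third field.
    return int(head.group(1)), head.group(2), head.group(3), fields[2]

sample = ('[1242248375] SERVICE ALERT: myhostname.com;DNS: Recursive;'
          'CRITICAL;SOFT;1;CRITICAL - Plugin timed out while executing system call')
print(parse_line(sample))
# (1242248375, 'SERVICE ALERT', 'myhostname.com', 'CRITICAL')
```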

Answered by Mike Kale

You can use more than one group at a time, e.g.:

import re

logstring = '[1242248375] SERVICE ALERT: myhostname.com;DNS: Recursive;CRITICAL;SOFT;1;CRITICAL - Plugin timed out while executing system call'
# Groups: 1 = epoch time, 2 = alert type, 3 = hostname, 4 = state.
exp = re.compile(r'^\[(\d+)\] ([A-Z ]+): ([A-Za-z0-9.\-]+);[^;]+;([A-Z]+);')
m = exp.search(logstring)

for s in m.groups():
    print(s)
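
A possible variation on the same pattern uses named groups, so each piece can be looked up by name instead of by position. The group names (timestamp, kind, hostname, state) are my own labels, not something from the answer:

```python
import re

logstring = ('[1242248375] SERVICE ALERT: myhostname.com;DNS: Recursive;'
             'CRITICAL;SOFT;1;CRITICAL - Plugin timed out while executing system call')

# Same structure as the pattern above, with (?P<name>...) groups.
LOG_PATTERN = re.compile(
    r'^\[(?P<timestamp>\d+)\] (?P<kind>[A-Z ]+): '
    r'(?P<hostname>[A-Za-z0-9.\-]+);[^;]+;(?P<state>[A-Z]+);'
)

m = LOG_PATTERN.search(logstring)
info = m.groupdict()  # maps each group name to the text it matched
print(info['timestamp'], info['kind'], info['hostname'], info['state'])
# 1242248375 SERVICE ALERT myhostname.com CRITICAL
```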

Answered by user114075

If you are looking to split out those particular parts of the line, then something along the lines of:

match = re.match(r'^\[(\d+)\] (.*?): (.*?);.*?;(.*?);', line)

This should give you each of those parts at its respective index in match.groups().
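
When this runs over a whole file, re.match returns None for lines that do not fit, so a check is needed before unpacking the groups. A small sketch, in which the second input line and the variable names are made up for illustration:

```python
import re

PATTERN = re.compile(r'^\[(\d+)\] (.*?): (.*?);.*?;(.*?);')

lines = [
    '[1242248375] SERVICE ALERT: myhostname.com;DNS: Recursive;CRITICAL;SOFT;1;'
    'CRITICAL - Plugin timed out while executing system call',
    'this line is not a log entry',  # hypothetical non-matching input
]

parsed = []
for line in lines:
    match = PATTERN.match(line)
    if match is None:
        continue  # skip lines that do not have the expected shape
    timestamp, alert_type, hostname, state = match.groups()
    parsed.append((int(timestamp), alert_type, hostname, state))

print(parsed)
# [(1242248375, 'SERVICE ALERT', 'myhostname.com', 'CRITICAL')]
```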

Answered by Oddthinking

Could it be as simple as this: the "SERVICE NOTIFICATION" in your pattern doesn't match the "SERVICE ALERT" in your example?
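
This is easy to check against the sample line from the question. The relaxed pattern below, which accepts either wording, is one possible fix and an assumption on my part about what the asker wants to match:

```python
import re

line = ('[1242248375] SERVICE ALERT: myhostname.com;DNS: Recursive;'
        'CRITICAL;SOFT;1;CRITICAL - Plugin timed out while executing system call')

# The pattern from the question: requires the literal "SERVICE NOTIFICATION".
original = re.compile(r'^\[(\d+)\] SERVICE NOTIFICATION: ')
# A relaxed variant (assumption): accept ALERT or NOTIFICATION.
relaxed = re.compile(r'^\[(\d+)\] SERVICE (?:ALERT|NOTIFICATION): ')

print(original.match(line))          # None, because the line says ALERT
print(relaxed.match(line).group(1))  # 1242248375
```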
