Original URL: http://stackoverflow.com/questions/1870954/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
python regular expression across multiple lines
Question by user225882
I'm gathering some info from some Cisco devices using Python and pexpect, and have had a lot of success using REs to extract pesky little items. I'm afraid I've hit a wall with this one, though. Some switches are stacked together; I have identified this case in the script and use a separate routine to parse the data. If the switch is stacked, you see the following (extracted from the show version output):
Top Assembly Part Number : 800-25858-06
Top Assembly Revision Number : A0
Version ID : V08
CLEI Code Number : COMDE10BRA
Hardware Board Revision Number : 0x01
Switch Ports Model SW Version SW Image
------ ----- ----- ---------- ----------
*  1 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
   2 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
   3 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
   4 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
Switch 02
---------
Switch Uptime : 11 weeks, 2 days, 16 hours, 27 minutes
Base ethernet MAC Address : 00:26:52:96:2A:80
Motherboard assembly number : 73-9675-15
When I encounter this, I need to extract the switch number and model for each row of the table (4 rows in this example; the SW columns can be ignored, but there can be anywhere from 1 to 9 switches). It's the multiple-line part that has me stuck, as I've been OK with the rest. Any ideas, please?
OK, apologies. My regex simply started by looking for the last groups of dashes, and then I couldn't work out where to go!
-{10}\s-{10}(.+)Switch
The model will change and the number of switches will change. I need to capture the 4 lines in this example, which are:
*  1 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
   2 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
   3 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
   4 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
But each switch could be a different model, and there could be between 1 and 9 of them. For this example, ideally I'd like to get:
*,1,WS-C3750-48P
,2,WS-C3750-48P
,3,WS-C3750-48P
,4,WS-C3750-48P
(the asterisk means master), but just getting those lines would set me on the right track.
Answer by Alex Martelli
To have . match any character, including a newline, compile your RE with re.DOTALL among the options (remember, if you have multiple options, use |, the bitwise-or operator, between them to combine them).
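A minimal sketch of what re.DOTALL changes, using the asker's pattern from the question (with the {10] typo read as {10}) on a trimmed copy of the output; SAMPLE, GREEDY and LAZY are illustrative names, not from the thread. Without the flag, (.+) cannot cross the newline after the dashes, so nothing matches. With the flag, a greedy (.+) runs on to the last "Switch" in the text, which is why the sketch also tries the non-greedy (.+?). The final compile shows two flags combined with |.

```python
import re

# Trimmed copy of the "show version" excerpt from the question
# (rows indented so the "*" master marker lines up with the others).
SAMPLE = """Switch Ports Model SW Version SW Image
------ ----- ----- ---------- ----------
*  1 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
   2 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
Switch 02
---------
Switch Uptime : 11 weeks, 2 days, 16 hours, 27 minutes
"""

# The asker's pattern with the {10] typo fixed, plus a non-greedy variant.
GREEDY = r"-{10}\s-{10}(.+)Switch"
LAZY = r"-{10}\s-{10}(.+?)Switch"

# Without DOTALL, "." cannot match the newline right after the dashes.
print(re.search(GREEDY, SAMPLE))  # prints None

# With DOTALL, the greedy group runs on to the LAST "Switch" ("Switch Uptime"),
# so it swallows "Switch 02" and the dashes under it.
greedy = re.search(GREEDY, SAMPLE, re.DOTALL).group(1)

# The non-greedy group stops at the first "Switch" after the table.
lazy = re.search(LAZY, SAMPLE, re.DOTALL).group(1)

# Combining flags with |: MULTILINE lets $ and ^ anchor to line boundaries,
# DOTALL lets the group span the table rows.
table_re = re.compile(r"-{10}\s-{10}$(.+?)^Switch", re.DOTALL | re.MULTILINE)
rows = table_re.search(SAMPLE).group(1).strip().splitlines()
print(rows)
```

Each entry in rows is one raw table line, which can then be split on whitespace or matched with a simpler per-line pattern.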
In this case I'm not sure you actually need re.DOTALL, though. Why not something like
re.findall(r'(\d+)\s+\d+\s+(WS-\S+)', text)  # text: the captured show version output (placeholder name)
assuming, for example, that the way you identify a "model" is that it starts with WS-? The fact that there will be newlines between one findall result and the next is not a problem here. Can you explain exactly how you identify a "model", and why "multiline" is an issue? Maybe you want re.MULTILINE, to make ^ match at the start of each line, so you can grab your data with some reference to the start of the lines...?
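A runnable version of the findall suggestion, applied to a trimmed copy of the output (SAMPLE is an illustrative name, and the WS- prefix is the answer's own assumption about how a model is recognised). findall just returns every non-overlapping match, so the newlines between rows do not get in the way. The second call is one possible way to use re.MULTILINE here, not taken from the answer: ^ ties each match to the start of a table row, so the model no longer has to start with WS-.

```python
import re

# Trimmed copy of the "show version" output; switch rows are indented
# so the "*" master marker lines up with the other rows.
SAMPLE = """CLEI Code Number : COMDE10BRA
Switch Ports Model SW Version SW Image
------ ----- ----- ---------- ----------
*  1 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
   2 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
   3 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
Switch 02
---------
"""

# No flags needed: each match sits on one line, and findall moves past
# the newlines between rows by itself.
pairs = re.findall(r"(\d+)\s+\d+\s+(WS-\S+)", SAMPLE)
print(pairs)

# With re.MULTILINE, ^ matches at the start of every line. [ \t]* (rather
# than \s*) keeps the leading-space match from running across newlines.
anchored = re.findall(r"^[ \t]*\*?[ \t]*(\d+)\s+\d+\s+(\S+)", SAMPLE, re.MULTILINE)
print(anchored)
```

On this sample both calls return the same (number, model) pairs; the anchored form is the safer choice if other lines in the real output could contain a number followed by a WS- string.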
Answer by YOU
x="""Top Assembly Part Number : 800-25858-06
Top Assembly Revision Number : A0
Version ID : V08
CLEI Code Number : COMDE10BRA
Hardware Board Revision Number : 0x01
Switch Ports Model SW Version SW Image
------ ----- ----- ---------- ----------
*  1 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
   2 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
   3 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
   4 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
Switch 02
---------
Switch Uptime : 11 weeks, 2 days, 16 hours, 27 minutes
Base ethernet MAC Address : 00:26:52:96:2A:80
Motherboard assembly number : 73-9675-15"""
>>> import re
>>> re.findall(r"^\*?\s*(\d)\s*\d+\s*([A-Z\d-]+)", x, re.MULTILINE)
[('1', 'WS-C3750-48P'), ('2', 'WS-C3750-48P'), ('3', 'WS-C3750-48P'), ('4', 'WS-C3750-48P')]
UPDATE: because the OP edited the question (and thanks to Tom for pointing out that \s* should be \s+):
>>> re.findall(r"^(\*?)\s+(\d)\s+\d+\s+([A-Z\d-]+)", x, re.MULTILINE)
[('*', '1', 'WS-C3750-48P'), ('', '2', 'WS-C3750-48P'), ('', '3', 'WS-C3750-48P'), ('', '4', 'WS-C3750-48P')]
>>>
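The updated pattern already yields (master, number, model) tuples, so joining each tuple with commas gives exactly the lines the asker wanted (*,1,WS-C3750-48P then ,2,WS-C3750-48P and so on). A small sketch, with the pattern written as a raw string; the helper name switch_rows and the trimmed sample are illustrative only. Note that the \s+ after the optional asterisk expects non-master rows to be indented, as they are in real output. Because (\d) captures a single digit, it covers the 1 to 9 switch stacks described in the question; larger numbers would need \d+.

```python
import re

# Same pattern as the updated answer, as a raw string so the backslashes
# reach the regex engine unchanged.
ROW_RE = re.compile(r"^(\*?)\s+(\d)\s+\d+\s+([A-Z\d-]+)", re.MULTILINE)


def switch_rows(show_version: str) -> list[str]:
    """Hypothetical helper: one 'master,number,model' string per stack member."""
    return [",".join(match) for match in ROW_RE.findall(show_version)]


sample = """Switch Ports Model SW Version SW Image
------ ----- ----- ---------- ----------
*  1 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
   2 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
   3 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
   4 52 WS-C3750-48P 12.2(35)SE5 C3750-IPBASE-M
Switch 02
---------
"""

for line in switch_rows(sample):
    print(line)
```

The empty first group on non-master rows is what produces the leading comma in ,2,WS-C3750-48P.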