Python regular expression across multiple lines

Question

Asked by user225882

I'm gathering some info from some Cisco devices using Python and pexpect, and have had a lot of success using REs to extract pesky little items. I'm afraid I've hit a wall on this one. Some switches stack together; I have identified this in the script and use a separate routine to parse the data. If the switch is stacked, you see the following (extracted from the show version output):

Top Assembly Part Number        : 800-25858-06
Top Assembly Revision Number    : A0
Version ID                      : V08
CLEI Code Number                : COMDE10BRA
Hardware Board Revision Number  : 0x01


Switch   Ports  Model              SW Version              SW Image
------   -----  -----              ----------              ----------
*    1   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M  
     2   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     3   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     4   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M


Switch 02 
---------
Switch Uptime                   : 11 weeks, 2 days, 16 hours, 27 minutes
Base ethernet MAC Address       : 00:26:52:96:2A:80
Motherboard assembly number     : 73-9675-15

When I encounter this, I need to extract the switch number and model for each row in the table of 4 (the SW columns can be ignored, but there can be between 1 and 9 switches). It's the multiple-line aspect that has got me; I've been OK with the rest. Any ideas, please?

OK, apologies. My regex simply started by looking at the last group of - characters, until... then I couldn't work out where to go!
-{10]\s-{10}(.+)Switch

The model will change and the number of switches will change. I need to capture the 4 lines in this example, which are:

*    1   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M  
     2   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     3   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     4   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M

But each switch could be a different model, and there could be between 1 and 9 of them. For this example, ideally I'd like to get:

*,1,WS-C3750-48P
,2,WS-C3750-48P
,3,WS-C3750-48P
,4,WS-C3750-48P

(the asterisk means master)
but getting those lines would set me on the right track

Answer 1

Answered by Alex Martelli

To have . match any character, including a newline, compile your RE with re.DOTALL among the options (remember, if you have multiple options, use |, the bit-or operator, between them, in order to combine them).
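As a rough illustration of the point about re.DOTALL, the sketch below adapts the asker's attempted pattern (with the {10] bracket typo fixed and \s widened to \s+, both assumptions about what was intended) and runs it against an abbreviated copy of the sample output. The variable names are made up for this example.

```python
import re

# Abbreviated copy of the sample "show version" output from the question.
text = """Hardware Board Revision Number  : 0x01


Switch   Ports  Model              SW Version              SW Image
------   -----  -----              ----------              ----------
*    1   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     2   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M


Switch 02
---------
"""

# Without DOTALL, "." does not match "\n", so the capture cannot leave
# the line of dashes and the search finds nothing.
no_dotall = re.search(r"-{10}\s+-{10}(.+)Switch", text)

# Flags are combined with "|". DOTALL lets "." cross newlines; MULTILINE
# lets "^" match at the start of the "Switch 02" line. The non-greedy
# ".+?" stops at the first such line.
block = re.compile(r"-{10}\s+-{10}(.+?)^Switch", re.DOTALL | re.MULTILINE)
match = block.search(text)

# Keep only the non-blank lines of the captured block: the table rows.
table_rows = [line for line in match.group(1).splitlines() if line.strip()]

print(no_dotall)
for row in table_rows:
    print(row)
```

This only isolates the rows as whole lines; pulling the switch number and model out of each row still needs a second pattern or a split().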

In this case I'm not sure you actually do need this -- why not something like

re.findall(r'(\d+)\s+\d+\s+(WS-\S+)', text)  # text: placeholder name for the captured show version output

assuming, for example, that the way you identify a "model" is that it starts with WS-? The fact that there will be newlines between one result of findall and the next one is not a problem here. Can you explain exactly how you identify a "model" and why "multiline" is an issue? Maybe you want re.MULTILINE to make ^ match at each start-of-line, to grab your data with some reference to the start of the lines...?
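To make the difference concrete, here is a small sketch comparing the unanchored pattern with a ^-anchored variant, with and without re.MULTILINE. The anchored pattern and the variable names are illustrative assumptions, not something the question specifies.

```python
import re

# Abbreviated copy of the sample output from the question.
sample = """Switch   Ports  Model              SW Version              SW Image
------   -----  -----              ----------              ----------
*    1   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     2   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M


Switch 02
---------
"""

# Unanchored: findall scans the whole string, so the newlines between
# rows do not matter and no flag is needed.
unanchored = re.findall(r"(\d+)\s+\d+\s+(WS-\S+)", sample)

# Anchored with "^" but no flag: "^" matches only at the very start of
# the string, which is the header line, so nothing is found.
anchored_pattern = r"^\*?\s+(\d+)\s+\d+\s+(WS-\S+)"
anchored_default = re.findall(anchored_pattern, sample)

# Same pattern with MULTILINE: "^" now matches at the start of every line.
anchored_multiline = re.findall(anchored_pattern, sample, re.MULTILINE)

print(unanchored)
print(anchored_default)
print(anchored_multiline)
```

The anchored form is stricter about where a row may start, which can help if digits followed by a WS- token could appear elsewhere in the output.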

Answer 2

Answered by YOU

x="""Top Assembly Part Number        : 800-25858-06
Top Assembly Revision Number    : A0
Version ID                      : V08
CLEI Code Number                : COMDE10BRA
Hardware Board Revision Number  : 0x01


Switch   Ports  Model              SW Version              SW Image
------   -----  -----              ----------              ----------
*    1   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     2   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     3   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     4   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M


Switch 02
---------
Switch Uptime                   : 11 weeks, 2 days, 16 hours, 27 minutes
Base ethernet MAC Address       : 00:26:52:96:2A:80
Motherboard assembly number     : 73-9675-15"""

>>> import re
>>> re.findall(r"^\*?\s*(\d)\s*\d+\s*([A-Z\d-]+)",x,re.MULTILINE)
[('1', 'WS-C3750-48P'), ('2', 'WS-C3750-48P'), ('3', 'WS-C3750-48P'), ('4', 'WS-C3750-48P')]

UPDATE: the OP edited the question, so the pattern below also captures the asterisk. Thanks to Tom for pointing out the use of +.

>>> re.findall(r"^(\*?)\s+(\d)\s+\d+\s+([A-Z\d-]+)",x,re.MULTILINE)
[('*', '1', 'WS-C3750-48P'), ('', '2', 'WS-C3750-48P'), ('', '3', 'WS-C3750-48P'), ('', '4', 'WS-C3750-48P')]
>>>
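
The tuples returned by the updated pattern can be joined into the comma-separated lines the question asks for. This is a minimal sketch; the helper name stack_members and the sample variable are made up for illustration.

```python
import re

# Stack table from the question, with a little surrounding context.
sample = """Switch   Ports  Model              SW Version              SW Image
------   -----  -----              ----------              ----------
*    1   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     2   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     3   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M
     4   52     WS-C3750-48P       12.2(35)SE5             C3750-IPBASE-M


Switch 02
---------
"""

# Same pattern as the updated answer: optional "*", switch number
# (a single digit, since a stack has 1 to 9 members), ports, model.
ROW = re.compile(r"^(\*?)\s+(\d)\s+\d+\s+([A-Z\d-]+)", re.MULTILINE)


def stack_members(show_version_output):
    """Return one 'master,number,model' string per row of the stack table."""
    return [",".join(fields) for fields in ROW.findall(show_version_output)]


lines = stack_members(sample)
for line in lines:
    print(line)
```

Rows without the asterisk come out with an empty first field, matching the ",2,WS-C3750-48P" shape shown in the question.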
