Parsing XML in Python using regular expressions

Question

Asked by user2671656

I'm trying to use regex to parse an XML file (in my case this seems the simplest way).

For example a line might be:

line='<City_State>PLAINSBORO, NJ 08536-1906</City_State>'

To access the text for the tag City_State, I'm using:

attr = re.match('>.*<', line)

but it returns None.

Can someone point out what I'm doing wrong?

Answer 1

Accepted answer by TerryA

You normally don't want to use re.match. Quoting from the docs:

If you want to locate a match anywhere in string, use search() instead (see also search() vs. match()).

Note:

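>>> import re
>>> line = '<City_State>PLAINSBORO, NJ 08536-1906</City_State>'
>>> print(re.match('>.*<', line))
None
>>> print(re.search('>.*<', line))
<re.Match object; span=(11, 38), match='>PLAINSBORO, NJ 08536-1906<'>
>>> print(re.search('>.*<', line).group(0))
>PLAINSBORO, NJ 08536-1906<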
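The match above still includes the surrounding > and < characters. A minimal sketch of pulling out only the element text with a capturing group and re.search, assuming the tag name (City_State here) is known in advance and the element has no attributes, nesting, or entities. The helper name extract_tag_text is hypothetical, not part of the re module:

```python
import re

def extract_tag_text(line, tag):
    """Return the text between <tag> and </tag> on one line, or None.

    Hypothetical helper: assumes no attributes, no nesting, no entities.
    """
    # re.escape guards against regex metacharacters in the tag name;
    # the non-greedy (.*?) stops at the first matching closing tag.
    pattern = r'<{0}>(.*?)</{0}>'.format(re.escape(tag))
    m = re.search(pattern, line)
    return m.group(1) if m else None

line = '<City_State>PLAINSBORO, NJ 08536-1906</City_State>'
print(extract_tag_text(line, 'City_State'))  # PLAINSBORO, NJ 08536-1906
```

group(1) returns only what the parenthesized group captured, which is why the angle brackets drop out.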

Also, why parse XML with regex when you can use something like BeautifulSoup? :)

>>> from bs4 import BeautifulSoup as BS
>>> line = '<City_State>PLAINSBORO, NJ 08536-1906</City_State>'
>>> soup = BS(line, 'html.parser')  # html.parser lowercases tag names
>>> print(soup.find('city_state').text)
PLAINSBORO, NJ 08536-1906

Answer 2

Answer by Kyle

re.match only matches at the beginning of the string, so it returns None here because the line starts with '<', not '>'. To find a match anywhere in the string, use re.search.
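To see the anchoring behavior concretely, the sketch below compares re.match, re.search and re.fullmatch (Python 3.4+) on the question's line. The patterns are illustrative only:

```python
import re

line = '<City_State>PLAINSBORO, NJ 08536-1906</City_State>'

# re.match anchors at the start of the string: '>' is not the first
# character, so this pattern fails even though it occurs later on.
print(re.match('>.*<', line))                     # None

# A pattern that fits the beginning of the string is enough for
# re.match; it does not have to consume the whole string.
print(re.match('<City_State>', line).group(0))    # <City_State>

# re.search scans forward and returns the first position that matches.
print(re.search('>.*<', line).group(0))           # >PLAINSBORO, NJ 08536-1906<

# re.fullmatch is the function that requires the entire string to match.
print(re.fullmatch('<City_State>', line))         # None
print(re.fullmatch('<City_State>.*</City_State>', line) is not None)  # True
```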

And yes, this is a simple way to parse XML, but I would highly encourage you to use a library specifically designed for the task.
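One concrete reason for a dedicated library: a hand-written pattern only handles the exact shape it was written for. The sketch below uses made-up sample lines (not from the question), one with an escaped entity and one with a nested element, and compares the greedy '>(.*)<' pattern against xml.etree.ElementTree from the standard library. The helper names naive_text and parsed_text are hypothetical:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample lines, each chosen to trip up the greedy pattern.
samples = [
    "<City_State>O&apos;FALLON, IL</City_State>",                   # escaped entity
    "<Address><City_State>PLAINSBORO, NJ</City_State></Address>",  # nesting
]

def naive_text(s):
    # Greedy pattern: everything between the first '>' and the last '<'.
    return re.search('>(.*)<', s).group(1)

def parsed_text(s):
    root = ET.fromstring(s)
    # iter() yields the root itself first, then its descendants, so this
    # finds City_State whether or not it is the root element.
    return next(root.iter('City_State')).text

for s in samples:
    print(repr(naive_text(s)), '->', repr(parsed_text(s)))
```

The regex leaves the entity undecoded and swallows the nested tags, while the parser returns the decoded text of the right element in both cases.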

Answer 3

Answer by Viktor Kerkez

Please just use an XML parser such as ElementTree:

>>> from xml.etree import ElementTree as ET
>>> line='<City_State>PLAINSBORO, NJ 08536-1906</City_State>'
>>> ET.fromstring(line).text
'PLAINSBORO, NJ 08536-1906'
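Since the question is about a whole file rather than a single line, here is a sketch of the same ElementTree approach applied to a document with several records. The Records/Record layout is an assumption (the question does not show the file's structure), and io.StringIO stands in for the file; with a real file, pass its path (for example a hypothetical 'records.xml') to ET.parse instead:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical file contents; <Records>/<Record> are illustrative names.
xml_data = """<Records>
  <Record><City_State>PLAINSBORO, NJ 08536-1906</City_State></Record>
  <Record><City_State>PRINCETON, NJ 08540</City_State></Record>
</Records>"""

# ET.parse accepts a filename or a file object; StringIO stands in for
# an open file here.
tree = ET.parse(io.StringIO(xml_data))

# './/City_State' finds City_State elements at any depth below the root.
cities = [elem.text for elem in tree.getroot().findall('.//City_State')]
print(cities)  # ['PLAINSBORO, NJ 08536-1906', 'PRINCETON, NJ 08540']
```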
