Extracting data with Python regular expressions

Question

Asked by greyfox

I am having trouble coming up with a Python regular expression to extract specific values.

The page I am trying to parse has a number of productIds, which appear in the following format:

\"productId\":\"111111\"

I need to extract all the values, 111111 in this case.

Answer 1

Accepted answer by perreal

import re

t = "\"productId\":\"111111\""
m = re.match(r"\W*productId[^:]*:\D*(\d+)", t)
if m:
    print(m.group(1))

This means: match any non-word characters (\W*), then productId followed by non-colon characters ([^:]*) and a :. Then skip any non-digits (\D*) and capture the digits that follow ((\d+)).

Output:

111111
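Note that re.match only matches at the start of the string, so on a whole page it would return at most one ID. A minimal sketch of collecting every ID with re.findall is shown here; the page fragment is made up for illustration, and the leading \W* is dropped because findall scans the whole string anyway.

```python
import re

# Hypothetical page fragment with several product IDs (not from the original question).
page = '{"productId":"111111"}, {"productId":"222222"}, {"productId":"333333"}'

# Same idea as the pattern above: skip to the colon, skip non-digits, capture the digits.
pattern = re.compile(r'productId[^:]*:\D*(\d+)')

# findall returns the captured group of every non-overlapping match.
product_ids = pattern.findall(page)
print(product_ids)
```

Because [^:]* and \D* also absorb backslashes, the same pattern should work whether or not the quotes in the page are backslash-escaped.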

Answer 2

Answered by frickskit

Try this:

 :\"(\d*)\"

Give more examples of your data if this doesn't do what you want.
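As a rough illustration of how this pattern behaves, the sketch below runs it on two made-up inputs: one with plain quotes and one where each quote is preceded by a literal backslash. In regex syntax \" matches just a quote, so the second case is assumed to need a doubled backslash; both sample strings are hypothetical.

```python
import re

# Hypothetical inputs: plain quotes vs. quotes preceded by a literal backslash.
plain = '"productId":"111111","productId":"222222"'
escaped = r'\"productId\":\"111111\",\"productId\":\"222222\"'

# The pattern from the answer: a colon, a quote, digits, a quote.
plain_ids = re.findall(r':\"(\d*)\"', plain)

# On backslash-escaped text the same pattern finds nothing...
missed = re.findall(r':\"(\d*)\"', escaped)

# ...so this variant matches the literal backslash before each quote.
escaped_ids = re.findall(r':\\"(\d*)\\"', escaped)

print(plain_ids, missed, escaped_ids)
```

The pattern is not tied to the productId key, so any other numeric value quoted after a colon would be picked up as well.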

Answer 3

Answered by Fredrik Pihl

Something like this:

In [13]: s=r'\"productId\":\"111111\"'

In [14]: print(s)
\"productId\":\"111111\"

In [15]: import re

In [16]: re.findall(r'\d+', s)
Out[16]: ['111111']
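One caveat worth keeping in mind: \d+ on its own returns every run of digits, not only product IDs. The sketch below assumes a hypothetical price field next to the ID to show the difference, and anchors the pattern on the productId key as one possible way to narrow it.

```python
import re

# Hypothetical input: a "price" field is added purely for illustration.
s = r'\"productId\":\"111111\",\"price\":\"25\"'

# A bare \d+ picks up every digit run in the string.
all_digits = re.findall(r'\d+', s)

# Anchoring on the key keeps only the digits that follow productId.
only_ids = re.findall(r'productId\D*(\d+)', s)

print(all_digits)
print(only_ids)
```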

Answer 4

Answered by Tobia

The backslashes here might add to the confusion, because they are used as an escape character both by (non-raw) Python strings and by the regexp syntax.

This extracts the product ids from the format you posted:

import re
re_prodId = re.compile(r'\\"productId\\":\\"([^"]+)\\"')

The raw string r'...' does away with one level of backslash escaping; the use of a single quote as the string delimiter does away with the need to escape double quotes; and finally the backslashes are doubled (only once) because of their special meaning in the regexp language.

You can use the regexp object's findall() method to find all matches in some text:

re_prodId.findall(text_to_search)

This will return a list of all product ids.
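Putting the pieces together, a minimal end-to-end sketch might look like the following. It assumes the page text really contains a literal backslash before each quote, as in the posted format; the sample text and its name field are invented for illustration.

```python
import re

# Each \\" in the raw string matches a literal backslash followed by a double quote.
re_prodId = re.compile(r'\\"productId\\":\\"([^"]+)\\"')

# Hypothetical page text; the "name" field is only there to show it is ignored.
text_to_search = r'{\"productId\":\"111111\",\"name\":\"x\"},{\"productId\":\"222222\"}'

# findall returns the captured group for every match, in order of appearance.
ids = re_prodId.findall(text_to_search)
print(ids)
```

The IDs come back as strings; if numeric values are needed, they can be converted with int().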
