Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/3532947/

Time: 2020-08-18 11:36:30  Source: igfitidea

Python Regular Expression Match All 5 Digit Numbers but None Larger

python regex

Asked by Bryce Thomas

I'm attempting to string match 5-digit coupon codes spread throughout an HTML web page, for example 53232, 21032, 40021, etc. I can handle the simpler case of any string of 5 digits with [0-9]{5}, though this also matches 6, 7, 8, ... n digit numbers. Can someone please suggest how I would modify this regular expression to match only 5-digit numbers?

Accepted answer by John La Rooy

>>> import re
>>> s="four digits 1234 five digits 56789 six digits 012345"
>>> re.findall(r"\D(\d{5})\D", s)
['56789']

If they can occur at the very beginning or the very end, it's easier to pad the string than to mess with special cases:

>>> re.findall(r"\D(\d{5})\D", " "+s+" ")
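
To see why the padding matters, here is a minimal sketch with an assumed sample string (not from the original answer) in which the codes sit at the very start and very end:

```python
import re

pattern = r"\D(\d{5})\D"
s = "12345 starts the string and 67890"  # illustrative input, an assumption

# Without padding, both codes are missed, because \D needs an actual
# non-digit character on each side of the five digits.
print(re.findall(pattern, s))              # []

# With one space added on each side, both codes are found.
print(re.findall(pattern, " " + s + " "))  # ['12345', '67890']
```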

Answer by Xavier Combelle

Instead of padding the string to cover the special cases at its start and end, as in John La Rooy's answer, one can use negative lookahead and lookbehind to handle both cases with a single regular expression:

>>> import re
>>> s = "88888 999999 3333 aaa 12345 hfsjkq 98765"
>>> re.findall(r"(?<!\d)\d{5}(?!\d)", s)
['88888', '12345', '98765']
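
The same pattern can be spelled out piece by piece with re.VERBOSE. This is only a sketch: the helper name find_codes and the sample HTML are assumptions made for illustration, not part of the answer.

```python
import re

# Same pattern as above, written in verbose mode so each part has a comment.
FIVE_DIGITS = re.compile(r"""
    (?<!\d)   # not preceded by a digit (also true at the start of the string)
    \d{5}     # exactly five digits
    (?!\d)    # not followed by a digit (also true at the end of the string)
""", re.VERBOSE)

def find_codes(html):
    """Return every standalone 5-digit run in html, in order of appearance."""
    return FIVE_DIGITS.findall(html)

page = "<p>Use 53232 or 21032.</p><span>40021</span> ref 1234567"
print(find_codes(page))  # ['53232', '21032', '40021']
```

Because lookarounds consume no characters, codes at the very start or end of the input are matched without any padding.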

Answer by Crayon Violent

Full string: ^[0-9]{5}$

Within a string: [^0-9][0-9]{5}[^0-9]
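
A short sketch of how these two patterns behave in Python. The sample strings are assumptions, and re.fullmatch is shown as one possible alternative to writing the anchors by hand:

```python
import re

# Whole-string check: re.fullmatch anchors at both ends, so ^ and $ are not needed.
print(bool(re.fullmatch(r"[0-9]{5}", "56789")))     # True
print(bool(re.fullmatch(r"[0-9]{5}", "567890")))    # False

# With explicit anchors, $ also matches just before a trailing newline.
print(bool(re.search(r"^[0-9]{5}$", "56789\n")))    # True

# In-string pattern: without a group, the match includes both neighbours.
m = re.search(r"[^0-9][0-9]{5}[^0-9]", "code: 56789.")
print(repr(m.group(0)))                             # ' 56789.'
```

Wrapping the digits in parentheses, as other answers in this thread do, returns just the five digits instead of the digits plus their neighbours.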

Answer by sth

A very simple way would be to match all groups of digits, for example with r'\d+', and then skip every match that isn't five characters long when you process the results.
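
A minimal sketch of that approach, reusing the sample string from the accepted answer:

```python
import re

s = "four digits 1234 five digits 56789 six digits 012345"

# Grab every maximal run of digits, then keep only the runs of length five.
runs = re.findall(r"\d+", s)
codes = [run for run in runs if len(run) == 5]

print(runs)   # ['1234', '56789', '012345']
print(codes)  # ['56789']
```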

Answer by Bob

You probably want to match a non-digit before and after your string of 5 digits, like [^0-9]([0-9]{5})[^0-9]. Then you can capture the inner group (the actual string you want).

Answer by Zaki

You could try

\D\d{5}\D

or maybe

\b\d{5}\b

I'm not sure how Python treats line endings and whitespace there, though.

I believe ^\d{5}$ would not work for you, as you likely want to get numbers that are somewhere within other text.
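
One caveat with \b is worth a quick sketch. In Python, digits, letters and the underscore all count as word characters, so \b does not separate a number from an adjacent letter or underscore. The sample string below is an assumption chosen to show this:

```python
import re

s = "id12345 code_67890 54321"

# \b only matches between a word character (letter, digit, underscore)
# and a non-word character, so digits glued to letters or "_" are skipped.
print(re.findall(r"\b\d{5}\b", s))           # ['54321']

# Lookarounds only look at neighbouring digits, so all three are found.
print(re.findall(r"(?<!\d)\d{5}(?!\d)", s))  # ['12345', '67890', '54321']
```

Which behaviour is preferable depends on whether a code attached to letters should count as a coupon code.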

Answer by igaurav

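Note: There is a problem with using \D. Because \D has to match an actual non-digit character, it consumes the separator on each side of the number, so numbers that share a single separator, or that sit at the very start or end of the string, can be missed. Use \b instead. \b matters here because it matches at a word boundary, that is, only at the beginning or end of a word, and consumes no characters.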

import re

s = "four digits 1234 five digits 56789 six digits 01234,56789,01234"

re.findall(r"\b\d{5}\b", s)

Result: ['56789', '01234', '56789', '01234']

But if one uses re.findall(r"\D(\d{5})\D", s), the output is ['56789', '01234']: \D cannot handle numbers separated only by a comma, or any numbers entered back to back.

\b is the important part here: it matches the empty string, but only at the beginning or end of a word.
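
To make the difference visible, this sketch prints the full text of each \D match (group 0) for the same comma-separated string, showing which separators get consumed:

```python
import re

s = "four digits 1234 five digits 56789 six digits 01234,56789,01234"

# group(0) is the whole match, including the non-digit on each side.
print([m.group(0) for m in re.finditer(r"\D(\d{5})\D", s)])
# [' 56789 ', ' 01234,']

# \b is zero-width, so neighbouring numbers can share a single separator.
print(re.findall(r"\b\d{5}\b", s))
# ['56789', '01234', '56789', '01234']
```

The first comma is used up by the match ' 01234,', so the following 56789 has no unconsumed non-digit in front of it, and the final 01234 has no character after it at all.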

More documentation: https://docs.python.org/2/library/re.html

More clarification on the usage of \D vs \b:

This example uses \D, but it doesn't capture all of the five-digit numbers.

This example uses \b and captures all of the five-digit numbers.

Cheers
