Original question: http://stackoverflow.com/questions/3532947/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Python Regular Expression Match All 5 Digit Numbers but None Larger
Asked by Bryce Thomas
I'm attempting to string match 5-digit coupon codes spread throughout an HTML web page. For example, 53232, 21032, 40021, etc. I can handle the simpler case of any string of 5 digits with [0-9]{5}, though this also matches 6, 7, 8... n digit numbers. Can someone please suggest how I would modify this regular expression to match only 5-digit numbers?
Accepted answer by John La Rooy
>>> import re
>>> s="four digits 1234 five digits 56789 six digits 012345"
>>> re.findall(r"\D(\d{5})\D", s)
['56789']
If the numbers can occur at the very beginning or the very end of the string, it's easier to pad the string than to mess with special cases:
>>> re.findall(r"\D(\d{5})\D", " "+s+" ")
Answer by Xavier Combelle
Without padding the string to handle the special cases at the start and end of the string, as in John La Rooy's answer, one can use negative lookahead and lookbehind to handle both cases with a single regular expression:
>>> import re
>>> s = "88888 999999 3333 aaa 12345 hfsjkq 98765"
>>> re.findall(r"(?<!\d)\d{5}(?!\d)", s)
['88888', '12345', '98765']
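The lookaround pattern can be wrapped in a reusable helper. This is only a sketch: the names `FIVE_DIGITS` and `find_codes` are hypothetical and do not come from the original answer, and the sample string is made up to exercise the start of the string, the end of the string, and numbers separated by a single comma.

```python
import re

# Hypothetical name: compiled once so it can be reused across many pages.
# (?<!\d) asserts "no digit before", (?!\d) asserts "no digit after";
# neither assertion consumes a character.
FIVE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")

def find_codes(text):
    """Return every run of exactly five digits in text."""
    return FIVE_DIGITS.findall(text)

print(find_codes("12345,67890 x999999 abc54321"))
# ['12345', '67890', '54321']
```

Because the assertions are zero-width, two codes separated by a single character are both found, and digits glued to letters (as in `abc54321`) still count. Whether that last case should match depends on the page being scraped.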
Answer by Crayon Violent
Full string: ^[0-9]{5}$
Within a string: [^0-9][0-9]{5}[^0-9]
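For the full-string case, a possible alternative in Python is `re.fullmatch`, which anchors at both ends without `^` and `$`. The sketch below assumes Python 3.4 or later (where `re.fullmatch` is available); the helper name `is_five_digit_code` is hypothetical.

```python
import re

def is_five_digit_code(value):
    """Return True only if the whole string is exactly five ASCII digits."""
    # fullmatch requires the pattern to span the entire string,
    # so no explicit ^ or $ anchors are needed.
    return re.fullmatch(r"[0-9]{5}", value) is not None

print(is_five_digit_code("12345"))    # True
print(is_five_digit_code("123456"))   # False
print(is_five_digit_code("12345\n"))  # False

# By contrast, $ also matches just before a trailing newline:
print(re.match(r"^[0-9]{5}$", "12345\n") is not None)  # True
```

The last line shows why the anchored pattern may accept input with a trailing newline, which can matter when validating a single value rather than searching a page.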
Answer by sth
A very simple way would be to match all groups of digits, like with r'\d+', and then skip every match that isn't five characters long when you process the results.
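A minimal sketch of this filter-afterwards approach, with a hypothetical helper name (`find_codes_by_length`) and the sample string borrowed from the accepted answer:

```python
import re

def find_codes_by_length(text, length=5):
    """Collect every run of digits, then keep only runs of the wanted length."""
    # \d+ is greedy, so each match is a maximal run of digits.
    return [run for run in re.findall(r"\d+", text) if len(run) == length]

s = "four digits 1234 five digits 56789 six digits 012345"
print(find_codes_by_length(s))  # ['56789']
```

This keeps the regular expression trivial and moves the length rule into ordinary Python, which may be easier to adjust later (for example, to accept several lengths).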
Answer by Bob
You probably want to match a non-digit before and after your string of 5 digits, like [^0-9]([0-9]{5})[^0-9]. Then you can capture the inner group (the actual string you want).
Answer by Zaki
You could try
\D\d{5}\D
or maybe
\b\d{5}\b
I'm not sure how Python treats line endings and whitespace there, though.
I believe ^\d{5}$ would not work for you, as you likely want to get numbers that are somewhere within other text.
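On the open question of line endings and whitespace: in Python's `re` module, spaces, tabs, and newlines are all non-word characters, so `\b` finds a boundary next to them. The sketch below uses a made-up sample string (`page` is a hypothetical name) to compare `\b` with the anchored pattern.

```python
import re

# Hypothetical sample text: codes separated by spaces, a tab, and newlines.
page = "code 53232\n21032\nand\t40021\n1234567"

word_bounded = re.findall(r"\b\d{5}\b", page)
anchored = re.findall(r"^\d{5}$", page)
anchored_multiline = re.findall(r"^\d{5}$", page, flags=re.MULTILINE)

print(word_bounded)        # ['53232', '21032', '40021']
print(anchored)            # []
print(anchored_multiline)  # ['21032']
```

Without `re.MULTILINE`, `^` only matches at the start of the whole string, so the anchored pattern finds nothing here. With the flag it finds only codes that sit alone on a line, which supports the point that anchors are not suited to codes embedded in other text.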
Answer by igaurav
Note: There is a problem with using \D, since \D must match an actual character that is not a digit; use \b instead.
\b is important here because it matches a word boundary, but only at the beginning or end of a word.
>>> import re
>>> s = "four digits 1234 five digits 56789 six digits 01234,56789,01234"
>>> re.findall(r"\b\d{5}\b", s)
['56789', '01234', '56789', '01234']
But if one uses re.findall(r"\D(\d{5})\D", s), the output is ['56789', '01234']. \D consumes the comma after a match, so it cannot handle numbers that follow each other separated by a single character, and it cannot match a number at the very end of the string.
\b is the important part here: it matches the empty string, but only at the beginning or end of a word.
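The difference between the approaches in this thread can be seen by running them side by side. This is an illustrative sketch only: the sample string and the variable names are made up, and the lookaround pattern is the one from Xavier Combelle's answer.

```python
import re

# Hypothetical sample: comma-separated codes, a code glued to letters,
# and codes at the very start and very end of the string.
s = "01234,56789,01234 id12345 98765"

with_non_digit = re.findall(r"\D(\d{5})\D", s)
with_word_boundary = re.findall(r"\b\d{5}\b", s)
with_lookaround = re.findall(r"(?<!\d)\d{5}(?!\d)", s)

print(with_non_digit)      # ['56789', '12345']
print(with_word_boundary)  # ['01234', '56789', '01234', '98765']
print(with_lookaround)     # ['01234', '56789', '01234', '12345', '98765']
```

`\D` misses codes at either end of the string and any code whose neighbouring separator was already consumed by the previous match. `\b` handles those cases but skips `id12345`, because letters, digits, and underscores are all word characters and there is no boundary between `d` and `1`. Which behaviour is wanted depends on how the coupon codes appear in the page.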
More documentation: https://docs.python.org/2/library/re.html
More clarification on the usage of \D vs \b:
This example uses \D, but it doesn't capture all the five-digit numbers.
This example uses \b and captures all the five-digit numbers.
Cheers

