Matching numbers with regular expressions: only digits and commas (.NET)

Note: this page is adapted from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/4246077/

Time: 2020-09-03 14:57:51. Source: igfitidea

Tags: .net, regex, numbers, matching

Asked by user278618

I can't figure out how to construct a regex for the example values:

123,456,789
-12,34
1234
-8

Could you help me?

Accepted answer by ThiefMaster

If you only want to allow digits and commas (and a minus sign), ^[-,0-9]+$ is your regex. If you also want to allow spaces, use ^[-,0-9 ]+$.

However, if you want to allow proper numbers, better go with something like this:

^([-+] ?)?[0-9]+(,[0-9]+)?$

or simply use .NET's number parser (for the various NumberStyles, see MSDN):

// Requires: using System; using System.Globalization;
try {
    double.Parse(yourString, NumberStyles.Number);
}
catch (FormatException ex) {
    /* Number is not in an accepted format */
}
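
As a quick sanity check of the two patterns above against the question's sample values, here is a minimal sketch in Python (used only for illustration; the names LOOSE, STRICT, and samples are made up for this example). It suggests that the stricter pattern accepts at most one comma group, so 123,456,789 falls through.

```python
import re

# Patterns copied from the answer above; the variable names are illustrative.
LOOSE = re.compile(r"^[-,0-9]+$")
STRICT = re.compile(r"^([-+] ?)?[0-9]+(,[0-9]+)?$")

# Sample values from the question.
samples = ["123,456,789", "-12,34", "1234", "-8"]

loose_hits = [s for s in samples if LOOSE.match(s)]
strict_hits = [s for s in samples if STRICT.match(s)]

print(loose_hits)   # every sample passes the permissive character-class pattern
print(strict_hits)  # the value with two commas is missing here
```

Whether that rejection is acceptable depends on what "number" is supposed to mean for your input.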

Answer by tchrist

What's a Number?

I have a simple question for your “simple” question: What precisely do you mean by “a number”?

  • Is −0 a number?
  • How do you feel about √−1?
  • Is ? or ? a number?
  • Is 186,282.42±0.02 miles/second one number, or is it two or three of them?
  • Is 6.02e23 a number?
  • Is 3.141_592_653_589 a number? How about π, or ?? And ?2π?3 ??
  • How many numbers in 0.083??
  • How many numbers in 128.0.0.1?
  • What number does ? hold? How about ???
  • Does 10,5 mm have one number in it, or does it have two?
  • Is ?83 a number, or is it three of them?
  • What number does ??????Ⅻ AUC represent, 2762 or 2009?
  • Are ???? and ???? numbers?
  • What about 0377, 0xDEADBEEF, and 0b111101101?
  • Is Inf a number? Is NaN?
  • Is ④② a number? What about ??
  • How do you feel about ?
  • What do ?? and ?? have to do with numbers? Or ?, ?, and ??
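
To make the point concrete, here is a small sketch in Python (chosen only for illustration) that asks the standard library how it classifies a few candidate characters. The selection of characters and the helper name describe are my own assumptions, not part of the original answer.

```python
import unicodedata

# A few characters that are arguably "numbers"; the selection is illustrative.
candidates = [
    "7",        # ASCII digit
    "\u0663",   # ARABIC-INDIC DIGIT THREE
    "\u2463",   # CIRCLED DIGIT FOUR
    "\u216b",   # ROMAN NUMERAL TWELVE
    "\u00bd",   # VULGAR FRACTION ONE HALF
    "x",        # not a number under any of these tests
]

def describe(ch):
    """Return (isdecimal, isdigit, isnumeric, numeric value or None) for one character."""
    return (ch.isdecimal(), ch.isdigit(), ch.isnumeric(),
            unicodedata.numeric(ch, None))

report = {ch: describe(ch) for ch in candidates}
for ch, row in report.items():
    print(repr(ch), row)
```

The three predicates disagree on the circled digit, the Roman numeral, and the fraction, which is exactly why "a number" needs a definition before a pattern can be chosen.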

Suggested Patterns

Also, are you familiar with these patterns? Can you explain the pros and cons of each?

  1. /\D/
  2. /^\d+$/
  3. /^\p{Nd}+$/
  4. /^\pN+$/
  5. /^\p{Numeric_Value:10}$/
  6. /^\P{Numeric_Value:NaN}+$/
  7. /^-?\d+$/
  8. /^[+-]?\d+$/
  9. /^-?\d+\.?\d*$/
  10. /^-?(?:\d+(?:\.\d*)?|\.\d+)$/
  11. /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/
  12. /^((\d)(?(?=(\d))|$)(?(?{ord$3==1+ord$2})(?1)|$))$/
  13. /^(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))$/
  14. /^(?:(?:[0-9a-fA-F]{1,2}):(?:[0-9a-fA-F]{1,2}):(?:[0-9a-fA-F]{1,2}):(?:[0-9a-fA-F]{1,2}):(?:[0-9a-fA-F]{1,2}):(?:[0-9a-fA-F]{1,2}))$/
  15. /^(?:(?:[+-]?)(?:[0123456789]+))$/
  16. /(([+-]?)([0123456789]{1,3}(?:,?[0123456789]{3})*))/
  17. /^(?:(?:[+-]?)(?:[0123456789]{1,3}(?:,?[0123456789]{3})*))$/
  18. /^(?:(?i)(?:[+-]?)(?:(?=[0123456789]|[.])(?:[0123456789]*)(?:(?:[.])(?:[0123456789]{0,}))?)(?:(?:[E])(?:(?:[+-]?)(?:[0123456789]+))|))$/
  19. /^(?:(?i)(?:[+-]?)(?:(?=[01]|[.])(?:[01]{1,3}(?:(?:[,])[01]{3})*)(?:(?:[.])(?:[01]{0,}))?)(?:(?:[E])(?:(?:[+-]?)(?:[01]+))|))$/
  20. /^(?:(?i)(?:[+-]?)(?:(?=[0123456789ABCDEF]|[.])(?:[0123456789ABCDEF]{1,3}(?:(?:[,])[0123456789ABCDEF]{3})*)(?:(?:[.])(?:[0123456789ABCDEF]{0,}))?)(?:(?:[G])(?:(?:[+-]?)(?:[0123456789ABCDEF]+))|))$/
  21. /((?i)([+-]?)((?=[0123456789]|[.])([0123456789]{1,3}(?:(?:[_,]?)[0123456789]{3})*)(?:([.])([0123456789]{0,}))?)(?:([E])(([+-]?)([0123456789]+))|))/

I suspect that some of those patterns above may serve your needs. But I cannot tell you which one or ones (or, if none, supply you another) because you haven't said what you mean by “number”.

As you see, there is a huge number of number possibilities: quite probably ?? worth of them, in fact.

Key to Suggested Patterns

Each numbered explanation below describes the correspondingly numbered pattern listed above.

  1. Match if there are any non-digits anywhere in the string, including whitespace like line breaks.
  2. Match only if the string contains nothing but digits, with the possible exception of a trailing line break. Note that a digit is defined as having the property General Category Decimal Number, which is available as \p{Nd}, \p{Decimal_Number}, or \p{General_Category=Decimal_Number}. This in turn is actually just a reflection of those code points whose Numeric Type category is Decimal, which is available as \p{Numeric_Type=Decimal}.
  3. This is the same as 2 in most regex languages. Java is an exception here, because it does not map the simple charclass escapes like \w and \W, \d and \D, \s and \S, and \b or \B into the appropriate Unicode property. That means you must not use any of those eight one-character escapes for any Unicode data in Java, because they work only on ASCII even though Java always uses Unicode characters internally.
  4. This is slightly different from 3 in that it isn't limited to decimal numbers, but can be any number at all; that is, any character with the \pN, \p{Number}, or \p{General_Category=Number} property. These include \p{Nl} or \p{Letter_Number} for things like Roman numerals and \p{No} or \p{Other_Number} for superscripted and subscripted numbers, fractions, and circled numbers, amongst others, like counting rods.
  5. This matches only those strings composed entirely of numbers whose decimal value is 10, so things like Ⅹ (the Roman numeral ten), ⑩, ⑽, ⒑, ⓾, ❿, ➉, and ➓.
  6. Matches only those strings whose characters all lack the Numeric Value NaN; in other words, all chars must have some numeric value.
  7. Matches only Decimal Numbers, optionally with a leading HYPHEN MINUS.
  8. Same as 7 but now also works if the sign is plus instead of minus.
  9. Looks for decimal numbers, with optional HYPHEN MINUS and optional FULL STOP plus zero or more decimal numbers following.
  10. Same as 9 but doesn't require digits before the dot if it has some afterwards.
  11. Standard floating-point notation per C and many other languages, allowing for scientific notation.
  12. Finds numbers composed only of two or more decimals of any script in ascending order, like 789 or 12345. This recursive regex includes a callout to Perl code that checks whether the look ahead digit has a code point value that is the successor of the current digit; that is, its ordinal value is one greater. One could do this in PCRE using a C function as the callout.
  13. This looks for a valid IPv4 address with four decimal numbers in the valid range, like 128.0.0.1 or 255.255.255.240, but not 999.999.999.999.
  14. This looks for a valid MAC address: six colon-separated pairs of two ASCII hex digits.
  15. This looks for whole numbers in the ASCII range with an optional leading sign. This is the normal pattern for matching ASCII integers.
  16. This is like 15, except that it requires a comma to separate groups of three.
  17. This is like 15, except that the comma for separating groups is now optional.
  18. This is the normal pattern for matching C-style floating-point numbers in ASCII.
  19. This is like 18, but requiring a comma to separate groups of 3 and in base-2 instead of in base-10.
  20. This is like 19, but in hex. Note that the optional exponent is now indicated by a G instead of an E, since E is a valid hex digit.
  21. This checks that the string contains a C-style floating-point number, but with an optional grouping separator every three digits of either a comma or an underscore (LOW LINE) between them. It also stores that string into the \1 capture group, making it available as $1 after the match succeeds.
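
A few of the points in the key can be tried out directly. The sketch below uses Python's re module purely as an illustration (it is not one of the engines the answer discusses); in Python 3, \d on a str pattern matches any Unicode decimal digit, and $ also matches just before a trailing newline. The variable names are my own.

```python
import re

UNICODE_DIGITS = re.compile(r"^\d+$")        # pattern 2: any Unicode decimal digit
ASCII_DIGITS = re.compile(r"^[0-9]+$")       # ASCII-only counterpart
SIMPLE_DECIMAL = re.compile(r"^-?(?:\d+(?:\.\d*)?|\.\d+)$")  # pattern 10

devanagari = "\u096a\u096b\u096c"  # DEVANAGARI DIGITS FOUR, FIVE, SIX

results = {
    "unicode_digits_on_devanagari": bool(UNICODE_DIGITS.match(devanagari)),
    "ascii_digits_on_devanagari": bool(ASCII_DIGITS.match(devanagari)),
    "trailing_newline_accepted": bool(UNICODE_DIGITS.match("123\n")),
    "leading_dot_accepted": bool(SIMPLE_DECIMAL.match(".5")),
    "lone_dot_accepted": bool(SIMPLE_DECIMAL.match(".")),
}
print(results)
```

If only ASCII digits are wanted, spelling the class out as [0-9] (or, in Python, passing re.ASCII) avoids the surprise.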

Sources and Maintainability

Patterns 1, 2, and 7-11 come from a previous incarnation of the Perl Frequently Asked Questions list, in the question “How do I validate input?”. That section has been replaced by a suggestion to use the Regexp::Common module, written by Abigail and Damian Conway. The original patterns can still be found in Recipe 2.1 of the Perl Cookbook, “Checking Whether a String Is a Valid Number”, solutions to which can be found for a dizzying number of diverse languages, including ada, common lisp, groovy, guile, haskell, java, merd, ocaml, php, pike, python, rexx, ruby, and tcl at the PLEAC project.

Pattern 12 could be more legibly rewritten as:

m{
    ^
    (
        ( \d )
        (?(?= ( \d ) ) | $ )
        (?(?{ ord $3 == 1 + ord $2 }) (?1) | $ )
    )
    $
}x

It uses regex recursion, which is found in many pattern engines, including Perl and all the PCRE-derived languages. But it also uses an embedded code callout as the test of its second conditional pattern; to my knowledge, code callouts are available only in Perl and PCRE.

Patterns 13-21 were derived from the aforementioned Regexp::Common module. Note that for brevity, these are all written without the whitespace and comments that you would definitely want in production code. Here is how that might look in /x mode:

$real_rx = qr{ (   # start $1 to hold entire pattern
    ( [+-]? )                  # optional leading sign, captured into $2
    (                          # start $3
        (?=                    # look ahead for what next char *will* be
            [0123456789]       #    EITHER:  an ASCII digit
          | [.]                #    OR ELSE: a dot
        )                      # end look ahead
        (                      # start $4
           [0123456789]{1,3}       # 1-3 ASCII digits to start the number
           (?:                     # then optionally followed by
               (?: [_,]? )         # an optional grouping separator of comma or underscore
               [0123456789]{3}     # followed by exactly three ASCII digits
           ) *                     # repeated any number of times
        )                          # end $4
        (?:                        # begin optional cluster
             ( [.] )               # required literal dot in $5
             ( [0123456789]{0,} )  # then optional ASCII digits in $6
        ) ?                        # end optional cluster
     )                         # end $3
    (?:                        # begin cluster group
        ( [E] )                #   base-10 exponent into $7
        (                      #   exponent number into $8
            ( [+-] ? )         #     optional sign for exponent into $9
            ( [0123456789] + ) #     one or more ASCII digits into $10
        )                      #   end $8
      |                        #   or else nothing at all
    )                          # end cluster group
) }xi;          # end $1 and whole pattern, enabling /x and /i modes

From a software engineering perspective, there are still several issues with the style used in the /x mode version immediately above. First, there is a great deal of code repetition, where you see the same [0123456789]; what happens if one of those sequences accidentally leaves a digit out? Second, you are relying on positional parameters, which you must count. That means you might write something like:

(
  $real_number,          # $1
  $real_number_sign,     # $2
  $pre_exponent_part,    # $3
  $pre_decimal_point,    # $4
  $decimal_point,        # $5
  $post_decimal_point,   # $6
  $exponent_indicator,   # $7
  $exponent_number,      # $8
  $exponent_sign,        # $9
  $exponent_digits,      # $10
) = ($string =~ /$real_rx/);

which is frankly abominable! It is easy to get the numbering wrong, hard to remember what symbolic names go where, and tedious to write, especially if you don't need all those pieces. Let's rewrite that to use named groups instead of just numbered ones. Again, I'll use Perl syntax for the variables, but the contents of the pattern should work anywhere that named groups are supported.

use 5.010;              # Perl got named patterns in 5.10
$real_rx = qr{
  (?<real_number>
    # optional leading sign
    (?<real_number_sign> [+-]? )
    (?<pre_exponent_part>
        (?=                         # look ahead for what next char *will* be
            [0123456789]            #    EITHER:  an ASCII digit
          | [.]                     #    OR ELSE: a dot
        )                           # end look ahead
        (?<pre_decimal_point>
            [0123456789]{1,3}       # 1-3 ASCII digits to start the number
            (?:                     # then optionally followed by
                (?: [_,]? )         # an optional grouping separator of comma or underscore
                [0123456789]{3}     # followed by exactly three ASCII digits
            ) *                     # repeated any number of times
         )                          # end <pre_decimal_point>
         (?:                        # begin optional anon cluster
            (?<decimal_point> [.] ) # required literal dot
            (?<post_decimal_point>
                [0123456789]{0,}  )
         ) ?                        # end optional anon cluster
   )                                # end <pre_exponent_part>
   # begin anon cluster group:
   (?:
       (?<exponent_indicator> [E] ) #   base-10 exponent
       (?<exponent_number>          #   exponent number
           (?<exponent_sign>   [+-] ?         )
           (?<exponent_digits> [0123456789] + )
       )                      #   end <exponent_number>
     |                        #   or else nothing at all
   )                          # end anon cluster group
 )                            # end <real_number>
}xi;

Now the abstractions are named, which helps. You can pull the groups out by name, and you only need the ones you care about. For example:

if ($string =~ /$real_rx/) {
    ($pre_exponent, $exponent_number) =
        @+{ qw< pre_exponent_part exponent_number > };
}
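
For readers outside Perl, here is roughly how the same idea might look in Python, which spells named groups as (?P<name>...). This is a shortened adaptation rather than a line-for-line port: the group name sign and the test string are my own choices, and the pattern is only a sketch.

```python
import re

# Named groups in Python use (?P<name>...); most names mirror the Perl version.
REAL_RX = re.compile(r"""
    (?P<real_number>
        (?P<sign> [+-]? )
        (?P<pre_exponent_part>
            (?= [0-9] | [.] )                 # next char must be a digit or a dot
            (?P<pre_decimal_point> [0-9]{1,3} (?: [_,]? [0-9]{3} )* )
            (?: (?P<decimal_point> [.] ) (?P<post_decimal_point> [0-9]* ) )?
        )
        (?: (?P<exponent_indicator> E ) (?P<exponent_number> [+-]? [0-9]+ ) )?
    )
""", re.VERBOSE | re.IGNORECASE)

m = REAL_RX.fullmatch("-1,234.5e-7")
parts = m.groupdict()  # pull out only the pieces you care about, by name
print(parts["pre_exponent_part"], parts["exponent_number"])
```

As in the Perl version, nothing has to be counted: adding or removing a group does not renumber the others.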

There's one more thing to do to this pattern to make it still more maintainable. The problem is that there's still too much repetition, which means it's too easily changed in one place but not in another. If you were doing a McCabe analysis, you would say its complexity metric is too high. Most of us would just say it's too indented. This makes it hard to follow. To fix all these things, what we need is a “grammatical pattern”, one with a definition block to create named abstractions, which we then treat somewhat like a subroutine call later on in the match.

use 5.010;              # Perl first got regex subs in v5.10
$real__rx = qr{ 

    ^                   # anchor to front
    (?&real_number)     # call &real_number regex sub
    $                   # either at end or before final newline

  ##################################################
  # the rest is definition only; think of         ##
  # each named buffer as declaring a subroutine   ##
  # by that name                                  ##
  ##################################################
  (?(DEFINE)
      (?<real_number>
          (?&mantissa)
          (?&abscissa) ?

      )
      (?<abscissa>
          (?&exponent_indicator)
          (?&exponent)
      )
      (?<exponent>
          (?&sign)    ?
          (?&a_digit) +
      )
      (?<mantissa>
         # expecting either of these....
         (?= (?&a_digit)
           | (?&point)
         )
         (?&a_digit) {1,3}
         (?: (?&digit_separator) ?
             (?&a_digit) {3}
         ) *
         (?: (?&point)
             (?&a_digit) *
         ) ?
      )
      (?<point>               [.]     )
      (?<sign>                [+-]    )
      (?<digit_separator>     [_,]    )
      (?<exponent_indicator>  [Ee]    )
      (?<a_digit>             [0-9]   )
   ) # end DEFINE block
}x;
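
Not every engine has (?(DEFINE)...) blocks. As far as I know Python's standard re module does not, but a similar "define once, reuse by name" effect can be approximated by assembling the pattern from named string fragments. The sketch below is an assumption-laden illustration of that workaround, not the answer's technique; the fragment names follow the DEFINE block above and the test strings are mine.

```python
import re

# Named building blocks, each defined exactly once.
A_DIGIT = r"[0-9]"
SIGN = r"[+-]"
POINT = r"[.]"
DIGIT_SEPARATOR = r"[_,]"
EXPONENT_INDICATOR = r"[Ee]"

# Larger pieces are composed from the smaller ones, like grammar rules.
MANTISSA = (
    rf"{A_DIGIT}{{1,3}}"
    rf"(?:{DIGIT_SEPARATOR}?{A_DIGIT}{{3}})*"
    rf"(?:{POINT}{A_DIGIT}*)?"
)
EXPONENT = rf"{SIGN}?{A_DIGIT}+"
REAL_NUMBER = rf"{MANTISSA}(?:{EXPONENT_INDICATOR}{EXPONENT})?"

REAL_RX = re.compile(REAL_NUMBER)

accepted = [s for s in ["1,234.5", "12e-3", "1_000", "abc", "1,23"]
            if REAL_RX.fullmatch(s)]
print(accepted)
```

Changing what counts as a digit now means editing A_DIGIT in one place, which is the maintainability gain the grammatical pattern is after.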

See how insanely better the grammatical pattern is than the original line-noisy pattern? It's also far easier to get the syntax right: I typed that in without even one regex syntax error that needed correcting. (OK fine, I typed all the others in without any syntax errors either, but I've been doing this for a while. :)

Grammatical patterns look much more like a BNF than the ugly old regular expressions that people have come to hate. They are far easier to read, write, and maintain. So let's have no more ugly patterns, OK?

Answer by Gary Green

Try this:

^-?\d{1,3}(,\d{3})*(\.\d\d)?$|^\.\d\d$

Allows for:

1
12
.99
12.34 
-18.34
12,345.67
999,999,999,999,999.99
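
A minimal Python sketch to check the pattern against that list. The rejected list and the name MONEY_RX are my own additions for illustration; note that two of the question's sample values, 1234 and -12,34, end up on the rejected side.

```python
import re

# Pattern copied from the answer above.
MONEY_RX = re.compile(r"^-?\d{1,3}(,\d{3})*(\.\d\d)?$|^\.\d\d$")

allowed = ["1", "12", ".99", "12.34", "-18.34", "12,345.67",
           "999,999,999,999,999.99"]
rejected = ["1234", "12.3", "1,23", "-12,34"]  # hypothetical counter-examples

all_allowed_match = all(MONEY_RX.match(s) for s in allowed)
none_rejected_match = not any(MONEY_RX.match(s) for s in rejected)
print(all_allowed_match, none_rejected_match)
```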

Answer by zx81

Since this question has been reopened four years later, I'd like to offer a different take. As someone who spends a lot of time working with regex, my view is this:

A. If Possible, Don't Use Regex To Validate Numbers

If at all possible, use your language. There may be functions to help you determine if the value contained by a string is a valid number. That being said, if you're accepting a variety of formats (commas, etc.) you may not have a choice.

B. Don't Write the Regex Manually to Validate a Number Range

  • Writing a regex to match a number in a given range is hard. You can make a mistake even writing a regex to match a number between 1 and 10.
  • Once you have a regex for a number range, it's hard to debug. First, it's awful to look at. Second, how can you be sure it matches all the values you want without matching any of the values you don't want? Frankly, if you're by yourself, without peers looking over your shoulder, you can't. The best debugging technique is to output a whole range of numbers programmatically and check them against the regex.
  • Fortunately, there are tools to generate a regex for a number range automatically.
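
The "output a whole range of numbers and check them" technique can be sketched in a few lines of Python. The helper name mismatches, its parameters, and the two example patterns are hypothetical; the second pattern is a deliberately wrong attempt at "1 to 10" of the kind the first bullet warns about.

```python
import re

def mismatches(pattern, low, high, probe_low, probe_high):
    """Return the integers in [probe_low, probe_high] where the regex and the
    intended range [low, high] disagree."""
    rx = re.compile(pattern)
    bad = []
    for n in range(probe_low, probe_high + 1):
        matched = rx.fullmatch(str(n)) is not None
        expected = low <= n <= high
        if matched != expected:
            bad.append(n)
    return bad

good_attempt = mismatches(r"[1-9]|10", 1, 10, -20, 120)
buggy_attempt = mismatches(r"[1-10]", 1, 10, -20, 120)  # a character class, not a range
print(good_attempt, buggy_attempt)
```

An empty list means the pattern agreed with the intended range on every probed value; it says nothing about inputs outside the probe window or about non-integer strings.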

C. Spend your Regex Energy Wisely: Use Tools

  • Matching numbers in a given range is a problem that has been solved. There's no need for you to try to reinvent the wheel. It's a problem that can be solved mechanically, by a program, in a way that is guaranteed to be error-free. Take advantage of that free ride.
  • Solving a number-range regex may be interesting for learning purposes a couple of times. Beyond that, if you have energy to invest in furthering your regex skills, spend it on something useful, such as deepening your understanding of regex greed, reading up on Unicode regex, playing with zero-width matches or recursion, reading the SO regex FAQ and discovering neat tricks such as how to exclude certain patterns from a regex match... or reading classics such as Mastering Regular Expressions, 3rd Ed. or Regular Expressions Cookbook, 2nd Ed.

For tools, you can use:

  • Online: Regex_for_range
  • Offline: the only one I'm aware of is RegexMagic (not free) by regex guru Jan Goyvaerts. It's his beginner regex product, and as I recall it has a great range of options for generating numbers in a given range, among other features.
  • If the conditions are too complex, auto-generate two ranges... then join them with an alternation operator |

D. An Exercise: Building a Regex for the Specs in the Question

These specs are quite wide... but not necessarily vague. Let's look at the sample values again:

123,456,789
-12,34
1234
-8

How do the first two values relate? In the first, the comma separates groups of three digits (powers of a thousand). In the second, it probably serves as the decimal point in a continental-European-style number format. That does not mean we should allow commas everywhere, as in 1,2,3,44. By the same token, we shouldn't be too restrictive. The regex in the accepted answer, for instance, will not match one of the requirements, 123,456,789 (see demo).

How do we build our regex to match the specs?

  • Let's anchor the expression between ^ and $ to avoid submatches
  • Let's allow an optional minus: -?
  • Let's match two types of numbers on either side of an alternation (?:this|that):
    • On the left, a European-style number with an optional comma for the decimal part: [1-9][0-9]*(?:,[0-9]+)?
    • On the right, a number with thousands separators: [1-9][0-9]{1,2}(?:,[0-9]{3})+

The complete regex:

^-?(?:[1-9][0-9]*(?:,[0-9]+)?|[1-9][0-9]{1,2}(?:,[0-9]{3})+)$

See demo.

This regex does not allow European-style numbers starting with 0, such as 0,12. It's a feature, not a bug. To match those as well, a small tweak will do:

^-?(?:(?:0|[1-9][0-9]*)(?:,[0-9]+)?|[1-9][0-9]{1,2}(?:,[0-9]{3})+)$

See demo.
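
In place of the linked demos, both variants can be exercised with a short Python sketch. The names NO_LEADING_ZERO, WITH_LEADING_ZERO, and probes are made up for this illustration; the patterns themselves are copied from above.

```python
import re

NO_LEADING_ZERO = re.compile(
    r"^-?(?:[1-9][0-9]*(?:,[0-9]+)?|[1-9][0-9]{1,2}(?:,[0-9]{3})+)$")
WITH_LEADING_ZERO = re.compile(
    r"^-?(?:(?:0|[1-9][0-9]*)(?:,[0-9]+)?|[1-9][0-9]{1,2}(?:,[0-9]{3})+)$")

samples = ["123,456,789", "-12,34", "1234", "-8"]  # values from the question
probes = ["1,2,3,44", "0,12"]                      # extra cases discussed above

first = {s: bool(NO_LEADING_ZERO.match(s)) for s in samples + probes}
second = {s: bool(WITH_LEADING_ZERO.match(s)) for s in samples + probes}
print(first)
print(second)
```

Both variants accept all four sample values and reject 1,2,3,44; they differ only on 0,12.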

Answer by Klaus Byskov Pedersen

Try this:

^-?[\d\,]+$

It will allow an optional - as the first character, and then any combination of commas and digits.

Answer by Andrew

^-?    # start of line, optional -
(\d+   # any number of digits
|(\d{1,3}(,\d{3})*))  # or digits followed by , and three digits
((,|\.)\d+)? # optional comma or period decimal point and more digits
$  # end of line
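
The commented layout above only works in an engine's extended (free-spacing) mode. As a hedged illustration, this is how it might be run in Python with re.VERBOSE; the name NUMBER_RX and the sample list are assumptions for the example.

```python
import re

# Same pattern as above; re.VERBOSE makes the engine ignore the layout
# whitespace and the trailing # comments.
NUMBER_RX = re.compile(r"""
    ^-?                        # start of line, optional -
    (\d+                       # any number of digits
    |(\d{1,3}(,\d{3})*))       # or digits followed by , and three digits
    ((,|\.)\d+)?               # optional comma or period decimal point and more digits
    $                          # end of line
""", re.VERBOSE)

samples = ["123,456,789", "-12,34", "1234", "-8", "1,2,3,44"]
matched = [s for s in samples if NUMBER_RX.match(s)]
print(matched)
```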

Answer by bukart

^[-+]?(\d{1,3})(,?(?1))*$

So what does it do?!

  • ^ marks the beginning of the string
  • [-+]? allows a minus or plus right after the beginning of the string
  • (\d{1,3}) matches at least one and at most three ({1,3}) digits (\d, commonly [0-9]) in a row and groups them (the parentheses (...) build the group) as the first group
  • (,?(?1))* ok... let's break this down
    • (...) builds another group (not so important)
    • ,? matches a comma (if present) right after the first sequence of digits
    • (?1) matches the pattern of the first group again (remember (\d{1,3})); in words: at this point the expression matches a sign (plus/minus/none) followed by a sequence of digits possibly followed by a comma, followed by another sequence of digits.
    • (,?(?1))*, the * repeats the second part (comma & sequence) as often as possible
  • $ finally matches the end of the string

The advantage of such expressions is that you avoid defining the same pattern within your expression again and again and again... well, a disadvantage is sometimes the complexity :-/
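
The (?1) subroutine call is not available everywhere; as far as I know, Python's standard re module rejects it. A rough way to get the same "define once" benefit there is to keep the repeated piece in a variable and interpolate it. This is a workaround sketch under that assumption, with made-up names GROUP and NUMBER_RX, not a port of the recursion feature.

```python
import re

GROUP = r"\d{1,3}"  # the reusable sub-pattern that (?1) refers back to
NUMBER_RX = re.compile(rf"^[-+]?{GROUP}(?:,?{GROUP})*$")

checks = {s: bool(NUMBER_RX.match(s))
          for s in ["123,456,789", "-12,34", "1234", "-8", "12,,3", "+7"]}
print(checks)
```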

Answer by Jorge DeFlon Developer

In Java, you may use java.util.Scanner with its useLocale method:

// Assumes java.util.Scanner is imported and that input and myLocale are already defined.
Scanner myScanner = new Scanner(input).useLocale(myLocale);

boolean isADouble = myScanner.hasNextDouble();

Answer by arunKr

For the examples:

    ^(-)?([,0-9])+$

It should work. Implement it in whichever language you want.

Answer by Narasimha

Try this:

    boxValue = boxValue.replace(/[^0-9\.\,]/g, "");

This RegEx matches every character that is not a digit, dot, or comma, and the replace call removes those characters, so only digits, dots, and commas remain in boxValue.
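
The same strip-everything-else approach, sketched in Python for comparison. The function name keep_digits_dots_commas and the inputs are hypothetical; note that this filters input rather than validating it, and that a leading minus sign is removed along with everything else.

```python
import re

def keep_digits_dots_commas(text):
    """Drop every character that is not an ASCII digit, a dot, or a comma."""
    return re.sub(r"[^0-9.,]", "", text)

cleaned = keep_digits_dots_commas("$ 1,234.50 USD")
sign_lost = keep_digits_dots_commas("-12,34")
print(cleaned, sign_lost)
```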