Java 如何使用正则表达式确定一个数字是否为素数?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2795065/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-13 12:54:53  来源:igfitidea点击:

How to determine if a number is a prime with regex?

javaregexprimes

提问by kitlite

I found the following code example for Java on RosettaCode:

我在RosettaCode上找到了以下 Java 代码示例:

public static boolean prime(int n) {
  return !new String(new char[n]).matches(".?|(..+?)\1+");
}
  • I don't know Java in particular but understand all aspects of this snippet except for the regex itself
  • I have basic to basic-advanced knowledge of Regex as you find it in the built-in PHP functions
  • 我不是特别了解 Java,但了解此代码段的所有方面,除了正则表达式本身
  • 正如您在内置 PHP 函数中找到的那样,我对 Regex 有基本到基本的高级知识

How does .?|(..+?)\\1+match prime numbers?

如何.?|(..+?)\\1+匹配素数?

采纳答案by Platinum Azure

You said you understand this part, but just to emphasize, the String generated has a length equal to the number supplied. So the string has three characters if and only if n == 3.

你说你理解这部分,但只是强调一下,生成的字符串的长度等于提供的数字。所以字符串有三个字符当且仅当n == 3

.?

The first part of the regex says, "any character, zero or one times". So basically, is there zero or one character-- or, per what I mentioned above, n == 0 || n == 1. If we have the match, then return the negation of that. This corresponds with the fact that zero and one are NOT prime.

正则表达式的第一部分说,“任何字符,零次或一次”。所以基本上,是否有零个或一个字符 - 或者,根据我上面提到的,n == 0 || n == 1. 如果我们有匹配,则返回对它的否定。这对应于零和一不是素数这一事实。

(..+?)\1+

The second part of the regex is a little trickier, relying on groups and backreferences. A group is anything in parentheses, which will then be captured and stored by the regex engine for later use. A backreference is a matched group that is used later on in the same regex.

正则表达式的第二部分有点棘手,它依赖于组和反向引用。组是括号中的任何内容,然后由正则表达式引擎捕获并存储以备后用。反向引用是稍后在同一正则表达式中使用的匹配组。

The group captures 1 character, then 1 or more of any character. (The + character means one or more, but ONLY of the previous character or group. So this is not "two or four or six etc. characters", but rather "two or three etc." The +? is like +, but it tries to match as few characters as possible. + normally tries to gobble the whole string if it can, which is bad in this case because it prevents the backreference part from working.)

该组捕获 1 个字符,然后捕获 1 个或多个任何字符。(+ 字符表示一个或多个,但仅限于前一个字符或组。所以这不是“两个或四个或六个等字符”,而是“两个或三个等”。+? 类似于 +,但是它尝试匹配尽可能少的字符。+ 通常会尝试吞噬整个字符串,如果可以,这在这种情况下很糟糕,因为它阻止了反向引用部分的工作。)

The next part is the backreference: That same set of characters (two or more), appearing again. Said backreference appears one or more times.

下一部分是反向引用:同一组字符(两个或更多)再次出现。所述反向引用出现一次或多次。

So. The captured group corresponds to a natural number of characters (from 2 onward) captured. Said group then appears some natural number of times (also from 2 onward). If there IS a match, this implies that it's possible to find a product of two numbers greater than or equal to 2 that match the n-length string... meaning you have a composite n. So again, return the negation of the successful match: n is NOT prime.

所以。捕获的组对应于捕获的自然数量的字符(从 2 起)。然后该组出现一些自然次数(也是从 2 次开始)。如果有匹配项,这意味着可以找到与 n 长度字符串匹配的大于或等于 2 的两个数字的乘积……这意味着您有一个复合 n。因此,再次返回成功匹配的否定:n 不是素数。

If no match can be found, then you can't come up with a your product of two natural numbers greater than or equal to 2... and you have both a non-match and a prime, hence again the returning of the negation of the match result.

如果找不到匹配项,则您无法得出两个大于或等于 2 的自然数的乘积……并且您既有不匹配项又有素数,因此再次否定返回的比赛结果。

Do you see it now? It's unbelievably tricky (and computationally expensive!) but it's also kind of simple at the same time, once you get it. :-)

你现在看到了吗?这是令人难以置信的棘手(并且计算成本很高!)但同时它也很简单,一旦你掌握了它。:-)

I can elaborate if you have further questions, like on how regex parsing actually works. But I'm trying to keep this answer simple for now (or as simple as it can ever be).

如果您有其他问题,我可以详细说明,例如正则表达式解析的实际工作原理。但我现在试图让这个答案保持简单(或尽可能简单)。

回答by polygenelubricants

I will explain the regex part outside of primality testing: the following regex, given a String swhich consists of repeating String t, finds t.

我将解释素性测试之外的正则表达式部分:以下正则表达式,给定 aString s由重复组成String t,找到t.

    System.out.println(
        "MamamiaMamamiaMamamia".replaceAll("^(.*)\1+$", "")
    ); // prints "Mamamia"

The way it works is that the regex captures (.*)into \1, and then sees if there's \1+following it. Using the ^and $ensures that a match must be of the whole string.

它的工作方式是正则表达式捕获(.*)\1,然后查看是否有\1+跟随它。使用^$确保匹配必须是整个字符串。

So, in a way, we're given String s, which is a "multiple" of String t, and the regex will find such t(the longest possible, since \1is greedy).

所以,在某种程度上,我们得到了String s,它是 的“倍数” String t,正则表达式会找到这样的t(尽可能长的,因为它\1是贪婪的)。

Once you understand why this regex works, then (ignoring the first alternate in OP's regex for now) explaining how it's used for primality testing is simple.

一旦你理解了这个正则表达式为什么起作用,那么(现在忽略 OP 正则表达式中的第一个替代)解释它如何用于素性测试很简单。

  • To test primality of n, first generate a Stringof length n(filled with the same char)
  • The regex captures a Stringof some length (say k) into \1, and tries to match \1+to the rest of the String
    • If there is a match, then nis a proper multiple of k, and therefore nis not prime.
    • If there's no match, then no such kexists that divides n, and nis therefore a prime
  • 要测试 的素性n,首先生成一个String长度n(填充相同的char
  • 正则表达式将String某个长度(比如k)捕获到 中\1,并尝试匹配\1+其余的String
    • 如果匹配,则n是 的真倍数k,因此n不是质数。
    • 如果没有匹配,则不k存在这样的除法nn因此是素数


How does .?|(..+?)\1+match prime numbers?

如何.?|(..+?)\1+匹配素数?

Actually, it doesn't! It matchesStringwhose length is NOT prime!

其实,并没有!它匹配String的长度不是素数!

  • .?: The first part of the alternation matches Stringof length 0or 1(NOT prime by definition)
  • (..+?)\1+: The second part of the alternation, a variation of the regex explained above, matches Stringof length nthat is "a multiple" of a Stringof length k >= 2(i.e. nis a composite, NOT a prime).
    • Note that the reluctant modifier ?is actually not needed for correctness, but it may help speed up the process by trying smaller kfirst
  • .?:交替匹配的第一部分String长度01(根据定义不是素数)
  • (..+?)\1+:交替的第二部分,该正则表达式的变形例如上所述,匹配String长度的n是一个“倍数”String长度的k >= 2(即n是一个复合物,不是素)。
    • 请注意,不愿意修改?实际上是没有必要的正确性,但它可以通过尝试更小的帮助的提速过程k第一

Note the !booleancomplement operator in the returnstatement: it negates the matches. It's when the regex DOESN'Tmatch, nis prime! It's a double-negative logic, so no wonder it's kind of confusing!!

请注意语句中的!boolean补码运算符return:它否定matches. 当正则表达式匹配时,n就是质数!这是一个双重否定的逻辑,所以难怪它有点令人困惑!!



Simplification

简化

Here's a simple rewriting of the code to make it more readable:

这是对代码的简单重写,以使其更具可读性:

public static boolean isPrime(int n) {
    String lengthN = new String(new char[n]);
    boolean isNotPrimeN = lengthN.matches(".?|(..+?)\1+");
    return !isNotPrimeN;
}

The above is essentially the same as the original Java code, but broken apart into multiple statements with assignments to local variables to make the logic easier to understand.

上面的代码和原来的Java代码基本一样,只是拆分成多条语句,给局部变量赋值,让逻辑更容易理解。

We can also simplify the regex, using finite repetition, as follows:

我们还可以使用有限重复来简化正则表达式,如下所示:

boolean isNotPrimeN = lengthN.matches(".{0,1}|(.{2,})\1+");

Again, given a Stringof length n, filled with the same char,

再次,给定 aString的长度n,填充相同的char

  • .{0,1}checks if n = 0,1, NOT prime
  • (.{2,})\1+checks if nis a proper multiple of k >= 2, NOT prime
  • .{0,1}检查是否n = 0,1,而不是素数
  • (.{2,})\1+检查是否n是 的适当倍数k >= 2,而不是素数

With the exception of the reluctant modifier ?on \1(omitted for clarity), the above regex is identical to the original.

除了不情愿的修饰符?on \1(为清楚起见省略)外,上述正则表达式与原始正则表达式相同。



More fun regex

更有趣的正则表达式

The following regex uses similar technique; it should be educational:

以下正则表达式使用了类似的技术;它应该是教育性的:

System.out.println(
    "OhMyGod=MyMyMyOhGodOhGodOhGod"
        .replaceAll("^(.+)(.+)(.+)=(\1|\2|\3)+$", "! ! !")
); // prints "Oh! My! God!"

See also

也可以看看

回答by Eyal Schneider

Nice regex trick (though very inefficient)... :)

不错的正则表达式技巧(虽然效率很低)... :)

The regex defines non-primes as follows:

正则表达式定义非素数如下:

N is not prime if and only if N<=1 OR N is divisible by some K>1.

当且仅当 N<=1 OR N 可被某个 K>1 整除时,N 不是素数。

Instead of passing the simple digital representation of N to the regex engine, it is fed with a sequence of lengthN, composed of a repeating character. The first part of the disjunction checks for N=0 or N=1, and the second one looks for a divisor K>1, using backreferences. It forces the regex engine to find some non empty sub-sequence that can be repeated at least twice in order to form the sequence. If such a subsequence exists, it means that its length divides N, hence N is not prime.

不是将 N 的简单数字表示传递给正则表达式引擎,而是输入长度为N的序列,由重复字符组成。析取的第一部分检查 N=0 或 N=1,第二部分使用反向引用查找除数 K>1。它强制正则表达式引擎找到一些可以重复至少两次的非空子序列,以形成序列。如果存在这样的子序列,则意味着其长度整除 N,因此 N 不是素数。

回答by Dinah

/^1?$|^(11+?)+$/

Apply to numbers after conversion to base 1 (1=1, 2=11, 3=111, ...). Non-primes will match this. If it doesn't match, it is prime.

转换为基数 1(1=1、2=11、3=111、...)后应用于数字。非质数将与此匹配。如果不匹配,则为素数。

Explanation here.

解释在这里