C# regular expression to detect repetition in a string

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/943872/

Time: 2020-08-06 03:55:08  Source: igfitidea

Regular Expression to detect repetition within a string

Tags: c#, regex

Asked by Mark Withers

Is it possible to detect repeated number patterns with a regular expression?

So for example, if I had the following string "034503450345", would it be possible to match the repeated sequence 0345? I have a feeling this is beyond the scope of regex, but I thought I would ask here anyway to see if I have missed something.

Accepted answer by RichieHindle

Yes, you can. Here's a Python test case:

import re
print(re.search(r"(\d+).*\1", "8034503450345").group(1))
# Prints 0345

The regular expression says "find some sequence of digits, then any amount of other stuff, then the same sequence again."
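
The "same sequence again" part is the backreference `\1`. A minimal sketch of that behaviour follows; the helper name `first_repeated_digits` is made up for illustration and is not part of the original answer.

```python
import re

# \1 refers back to whatever the first group captured.
pattern = re.compile(r"(\d+).*\1")

def first_repeated_digits(text):
    """Return the first digit run that occurs again later in text, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None

print(first_repeated_digits("8034503450345"))  # 0345
print(first_repeated_digits("12-99-12"))       # 12 (other text may sit in between)
print(first_repeated_digits("12345"))          # None (no digit run occurs twice)
```

Because of the `.*` in the middle, the two copies do not have to be adjacent.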

On a barely-related note, here's one of my favourite regular expressions - a prime number detector:

import re
for i in range(2, 100):
    if not re.search(r"^(xx+)\1+$", "x"*i):
        print(i)
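
Why this works: the group captures a block of two or more `x` characters, and `\1+` demands that the rest of the string consists of further copies of that block. A string of n characters therefore matches exactly when n can be split into equal blocks, that is, when n is composite. The sketch below checks this against plain trial division; both function names are illustrative.

```python
import re

composite = re.compile(r"^(xx+)\1+$")

def is_prime_regex(n):
    """True if a string of n 'x' characters cannot be split into 2+ equal blocks of length 2+."""
    return n >= 2 and composite.search("x" * n) is None

def is_prime_trial(n):
    """Reference check by trial division."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

primes = [n for n in range(2, 100) if is_prime_regex(n)]
print(primes[:10])  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(primes == [n for n in range(2, 100) if is_prime_trial(n)])  # True
```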

Answer by Peter Boughton

This expression matches a group that is immediately followed by one or more repetitions of itself:

(.+)(?=\1+)



Here is the same expression broken down (using comments, so it can still be used directly as a regex):

(?x)  # enable regex comment mode
(     # start capturing group
.+    # one or more of any character (excludes newlines by default)
)     # end capturing group
(?=   # begin lookahead
\1+   # match one or more of the first capturing group
)     # end lookahead
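
As a rough check, the commented form can be passed to Python's `re` module unchanged, since the leading `(?x)` turns on verbose mode. The variable name `adjacent_repeat` is only illustrative.

```python
import re

adjacent_repeat = re.compile(r"""(?x)  # enable regex comment mode
(     # start capturing group
.+    # one or more of any character (excludes newlines by default)
)     # end capturing group
(?=   # begin lookahead
\1+   # match one or more of the first capturing group
)     # end lookahead
""")

match = adjacent_repeat.search("034503450345")
print(match.group(1))  # 0345
print(match.span())    # (0, 4)
```

The lookahead checks for the repeats without consuming them, so the match itself covers only the first copy.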



To match a specific pattern, change the .+ to that pattern, e.g. \d+ for one or more digits, or \d{4,} to match 4 or more digits.

To match a specific number of repetitions, change the \1+, e.g. to \1{4} for four repetitions.

To allow the repetitions to not be next to each other, you can add .*? inside the lookahead, before the \1.
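
The three variations can be tried side by side. This is a small sketch with made-up sample strings, not an exhaustive test.

```python
import re

text = "ab 034503450345 cd"

# \d+ in place of .+ restricts the repeated unit to digits.
digits_only = re.search(r"(\d+)(?=\1+)", text)
print(digits_only.group(1))  # 0345

# \1{2} in place of \1+ requires two further copies directly after the unit.
two_more = re.search(r"(\d+)(?=\1{2})", text)
print(two_more.group(1))  # 0345

# Three further copies are not available here, so there is no match.
three_more = re.search(r"(\d+)(?=\1{3})", text)
print(three_more)  # None

# .*? inside the lookahead lets other text sit between the copies.
with_gap = re.search(r"(\d+)(?=.*?\1)", "0345-77-0345")
print(with_gap.group(1))  # 0345
```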

Answer by sleske

Just to add a note to the (correct) answer from RichieHindle:

Note that while Python's regexp implementation (and many others, such as Perl's) can do this, this is no longer a regular expression in the narrow sense of the word.

Your example is not a regular language, hence cannot be handled by a pure regular expression. See e.g. the excellent Wikipedia article for details.

While this is mostly only of academic interest, there are some practical consequences. Real regular expressions can make much better guarantees for maximum runtimes than in this case. So you could get performance problems at some point.

Not to say that it's not a good solution, but you should realize that you're at the limit of what regular expressions (even in extended form) are capable of, and might want to consider other solutions in case of problems.
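
As one example of an "other solution", here is a sketch that uses no regex at all. It covers a narrower problem than the patterns above: it assumes the whole string is made of back-to-back copies of one unit, with nothing before or after. The function name `repeated_unit` is hypothetical, and the technique is the well-known string-doubling trick, not something taken from this answer.

```python
def repeated_unit(s):
    """Return the shortest unit whose repetition forms all of s, or None."""
    if not s:
        return None
    # s reappears inside s + s at an offset smaller than len(s)
    # exactly when s consists of repeated copies of a shorter unit.
    offset = (s + s).find(s, 1)
    return s[:offset] if offset < len(s) else None

print(repeated_unit("034503450345"))   # 0345
print(repeated_unit("12345"))          # None
print(repeated_unit("8034503450345"))  # None (the leading 8 breaks the pattern)
```

This relies on a plain substring search and involves no regex backtracking, at the cost of not handling leading or trailing extra characters.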

Answer by SO User

This is C# code that uses the backreference construct to find repeated digits. It works with 034503450345, 123034503450345, 034503450345345, and 232034503450345423. The regex is easy to read and understand.

// Requires: using System.Text.RegularExpressions;

/// <summary>
/// Assigns repeated digits to repeatedDigits, if the digitSequence matches the pattern
/// </summary>
/// <returns>true if success, false otherwise</returns>
public static bool TryGetRepeatedDigits(string digitSequence, out string repeatedDigits)
{
    repeatedDigits = null;

    // Optional leading digits, a captured run, one or more adjacent
    // repeats of that run, then optional trailing digits.
    string pattern = @"^\d*(?<repeat>\d+)\k<repeat>+\d*$";

    // Match once and read the named group, instead of matching twice.
    Match match = Regex.Match(digitSequence, pattern);
    if (!match.Success)
        return false;

    repeatedDigits = match.Groups["repeat"].Value;
    return true;
}
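
For comparison with the Python answers above, here is a rough Python sketch of the same pattern. Python writes the named group as `(?P<repeat>...)` and the named backreference as `(?P=repeat)`; the function name `get_repeated_digits` is illustrative only.

```python
import re

# Same structure as the C# pattern, in Python's named-group syntax.
PATTERN = re.compile(r"^\d*(?P<repeat>\d+)(?P=repeat)+\d*$")

def get_repeated_digits(digit_sequence):
    """Return the repeated digit run, or None if nothing repeats back to back."""
    match = PATTERN.match(digit_sequence)
    return match.group("repeat") if match else None

print(get_repeated_digits("034503450345"))     # 0345
print(get_repeated_digits("034503450345345"))  # 345
print(get_repeated_digits("12345"))            # None
```

Note the second result: the leading `\d*` is greedy, so the pattern reports the last adjacent repetition it can find (345345 at the end) rather than the first one.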

Answer by user1920925

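Use regex repetition: (bar){2,} looks for text with two or more consecutive occurrences of bar: barbar, barbarbar, ... Note that the parentheses are needed; without them, bar{2,} applies the quantifier only to the final r and matches barr, barrr, ... instead. This approach only helps when the repeated text is known in advance.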
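
A quick way to see how the `{2,}` quantifier binds, using a made-up sample string:

```python
import re

text = "bar barbar barbarbar barrr"

# Grouped: the quantifier applies to the whole word "bar".
grouped = [m.group(0) for m in re.finditer(r"(?:bar){2,}", text)]
print(grouped)  # ['barbar', 'barbarbar']

# Ungrouped: the quantifier applies only to the final "r".
ungrouped = [m.group(0) for m in re.finditer(r"bar{2,}", text)]
print(ungrouped)  # ['barrr']
```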