C# multiple string match

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/423790/

Time: 2020-08-04 02:50:41  Source: igfitidea

Tags: c#, regex, search, overlapping-matches

Question by

I need a C# string search algorithm that can match multiple occurrences of a pattern. For example, if the pattern is 'AA' and the string is 'BAAABBB', Regex produces the match result Index = 1, but I need the result Index = 1, 2. Can I force Regex to give such a result?

Answer by Dror

Any regular expression can return all of its matches as a MatchCollection, via Regex.Matches.
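
A minimal sketch of collecting the indexes from the MatchCollection that Regex.Matches returns; the helper name `PlainMatchesDemo.MatchIndexes` is hypothetical, not part of the original answer. With the plain pattern "AA" the engine resumes scanning after the end of each match, so for "BAAABBB" it reports only index 1 and skips the overlapping hit at index 2.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlainMatchesDemo
{
    // Returns the start index of every match that Regex.Matches reports.
    // The engine continues after the end of each match, so consecutive
    // matches never overlap.
    public static List<int> MatchIndexes(string pattern, string input)
    {
        var indexes = new List<int>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            indexes.Add(m.Index);
        }
        return indexes;
    }
}
```

For example, `PlainMatchesDemo.MatchIndexes("AA", "BAAABBB")` returns a list containing only 1.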

Answer by AnthonyWJones

Use a lookahead pattern:

"A(?=A)"

This finds any A that is followed by another A without consuming the following A. Hence AAA will match this pattern twice.
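
To see the lookahead at work, here is a small sketch; the class and method names `LookaheadDemo.PairStarts` are hypothetical. Because only the first A is consumed and the second is merely checked, the next match attempt begins at the very next character.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // "A(?=A)" consumes one 'A' and only asserts that another 'A' follows,
    // so the following 'A' is still available to start the next match.
    private static readonly Regex APairStart = new Regex("A(?=A)");

    public static List<int> PairStarts(string input)
    {
        var starts = new List<int>();
        foreach (Match m in APairStart.Matches(input))
        {
            starts.Add(m.Index);
        }
        return starts;
    }
}
```

`LookaheadDemo.PairStarts("BAAABBB")` returns [1, 2], and `LookaheadDemo.PairStarts("AAA")` returns [0, 1].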

Answer by Sani Singh Huttunen

To summarize all previous comments:

Imports System.Text.RegularExpressions

Dim rx As Regex = New Regex("(?=AA)")
Dim mc As MatchCollection = rx.Matches("BAAABBB")

This will produce the result you are requesting: mc contains two zero-length matches, at Index 1 and Index 2.

EDIT:
Here is the C# version (I'm working with VB.NET today, so I accidentally continued in VB.NET).

using System.Text.RegularExpressions;

Regex rx = new Regex("(?=AA)");
MatchCollection mc = rx.Matches("BAAABBB");
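
The same zero-width idea generalizes to any literal search string. A sketch, assuming a hypothetical helper `OverlappingSearch.FindAll` and a non-empty search string: Regex.Escape keeps characters such as '.' from being treated as regex metacharacters, and because every match is empty, the .NET engine moves forward one character before trying again.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OverlappingSearch
{
    // Wraps the escaped literal in a zero-width lookahead, "(?=...)".
    // Each match consumes nothing, so every start position is tested,
    // which yields overlapping occurrences.
    public static List<int> FindAll(string input, string literal)
    {
        if (string.IsNullOrEmpty(literal))
            throw new ArgumentException("literal must be non-empty", nameof(literal));

        var rx = new Regex("(?=" + Regex.Escape(literal) + ")");
        var indexes = new List<int>();
        foreach (Match m in rx.Matches(input))
        {
            indexes.Add(m.Index);
        }
        return indexes;
    }
}
```

`OverlappingSearch.FindAll("BAAABBB", "AA")` returns [1, 2], and `OverlappingSearch.FindAll("a.a.a", "a.a")` returns [0, 2], with the dot matched literally.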

Answer by Lonzo

Try this:

using System.Text.RegularExpressions;

MatchCollection matchCol;
Regex regX = new Regex("(?=AA)");

string index = "", str = "BAAABBB";
matchCol = regX.Matches(str);
foreach (Match mat in matchCol)
{
    index = index + mat.Index + ",";
}

After the loop, index holds "1,2,"; remove the trailing comma and you have the result you are looking for.
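
If the goal is just the comma-separated string, a sketch using LINQ and string.Join avoids the trailing comma entirely; the helper name `IndexList.CommaSeparatedIndexes` is hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class IndexList
{
    // Projects each match to its Index and joins them with commas,
    // producing "1,2" directly instead of "1,2," that needs trimming.
    public static string CommaSeparatedIndexes(string input, string pattern)
    {
        var indexes = Regex.Matches(input, pattern)
                           .Cast<Match>()
                           .Select(m => m.Index);
        return string.Join(",", indexes);
    }
}
```

`IndexList.CommaSeparatedIndexes("BAAABBB", "(?=AA)")` returns "1,2".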

Answer by Alan Moore

Are you really looking for substrings that are only two characters long? If so, searching a 20-million-character string is going to be slow no matter what regex you use (or any non-regex technique, for that matter). If the search string is longer, the regex engine can employ a search algorithm like Boyer-Moore or Knuth-Morris-Pratt to speed up the search; in fact, the longer the better.
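
For comparison with the algorithms named here, a minimal Knuth-Morris-Pratt sketch in plain C# with no regex; the names are hypothetical, and this is not a description of what the .NET regex engine does internally. After a full match it falls back through the failure table instead of restarting, which is what makes it report overlapping occurrences. It still examines every character of the text, running in time linear in the text length plus the pattern length.

```csharp
using System;
using System.Collections.Generic;

public static class Kmp
{
    // Returns the start index of every occurrence of pattern in text,
    // including overlapping ones.
    public static List<int> FindAll(string text, string pattern)
    {
        if (string.IsNullOrEmpty(pattern))
            throw new ArgumentException("pattern must be non-empty", nameof(pattern));

        // failure[i] = length of the longest proper prefix of pattern[0..i]
        // that is also a suffix of pattern[0..i].
        int[] failure = new int[pattern.Length];
        for (int i = 1, k = 0; i < pattern.Length; i++)
        {
            while (k > 0 && pattern[i] != pattern[k]) k = failure[k - 1];
            if (pattern[i] == pattern[k]) k++;
            failure[i] = k;
        }

        var result = new List<int>();
        for (int i = 0, q = 0; i < text.Length; i++)
        {
            // q = number of pattern characters currently matched.
            while (q > 0 && text[i] != pattern[q]) q = failure[q - 1];
            if (text[i] == pattern[q]) q++;
            if (q == pattern.Length)
            {
                result.Add(i - pattern.Length + 1);
                // Fall back rather than reset to 0 so overlaps are kept.
                q = failure[q - 1];
            }
        }
        return result;
    }
}
```

`Kmp.FindAll("BAAABBB", "AA")` returns [1, 2], and `Kmp.FindAll("abababa", "aba")` returns [0, 2, 4].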

By the way, the kind of search you're talking about is called overlapping matches; I'll add that to the tags.