Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/18814104/

Date: 2020-08-10 13:17:08  Source: igfitidea

Regex Match Collection multiple matches

c#, regex

Asked by Trey Balut

I'm trying to retrieve all the text between <td> and </td>, but I only get the first match in my collection. Do I need a * or something? Here is my code:

string input = @"<tr class=""row0""><td>09/08/2013</td><td><a href=""/teams/nfl/new-england-patriots/results"">New England Patriots</a></td><td><a href=""/boxscore/2013090803"">L, 23-21</a></td><td align=""center"">0-1-0</td><td align=""right"">65,519</td></tr>";

string pattern = @"(?<=<td>)[^>]*(?=</td>)";
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
    try
    {
        listBoxControl1.Items.Add(matches.ToString());
    }
    catch { }
}
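
Why only one result shows up can be checked directly. The sketch below applies the question's pattern unchanged (the class and method names are hypothetical). [^>]* cannot cross the > that closes <a href=...>, and the lookbehind (?<=<td>) accepts only a bare <td>, so the <td align=...> cells never qualify; only the first cell matches. Separately, the loop adds matches.ToString() rather than match.Value, which yields the collection's type name instead of the matched text.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuestionDiagnosis
{
    // The row from the question, copied verbatim.
    public const string Row =
        @"<tr class=""row0""><td>09/08/2013</td><td><a href=""/teams/nfl/new-england-patriots/results"">New England Patriots</a></td><td><a href=""/boxscore/2013090803"">L, 23-21</a></td><td align=""center"">0-1-0</td><td align=""right"">65,519</td></tr>";

    // The question's pattern, unchanged.
    public const string QuestionPattern = @"(?<=<td>)[^>]*(?=</td>)";

    // Collects match.Value for every match, i.e. the text each Match actually captured.
    public static List<string> MatchValues(string input)
    {
        var values = new List<string>();
        foreach (Match match in Regex.Matches(input, QuestionPattern))
        {
            values.Add(match.Value);
        }
        return values;
    }

    // What the question's loop adds on every iteration: MatchCollection does not
    // override ToString(), so this is just the type name.
    public static string CollectionToString(string input) =>
        Regex.Matches(input, QuestionPattern).ToString();
}
```

On the question's row, MatchValues returns a single value, "09/08/2013".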

Accepted answer by Gary C.

Use the following regex:

using System;
using System.Text.RegularExpressions;

string input = "<tr class=\"row0\"><td>09/08/2013</td><td><a href=\"/teams/nfl/new-england-patriots/results\">New England Patriots</a></td><td><a href=\"/boxscore/2013090803\">L, 23-21</a></td><td align=\"center\">0-1-0</td><td align=\"right\">65,519</td></tr>";

// Lazy .*? stops at the nearest </td>; the named group td_inner holds the cell content.
string pattern = "(<td>)(?<td_inner>.*?)(</td>)";

MatchCollection matches = Regex.Matches(input, pattern);

foreach (Match match in matches) {
    Console.WriteLine(match.Groups["td_inner"].Value);
}
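
The accepted pattern works because .*? is lazy: it takes as few characters as possible, so each match ends at the nearest </td> instead of running on to the last one in the row. As written, though, it accepts only a bare <td> (the two <td align=...> cells are skipped) and keeps inner markup such as <a href=...>. A minimal sketch, assuming a hypothetical TdExtractor class; the attribute-tolerant variant <td[^>]*> and the StripTags helper are illustrations, not part of the original answer:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TdExtractor
{
    // The accepted answer's pattern: literal <td>, a lazy capture, literal </td>.
    public const string AnswerPattern = "(<td>)(?<td_inner>.*?)(</td>)";

    // Hypothetical variant: <td[^>]*> also accepts opening tags with attributes,
    // e.g. <td align="center">.
    public const string AttributePattern = "<td[^>]*>(?<td_inner>.*?)</td>";

    // Returns the td_inner group of every match, in document order.
    public static List<string> InnerTexts(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))
        {
            results.Add(match.Groups["td_inner"].Value);
        }
        return results;
    }

    // Hypothetical helper: removes any remaining tags such as <a href="...">...</a>.
    public static string StripTags(string html) => Regex.Replace(html, "<[^>]+>", "");
}
```

With the question's row, AnswerPattern returns three values ("09/08/2013" plus the two <a ...> links, tags included), while AttributePattern returns all five cells. Both still assume a cell never spans a line (. does not match \n without RegexOptions.Singleline) and never contains a nested <td>.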

Answer by Anirudha

HTML (except XHTML) is not strict, i.e. in some cases:

  • you could have tags that have no end tags.
  • you could have nested tags.

Regex is not suitable for parsing such a complex grammar. You need to use a parser.
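
A minimal sketch of the nesting problem, using a hypothetical input and hypothetical names: a lazy <td>(.*?)</td> pattern stops at the first </td> it meets, which closes the inner cell, so the outer cell is cut short and "tail" never appears in any match.

```csharp
using System.Text.RegularExpressions;

public static class NestedTdDemo
{
    // A cell that contains a nested table (hypothetical input).
    public const string NestedCell = "<td>outer<table><tr><td>inner</td></tr></table>tail</td>";

    // Applies the lazy <td>...</td> pattern and returns the number of matches
    // together with the inner text of the first one.
    public static (int Count, string First) Run()
    {
        MatchCollection matches = Regex.Matches(NestedCell, "<td>(?<inner>.*?)</td>");
        string first = matches.Count > 0 ? matches[0].Groups["inner"].Value : "";
        return (matches.Count, first);
    }
}
```

Run() reports one match whose inner text is "outer<table><tr><td>inner", which is neither the outer nor the inner cell's content.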

Use the HtmlAgilityPack parser.

You can use this code to retrieve the cell text with HtmlAgilityPack:

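using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(yourStream); // or doc.LoadHtml(input) for HTML held in a string

// SelectNodes returns null (not an empty collection) when no <td> is found.
var tdList = doc.DocumentNode.SelectNodes("//td")
                  .Select(p => p.InnerText)
                  .ToList();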

Answer by Philipp P

I found a solution from Nicolas Durand here: http://geekcoder.org/js-extract-hashtags-from-text/ and it seems to work pretty well:

#[^ :\n\t\.,\?\/''!]+

Best regards, Phil
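
Judging by the linked URL, this pattern was written for extracting hashtags: it matches a # followed by one or more characters other than those in the class (space, colon, newline, tab, '.', ',', '?', '/', apostrophe, '!'). It does not look for <td> cells. A minimal sketch of applying it from C# with Regex.Matches (the HashtagExtractor name and the sample text are hypothetical):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HashtagExtractor
{
    // Philipp P's pattern. In a verbatim string '' is simply two apostrophes,
    // which inside the character class are the same as one.
    public const string Pattern = @"#[^ :\n\t\.,\?\/''!]+";

    // Returns every hashtag found in the text, in order of appearance.
    public static List<string> Extract(string text)
    {
        var tags = new List<string>();
        foreach (Match match in Regex.Matches(text, Pattern))
        {
            tags.Add(match.Value);
        }
        return tags;
    }
}
```

For example, HashtagExtractor.Extract("Go #Patriots! Final score: #NFL, week 1.") returns "#Patriots" and "#NFL".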
