C# Regex MatchCollection: multiple matches

Question

Asked by Trey Balut

I'm trying to retrieve all text between <td> and </td>, but I only get the first match in my collection. Do I need a * or something? Here is my code.

string input = @"<tr class=""row0""><td>09/08/2013</td><td><a href=""/teams/nfl/new-england-patriots/results"">New England Patriots</a></td><td><a href=""/boxscore/2013090803"">L, 23-21</a></td><td align=""center"">0-1-0</td><td align=""right"">65,519</td></tr>";

string pattern = @"(?<=<td>)[^>]*(?=</td>)";
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
    try
    {
        listBoxControl1.Items.Add(matches.ToString());
    }
    catch { }
}

Answer 1

Accepted answer by Gary C.

Use the following regular expression:

string input = "<tr class=\"row0\"><td>09/08/2013</td><td><a href=\"/teams/nfl/new-england-patriots/results\">New England Patriots</a></td><td><a href=\"/boxscore/2013090803\">L, 23-21</a></td><td align=\"center\">0-1-0</td><td align=\"right\">65,519</td></tr>";

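// The named group td_inner captures everything between <td> and </td>;
// the lazy .*? stops at the first closing tag.
string pattern = "(<td>)(?<td_inner>.*?)(</td>)";

MatchCollection matches = Regex.Matches(input, pattern);

foreach (Match match in matches) {
    // Read the current match, not the whole collection.
    Console.WriteLine(match.Groups["td_inner"].Value);
}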
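
Note that this pattern matches only cells whose opening tag is exactly <td>, so for the input above it skips the two cells that carry an align attribute. Below is a minimal sketch, assuming you want every cell and only its visible text rather than the inner markup. The class name TdExtractor, the method ExtractCells, and the widened pattern are illustrative and not part of the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TdExtractor
{
    // Hypothetical helper: returns the visible text of every <td> cell.
    public static List<string> ExtractCells(string html)
    {
        // <td\b[^>]*> also accepts attributes such as align="center".
        // Singleline lets '.' cross line breaks inside a cell.
        const string cellPattern = @"<td\b[^>]*>(?<td_inner>.*?)</td>";
        var cells = new List<string>();

        foreach (Match match in Regex.Matches(html, cellPattern,
                     RegexOptions.IgnoreCase | RegexOptions.Singleline))
        {
            string inner = match.Groups["td_inner"].Value;
            // Drop nested tags such as <a ...> so only the text remains.
            cells.Add(Regex.Replace(inner, "<[^>]+>", "").Trim());
        }
        return cells;
    }

    public static void Main()
    {
        string input = @"<tr class=""row0""><td>09/08/2013</td><td><a href=""/teams/nfl/new-england-patriots/results"">New England Patriots</a></td><td><a href=""/boxscore/2013090803"">L, 23-21</a></td><td align=""center"">0-1-0</td><td align=""right"">65,519</td></tr>";

        foreach (string cell in ExtractCells(input))
        {
            Console.WriteLine(cell);
        }
    }
}
```

In the question's loop, the value added to the list box should likewise come from the current match (match.Value or a named group). Calling matches.ToString() stringifies the whole collection, which yields the type name rather than the matched text.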

Answer 2

Answered by Anirudha

HTML (except XHTML) is not strict, i.e. in some cases

you could have tags that have no closing tags.
you could have nested tags.

Regex is not suitable for parsing such a complex grammar. You need to use a parser.

Use the HtmlAgilityPack parser.

You can use this code to retrieve the cells with HtmlAgilityPack:

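// Requires the HtmlAgilityPack package; Select/ToList come from System.Linq.
using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(yourStream); // yourStream: a Stream containing the HTML

// Select every <td> node and collect its text content.
var tdList = doc.DocumentNode.SelectNodes("//td")
                  .Select(p => p.InnerText)
                  .ToList();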
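
When the HTML is already in a string, as in the question, the same approach might look like the sketch below. It assumes the HtmlAgilityPack NuGet package is installed. The class name TdParser and the method GetCellTexts are hypothetical, and the null check is there because SelectNodes returns null when the XPath matches nothing:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class TdParser
{
    // Hypothetical helper: returns the text of every <td> in an HTML fragment.
    public static List<string> GetCellTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a stream

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//td");
        if (nodes == null)
        {
            return new List<string>();
        }

        // InnerText strips nested tags; DeEntitize decodes entities like &amp;.
        return nodes
            .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
            .ToList();
    }
}
```

Calling TdParser.GetCellTexts(input) with the question's input string would return one entry per cell, including the cells whose <td> tag carries attributes and the ones that wrap their text in a link.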

Answer 3

Answered by Philipp P

I found a solution here: http://geekcoder.org/js-extract-hashtags-from-text/ from Nicolas Durand. It seems to work pretty well:

#[^ :\n\t\.,\?\/''!]+
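
The linked page targets JavaScript, but the pattern itself can be passed to Regex.Matches in C# to collect every hashtag in a string. A small sketch under that assumption, where HashtagFinder and FindHashtags are made-up names for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HashtagFinder
{
    // Hypothetical helper: returns every hashtag found in the text.
    public static List<string> FindHashtags(string text)
    {
        // Pattern copied from the answer: '#' followed by one or more
        // characters that are not whitespace or common punctuation.
        const string pattern = @"#[^ :\n\t\.,\?\/''!]+";
        var tags = new List<string>();

        // Regex.Matches returns all matches, not just the first one.
        foreach (Match match in Regex.Matches(text, pattern))
        {
            tags.Add(match.Value);
        }
        return tags;
    }
}
```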

Best regards, Phil
