C#: How to access named capturing groups in a .NET Regex?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/906493/

Date: 2020-08-06 02:31:15  Source: igfitidea

How do I access named capturing groups in a .NET Regex?

Tags: c#, .net, regex

Asked by UnkwnTech

I'm having a hard time finding a good resource that explains how to use Named Capturing Groups in C#. This is the code that I have so far:

string page = Encoding.ASCII.GetString(bytePage);
Regex qariRegex = new Regex("<td><a href=\"(?<link>.*?)\">(?<name>.*?)</a></td>");
MatchCollection mc = qariRegex.Matches(page);
CaptureCollection cc = mc[0].Captures;
MessageBox.Show(cc[0].ToString());

However this always just shows the full line:

<td><a href="/path/to/file">Name of File</a></td> 

I have experimented with several other "methods" that I've found on various websites but I keep getting the same result.

How can I access the named capturing groups that are specified in my regex?

Accepted answer by Paolo Tedesco

Use the group collection of the Match object, indexing it with the capturing group name, e.g.

foreach (Match m in mc){
    MessageBox.Show(m.Groups["link"].Value);
}
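
For context, the code in the question shows the whole line because Match.Captures holds the captures of the match as a whole (group 0), not the named groups. Below is a minimal, self-contained sketch of the indexer approach applied to the question's pattern. The NamedGroupDemo class, the Extract helper and the sample HTML are made up for illustration, and Console.WriteLine stands in for MessageBox.Show.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Same pattern as in the question, written as a verbatim string.
    private static readonly Regex QariRegex =
        new Regex(@"<td><a href=""(?<link>.*?)"">(?<name>.*?)</a></td>");

    // Hypothetical helper: returns one (link, name) pair per match.
    public static List<KeyValuePair<string, string>> Extract(string page)
    {
        var results = new List<KeyValuePair<string, string>>();
        foreach (Match m in QariRegex.Matches(page))
        {
            // Index the group collection by the capturing group name.
            results.Add(new KeyValuePair<string, string>(
                m.Groups["link"].Value,
                m.Groups["name"].Value));
        }
        return results;
    }

    public static void Main()
    {
        // Sample input invented for this sketch.
        string page = "<td><a href=\"/path/to/file\">Name of File</a></td>";
        foreach (var pair in Extract(page))
        {
            Console.WriteLine(pair.Key + " -> " + pair.Value);
        }
    }
}
```

Looping over every Match, rather than reading only mc[0], also covers pages that contain more than one matching table cell.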

Answer by Andrew Hare

You specify the named capture group string by passing it to the indexer of the Groups property of a resulting Match object.

Here is a small example:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        String sample = "hello-world-";
        Regex regex = new Regex("-(?<test>[^-]*)-");

        Match match = regex.Match(sample);

        if (match.Success)
        {
            Console.WriteLine(match.Groups["test"].Value);
        }
    }
}

Answer by SO User

The following code sample matches the pattern even when there are whitespace characters in between, i.e.:

<td><a href='/path/to/file'>Name of File</a></td>

as well as:

<td> <a      href='/path/to/file' >Name of File</a>  </td>

The method returns true or false, depending on whether the input htmlTd string matches the pattern or not. If it matches, the out params contain the link and name respectively.

/// <summary>
/// Assigns proper values to link and name, if the htmlTd matches the pattern
/// </summary>
/// <returns>true if success, false otherwise</returns>
public static bool TryGetHrefDetails(string htmlTd, out string link, out string name)
{
    link = null;
    name = null;

    // Backslashes are doubled because this is a regular (non-verbatim) string literal.
    string pattern = "<td>\\s*<a\\s*href\\s*=\\s*(?:\"(?<link>[^\"]*)\"|(?<link>\\S+))\\s*>(?<name>.*)\\s*</a>\\s*</td>";

    if (Regex.IsMatch(htmlTd, pattern))
    {
        Regex r = new Regex(pattern,  RegexOptions.IgnoreCase | RegexOptions.Compiled);
        link = r.Match(htmlTd).Result("${link}");
        name = r.Match(htmlTd).Result("${name}");
        return true;
    }
    else
        return false;
}

I have tested this and it works correctly.

Answer by tinamou

Additionally, if someone has a use case where they need the group names before executing a search on the Regex object, they can use:

var regex = new Regex(pattern); // initialized somewhere
// ...
var groupNames = regex.GetGroupNames();
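
As a rough illustration, assuming the pattern from the question: GetGroupNames reads the names from the pattern itself, so no input text is needed, and it also reports the implicit group 0 (the whole match) alongside the named groups. The GroupNamesDemo class and the NamesOf helper are hypothetical names used only for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNamesDemo
{
    // Hypothetical helper: lists the group names defined by a pattern.
    public static string[] NamesOf(string pattern)
    {
        // No search is executed here; the names come from the pattern.
        return new Regex(pattern).GetGroupNames();
    }

    public static void Main()
    {
        // The question's pattern, written as a verbatim string.
        string pattern = @"<td><a href=""(?<link>.*?)"">(?<name>.*?)</a></td>";
        Console.WriteLine(string.Join(", ", NamesOf(pattern)));
        // prints "0, link, name"
    }
}
```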

Answer by Mariano Desanze

This answer improves on Rashmi Pandit's answer, which is in a way better than the rest because it seems to completely resolve the exact problem detailed in the question.

The bad part is that it is inefficient and does not use the IgnoreCase option consistently.

It is inefficient because a regex can be expensive to construct and execute, and in that answer it could have been constructed just once (calling Regex.IsMatch was just constructing the regex again behind the scenes). Also, the Match method could have been called only once, with its result stored in a variable, and then link and name should call Result on that variable.

And the IgnoreCase option was only used in the Match part but not in the Regex.IsMatch part.

I also moved the Regex definition outside the method in order to construct it just once (I think this is the sensible approach since we are compiling it with the RegexOptions.Compiled option).

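// Backslashes are doubled because this is a regular (non-verbatim) string literal.
private static readonly Regex hrefRegex = new Regex("<td>\\s*<a\\s*href\\s*=\\s*(?:\"(?<link>[^\"]*)\"|(?<link>\\S+))\\s*>(?<name>.*)\\s*</a>\\s*</td>",  RegexOptions.IgnoreCase | RegexOptions.Compiled);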

public static bool TryGetHrefDetails(string htmlTd, out string link, out string name)
{
    var matches = hrefRegex.Match(htmlTd);
    if (matches.Success)
    {
        link = matches.Result("${link}");
        name = matches.Result("${name}");
        return true;
    }
    else
    {
        link = null;
        name = null;
        return false;
    }
}
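
Match.Result("${link}") expands a replacement pattern, so for a single group it yields the same text as the Groups indexer from the accepted answer. The sketch below compares the two on a simplified pattern. The class name, the pattern and the sample input are assumptions made for illustration and are not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ResultVersusGroupsDemo
{
    // Simplified, hypothetical pattern: a double-quoted href only.
    private static readonly Regex AnchorRegex = new Regex(
        @"<a\s+href\s*=\s*""(?<link>[^""]*)""\s*>(?<name>.*?)</a>",
        RegexOptions.IgnoreCase);

    // Reads a group through a replacement pattern such as "${link}".
    public static string ViaResult(string html, string group)
    {
        Match m = AnchorRegex.Match(html);
        // Result throws on a failed match, so check Success first.
        return m.Success ? m.Result("${" + group + "}") : string.Empty;
    }

    // Reads the same group through the Groups indexer.
    public static string ViaGroups(string html, string group)
    {
        Match m = AnchorRegex.Match(html);
        return m.Success ? m.Groups[group].Value : string.Empty;
    }

    public static void Main()
    {
        string html = "<A HREF=\"/path/to/file\" >Name of File</A>";
        Console.WriteLine(ViaResult(html, "link"));  // prints "/path/to/file"
        Console.WriteLine(ViaGroups(html, "name"));  // prints "Name of File"
    }
}
```

Both helpers return an empty string when nothing matches; that is a choice made for this sketch, whereas the answer above sets the out parameters to null.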