Disclaimer: This page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/841883/

Time: 2020-08-05 03:46:22  Source: igfitidea

Regular expressions C# - is it possible to extract matches while matching?

Tags: c#, regex, extraction

Asked by sarsnake

Say I have a string whose format I need to verify, e.g. RR1234566-001 (2 letters, 7 digits, a dash, then 1 or more digits). I use something like:

        // Requires: using System.Text.RegularExpressions;
        Regex regex = new Regex(patternString);
        // IsMatch already returns a bool, so no if/else is needed.
        return regex.IsMatch(stringToMatch);

This tells me whether stringToMatch follows the pattern defined by patternString. What I also need, though (I currently end up extracting these later), are 1234566 and 001, i.e. portions of stringToMatch.

Please note that this is NOT a question about how to construct regular expressions. What I am asking is: "Is there a way to match and extract values simultaneously without having to use a split function later?"

Accepted answer by Andomar

You can use regex groups to accomplish that. For example, this regex:

(\d\d\d)-(\d\d\d\d\d\d\d)

Let's match a telephone number with this regex:

var regex = new Regex(@"(\d\d\d)-(\d\d\d\d\d\d\d)");
var match = regex.Match("123-4567890");
if (match.Success)
{
    // read the captured parts from match.Groups here
}

If it matches, you will find the first three digits in:

match.Groups[1].Value

And the remaining seven digits in:

match.Groups[2].Value
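
A minimal self-contained sketch of the telephone example above, assuming a console program. The class name PhoneGroupsDemo, the method Split, and the tuple element names First and Second are illustrative choices, not part of the answer; only Regex.Match, Match.Success and Match.Groups are the real .NET API.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneGroupsDemo
{
    // Same pattern as above: three digits, a hyphen, then seven digits.
    private static readonly Regex PhoneRegex = new Regex(@"(\d\d\d)-(\d\d\d\d\d\d\d)");

    // Returns the two captured parts, or null if the input does not match.
    public static (string First, string Second)? Split(string input)
    {
        Match match = PhoneRegex.Match(input);
        if (!match.Success)
        {
            return null;
        }
        // Groups[1] and Groups[2] hold the text captured by the two parenthesized groups.
        return (match.Groups[1].Value, match.Groups[2].Value);
    }

    public static void Main()
    {
        var parts = Split("123-4567890");
        if (parts.HasValue)
        {
            Console.WriteLine(parts.Value.First);  // 123
            Console.WriteLine(parts.Value.Second); // 4567890
        }
    }
}
```

The match test and the extraction happen in the same call, which is what the question asks for.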

P.S. In C#, you can use a verbatim string literal (@"...") to avoid escaping backslashes. For example, @"\hi\" equals "\\hi\\". This is useful for regular expressions and file paths.
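
A small sketch illustrating the verbatim-string point; the class and constant names are made up for the example. It compares the two spellings of the same string and of a typical regex pattern.

```csharp
using System;

public static class VerbatimStringDemo
{
    // In a verbatim string (@"..."), backslashes are taken literally.
    public const string Verbatim = @"\hi\";

    // In a regular string literal, each backslash must be escaped as \\.
    public const string Escaped = "\\hi\\";

    // A typical regex pattern written both ways: one or more digits.
    public const string PatternVerbatim = @"\d+";
    public const string PatternEscaped = "\\d+";

    public static void Main()
    {
        Console.WriteLine(Verbatim == Escaped);               // True
        Console.WriteLine(PatternVerbatim == PatternEscaped); // True
        Console.WriteLine(Verbatim.Length);                   // 4
    }
}
```

Writing "\d+" without the @ (or the doubled backslash) does not compile, because \d is not a recognized C# escape sequence.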

P.S. 2: The first capturing group is stored in Groups[1], not Groups[0] as you might expect, because Groups[0] contains the entire matched string.
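
To make the Groups[0] versus Groups[1] distinction concrete, here is a sketch that searches a longer input so the whole match differs from the input string. The class name GroupZeroDemo, the helper Find, and the sample sentence are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupZeroDemo
{
    // Same telephone pattern as in the answer above, with two capturing groups.
    private const string Pattern = @"(\d\d\d)-(\d\d\d\d\d\d\d)";

    public static Match Find(string input) => Regex.Match(input, Pattern);

    public static void Main()
    {
        // Extra text around the number makes Groups[0] differ from the input.
        Match m = Find("call 123-4567890 now");
        Console.WriteLine(m.Groups[0].Value); // 123-4567890 (the entire match)
        Console.WriteLine(m.Value);           // 123-4567890 (Match.Value equals Groups[0].Value)
        Console.WriteLine(m.Groups[1].Value); // 123 (first capturing group)
        Console.WriteLine(m.Groups.Count);    // 3 (group 0 plus the two capturing groups)
    }
}
```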

Answer by cyberconte

Use capturing groups and Match instead.

For example:

// Requires: using System.Text.RegularExpressions;
// The @ prefix is needed: "\d" is not a valid C# escape sequence.
Regex re = new Regex(@"(\d+)-(\d+)");
Match m = re.Match(stringToMatch);

if (m.Success) {
  string part1 = m.Groups[1].Value;
  string part2 = m.Groups[2].Value;
  return true;
}
else {
  return false;
}
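
A sketch, with illustrative class and member names, that runs the pattern above against the question's sample string. Because (\d+)-(\d+) has no ^ or $ anchors, it finds the digits anywhere in the input, so on its own it extracts the parts but does not validate the full format. The anchored "strict" pattern here is an assumption built from the format described in the question (2 letters, 7 digits, hyphen, 1 or more digits).

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnanchoredDemo
{
    // The answer's pattern: digits, a hyphen, digits. No ^ or $ anchors.
    public static readonly Regex Loose = new Regex(@"(\d+)-(\d+)");

    // Anchored variant spelling out the question's format:
    // 2 letters, 7 digits, a hyphen, 1 or more digits.
    public static readonly Regex Strict = new Regex(@"^[A-Za-z]{2}(\d{7})-(\d+)$");

    // Returns "group1 / group2" on success, or "no match".
    public static string Describe(Regex re, string input)
    {
        Match m = re.Match(input);
        return m.Success ? m.Groups[1].Value + " / " + m.Groups[2].Value : "no match";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(Loose, "RR1234566-001"));  // 1234566 / 001
        Console.WriteLine(Describe(Strict, "RR1234566-001")); // 1234566 / 001
        // The loose pattern also accepts input that breaks the required format.
        Console.WriteLine(Describe(Loose, "12-3"));           // 12 / 3
        Console.WriteLine(Describe(Strict, "12-3"));          // no match
    }
}
```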

You can also name the groups, like this:

Regex re = new Regex(@"(?<Part1>\d+)-(?<Part2>\d+)");

and access them like this:

string part1 = m.Groups["Part1"].Value;
string part2 = m.Groups["Part2"].Value;
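
A sketch applying named groups to the question's own format. The group names Prefix, Number and Suffix, and the class NamedGroupsDemo, are illustrative choices. The last line relies on documented GroupCollection behavior: looking up a name that is not a group in the pattern returns a Group whose Success is false rather than throwing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupsDemo
{
    // Named groups keep the code independent of group numbering.
    private static readonly Regex CodeRegex =
        new Regex(@"^(?<Prefix>[A-Za-z]{2})(?<Number>\d{7})-(?<Suffix>\d+)$");

    public static Match Parse(string input) => CodeRegex.Match(input);

    public static void Main()
    {
        Match m = Parse("RR1234566-001");
        if (m.Success)
        {
            Console.WriteLine(m.Groups["Prefix"].Value); // RR
            Console.WriteLine(m.Groups["Number"].Value); // 1234566
            Console.WriteLine(m.Groups["Suffix"].Value); // 001
        }
        // An unknown group name does not throw; the returned Group has Success == false.
        Console.WriteLine(m.Groups["Missing"].Success);  // False
    }
}
```

A misspelled group name therefore fails silently with an empty Value, which is worth keeping in mind when refactoring patterns.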

Answer by LukeH

You can use parentheses to capture groups of characters:

string test = "RR1234566-001";

// capture 2 letters, then 7 digits, then a hyphen, then 1 or more digits
string rx = @"^([A-Za-z]{2})(\d{7})(\-)(\d+)$";

Match m = Regex.Match(test, rx, RegexOptions.IgnoreCase);

if (m.Success)
{
    Console.WriteLine(m.Groups[1].Value);    // RR
    Console.WriteLine(m.Groups[2].Value);    // 1234566
    Console.WriteLine(m.Groups[3].Value);    // -
    Console.WriteLine(m.Groups[4].Value);    // 001
    return true;
}
else
{
    return false;
}
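
One way to package "validate and extract in one step" is a TryParse-style method with out parameters, following the common .NET TryParse convention. This is a sketch: the class name CodeParser and the method TryParse are hypothetical, and the pattern is a simplified version of the one above (only the two parts the question asks for are captured).

```csharp
using System;
using System.Text.RegularExpressions;

public static class CodeParser
{
    // 2 letters, 7 digits (captured), a hyphen, 1 or more digits (captured).
    private static readonly Regex Format = new Regex(@"^[A-Za-z]{2}(\d{7})-(\d+)$");

    // Hypothetical helper: returns true and fills the parts only if the whole input is valid.
    public static bool TryParse(string input, out string number, out string suffix)
    {
        Match m = Format.Match(input ?? string.Empty);
        if (!m.Success)
        {
            number = string.Empty;
            suffix = string.Empty;
            return false;
        }
        number = m.Groups[1].Value;
        suffix = m.Groups[2].Value;
        return true;
    }

    public static void Main()
    {
        if (TryParse("RR1234566-001", out string number, out string suffix))
        {
            Console.WriteLine(number); // 1234566
            Console.WriteLine(suffix); // 001
        }
        Console.WriteLine(TryParse("RR123-001", out _, out _)); // False
    }
}
```

Note that in .NET, $ also matches just before a final newline, so "RR1234566-001\n" would pass; \z can be used in place of $ if that matters.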

Answer by sahil gupta

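string text = "RR1234566-001";
// [A-Za-z] without a space inside the brackets; $ rejects trailing text
string regex = @"^([A-Za-z]{2})(\d{7})(\-)(\d+)$";
// Regex.Match returns a single Match (Regex.Matches returns a MatchCollection)
Match mtch = Regex.Match(text, regex);
if (mtch.Success)
{
    Console.WriteLine(mtch.Groups[1].Value);    // RR
    Console.WriteLine(mtch.Groups[2].Value);    // 1234566
    Console.WriteLine(mtch.Groups[3].Value);    // -
    Console.WriteLine(mtch.Groups[4].Value);    // 001
    return true;
}
else
{
    return false;
}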
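
Regex.Matches, as opposed to Regex.Match, returns a MatchCollection of every non-overlapping match, which is handy when a longer text contains several codes. The sketch below assumes that scenario; the class MatchesDemo, the method FindNumbers, and the sample sentence are illustrative. The ^/$ anchors are replaced by \b word boundaries so the pattern can match inside a sentence.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchesDemo
{
    // \b instead of ^/$: find codes embedded in a larger text,
    // but not inside a longer run of letters or digits.
    private static readonly Regex CodeRegex = new Regex(@"\b[A-Za-z]{2}(\d{7})-(\d+)\b");

    public static List<string> FindNumbers(string text)
    {
        var results = new List<string>();
        // Each item in the MatchCollection is a Match with its own Groups.
        foreach (Match m in CodeRegex.Matches(text))
        {
            results.Add(m.Groups[1].Value + "-" + m.Groups[2].Value);
        }
        return results;
    }

    public static void Main()
    {
        foreach (string s in FindNumbers("Orders RR1234566-001 and AB7654321-12 shipped."))
        {
            Console.WriteLine(s);
        }
        // 1234566-001
        // 7654321-12
    }
}
```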