C# .NET's Regex class and newline

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/988951/

Time: 2020-08-06 05:01:07  Source: igfitidea

Tags: c#, .net, regex

Asked by empi

Why doesn't .NET regex treat \n as end of line character?

Sample code:

string[] words = new string[] { "ab1", "ab2\n", "ab3\n\n", "ab4\r", "ab5\r\n", "ab6\n\r" };
Regex regex = new Regex("^[a-z0-9]+$");
foreach (var word in words)
{
    Console.WriteLine("{0} - {1}", word, regex.IsMatch(word));
}

And this is the response I get:

ab1 - True
ab2
 - True
ab3

 - False
 - False
ab5
 - False
ab6
 - False

Why does the regex match ab2\n?

Update: I don't think Multiline is a good solution. I want to validate a login so that it contains only the specified characters, and it must be a single line. If I change the constructor to use the Multiline option, ab1, ab2, ab3 and ab6 match the expression, while ab4 and ab5 don't.

Accepted answer by Remco Eissing

If the string ends with a line break, RegexOptions.Multiline will not work. The $ will just ignore the last line break since there is nothing after it.

If you want to match up to the very end of the string and not ignore any trailing line break, use \z:

Regex regex = new Regex(@"^[a-z0-9]+\z", RegexOptions.Multiline);

This works in both Multiline and Singleline mode; the option doesn't matter.
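
As a minimal sketch of how the \z pattern behaves on the question's sample strings (the class name LoginCheck and the helper IsValidLogin are illustrative names, not part of the original answer):

```csharp
using System;
using System.Text.RegularExpressions;

static class LoginCheck
{
    // \z matches only at the very end of the input, so a trailing \n or \r is rejected.
    private static readonly Regex Strict = new Regex(@"^[a-z0-9]+\z");

    // Hypothetical helper name, introduced here for illustration only.
    public static bool IsValidLogin(string candidate)
    {
        return Strict.IsMatch(candidate);
    }

    static void Main()
    {
        string[] words = { "ab1", "ab2\n", "ab3\n\n", "ab4\r", "ab5\r\n", "ab6\n\r" };
        foreach (var word in words)
        {
            // Show control characters as escapes so each result stays on one line.
            string shown = word.Replace("\r", "\\r").Replace("\n", "\\n");
            Console.WriteLine("{0} - {1}", shown, IsValidLogin(word));
        }
    }
}
```

With this pattern only ab1 should be reported as True; every string that ends in \r or \n is rejected.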

Answer by Andrew Hare

From RegexOptions:

Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string.

So basically if you pass RegexOptions.Multiline to the Regex constructor you are instructing that instance to treat the final $ as a match for newline characters - not simply the end of the string itself.

Answer by SztupY

Could be the usual Windows/Linux line ending differences. But it's still strange that \n\n gets a false this way... Did you try with the RegexOptions.Multiline flag set?

Answer by empi

Just to add more detail to Smazy's answer. This is an extract from: Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan. Copyright 2009 Jan Goyvaerts and Steven Levithan, 978-0-596-2068-7

The difference between ‹\Z› and ‹\z› comes into play when the last character in your subject text is a line break. In that case, ‹\Z› can match at the very end of the subject text, after the final line break, as well as immediately before that line break. The benefit is that you can search for ‹omega\Z› without having to worry about stripping off a trailing line break at the end of your subject text. When reading a file line by line, some tools include the line break at the end of the line, whereas others don't; ‹\Z› masks this difference. ‹\z› matches only at the very end of the subject text, so it will not match text if a trailing line break follows.

The anchor ‹$› is equivalent to ‹\Z›, as long as you do not turn on the “^ and $ match at line breaks” option. This option is off by default for all regex flavors except Ruby. Ruby does not offer a way to turn this option off. Just like ‹\Z›, ‹$› matches at the very end of the subject text, as well as before the final line break, if any.
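
The distinction in the extract can be tried directly in .NET. The following is a small sketch, not taken from the book; the class name EndAnchors and the helper Check are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

static class EndAnchors
{
    // Thin wrapper around Regex.IsMatch with default options (no Multiline).
    public static bool Check(string subject, string pattern)
    {
        return Regex.IsMatch(subject, pattern);
    }

    static void Main()
    {
        string subject = "omega\n"; // subject text with one trailing line break

        Console.WriteLine(Check(subject, @"omega\Z")); // True: \Z may match before the final \n
        Console.WriteLine(Check(subject, @"omega$"));  // True: $ behaves like \Z by default
        Console.WriteLine(Check(subject, @"omega\z")); // False: \z needs the very end of the text

        // Only a single final line break is tolerated by \Z.
        Console.WriteLine(Check("omega\n\n", @"omega\Z")); // False
    }
}
```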

Of course, I wouldn't have found it without Smazy's answer.

Answer by Jan Goyvaerts

The .NET regex engine does treat \n as end-of-line. And that's a problem if your string has Windows-style \r\n line breaks. With RegexOptions.Multiline turned on, $ matches between \r and \n rather than before \r.

$ also matches at the very end of the string, just like \z. The difference is that \z can match only at the very end of the string, while $ also matches before a trailing \n. When using RegexOptions.Multiline, $ also matches before any \n.
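
A short sketch of the \r\n problem described above, assuming a two-line Windows-style input (the class name MultilineDollar and the helper FirstLine are hypothetical names used only for this example):

```csharp
using System;
using System.Text.RegularExpressions;

static class MultilineDollar
{
    // Returns the first line as seen by ^.*$ in Multiline mode.
    public static string FirstLine(string text)
    {
        return Regex.Match(text, "^.*$", RegexOptions.Multiline).Value;
    }

    static void Main()
    {
        // $ matches just before \n, so the \r is captured as part of the line.
        string first = FirstLine("first\r\nsecond");
        Console.WriteLine(first.Length);                                   // 6
        Console.WriteLine(first.EndsWith("\r", StringComparison.Ordinal)); // True

        // Without Multiline, $ still matches before a single trailing \n.
        Console.WriteLine(Regex.IsMatch("ab2\n", "^[a-z0-9]+$")); // True

        // With Multiline, $ matches before any \n, even when more text follows.
        Console.WriteLine(Regex.IsMatch("ab6\n\r", "^[a-z0-9]+$", RegexOptions.Multiline)); // True
    }
}
```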

If you're having trouble with line breaks, a trick is to first do a search-and-replace that replaces all \r with nothing, to make sure all your lines end with \n only.
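
One way the normalization trick might look in code is sketched below. The class name LineEndings and the helpers StripCarriageReturns and Lines are illustrative, and string.Replace stands in for the search-and-replace step:

```csharp
using System;
using System.Text.RegularExpressions;

static class LineEndings
{
    // Remove every \r so that only \n line breaks remain.
    public static string StripCarriageReturns(string text)
    {
        return text.Replace("\r", "");
    }

    // Collect the non-empty lines of the normalized text using ^.+$ in Multiline mode.
    public static string[] Lines(string text)
    {
        MatchCollection matches =
            Regex.Matches(StripCarriageReturns(text), "^.+$", RegexOptions.Multiline);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Value;
        }
        return result;
    }

    static void Main()
    {
        foreach (string line in Lines("first\r\nsecond\r\n"))
        {
            // After normalization no captured line carries a stray \r.
            Console.WriteLine("[{0}] length {1}", line, line.Length);
        }
    }
}
```

Note that this changes the input itself, so it suits line-oriented processing more than strict validation of a single value.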

Answer by Dre

Use regex options, System.Text.RegularExpressions.RegexOptions:

string[] words = new string[] { "ab1", "ab2\n", "ab3\n\n", "ab4\r", "ab5\r\n", "ab6\n\r" };
foreach (var word in words)
{
    // The IsMatch overload that takes a pattern and options is static,
    // so it is called on the Regex type rather than on an instance.
    Console.WriteLine("{0} - {1}", word,
        Regex.IsMatch(word, "^[a-z0-9]+$",
            System.Text.RegularExpressions.RegexOptions.Singleline |
            System.Text.RegularExpressions.RegexOptions.IgnoreCase |
            System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace));
}