C# regex to match multiple strings
Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/698596/
Regex to match multiple strings
Asked by Jon Tackabury
I need to create a regex that can match multiple strings. For example, I want to find all the instances of "good" or "great". I found some examples, but what I came up with doesn't seem to work:
\b(good|great)\w*\b
Can anyone point me in the right direction?
Edit: I should note that I don't want to just match whole words. For example, I may want to match "ood" or "reat" as well (parts of the words).
Edit 2: Here is some sample text: "This is a really great story." I might want to match "this" or "really", or I might want to match "eall" or "reat".
Accepted answer by ojrac
If you can guarantee that there are no reserved regex characters in your word list (or if you escape them), you could just use this code to make "a big word list" into @"(a|big|word|list)". There's nothing wrong with the | operator as you're using it, as long as those () surround it. It sounds like the \w* and the \b patterns are what are interfering with your matches.
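// pattern_list holds the words to match; escape them first if they may contain reserved regex characters
String[] pattern_list = { "a", "big", "word", "list" };
String regex = String.Format("({0})", String.Join("|", pattern_list));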
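To make the idea concrete, here is a minimal sketch that builds the alternation from a list and runs it against the sample text from the question. The names terms, text and found are my own, and the Regex.Escape step is an assumption about how you would handle reserved characters rather than part of the answer's code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical search terms and sample text, both taken from the question.
string[] terms = { "eall", "reat" };
string text = "This is a really great story.";

// Regex.Escape makes each term literal, so characters such as '.' or '|'
// inside a term cannot change the meaning of the pattern.
string pattern = "(" + string.Join("|", terms.Select(t => Regex.Escape(t))) + ")";

// No \b or \w* here, so a term may match in the middle of a word.
var found = Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToList();

Console.WriteLine(pattern);                   // (eall|reat)
Console.WriteLine(string.Join(", ", found));  // eall, reat
```

Dropping the \b and \w* parts is what lets "eall" match inside "really".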
Answer by Chris Ballance
(good)*(great)*
After your edit:
\b(g*o*o*d*)*(g*r*e*a*t*)*\b
Answer by ojrac
I'm not sure I understand the problem correctly:
If you want to match "great" or "reat" you can express this by a pattern like:
"g?reat"
This simply says that the "reat"-part must exist and the "g" is optional.
This would match "reat" and "great" but not "eat", because the first "r" in "reat" is required.
If you have the two words "great" and "good" and you want to match them both with an optional "g", you can write it like this:
(g?reat|g?ood)
And if you want to include a word-boundary like:
\b(g?reat|g?ood)
You should be aware that this would not match anything in "breat": the "reat" is there, but because of the "b" in front of it the "r" is not at a word boundary.
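A quick way to see the effect of the leading \b is to run the pattern over a few inputs. This is only a sketch; the list of test words is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer: the leading \b forces the match to start at a word boundary.
var anchored = new Regex(@"\b(g?reat|g?ood)");

// Hypothetical inputs chosen to show where the boundary matters.
foreach (string word in new[] { "great", "reat", "good", "eat", "breat" })
{
    Console.WriteLine($"{word}: {anchored.IsMatch(word)}");
}
// great, reat and good match; eat and breat do not.
```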
So if you want to match whole words that contain a substring like "reat" or "ood" then you should try:
"\b\w*?(reat|ood)\w*\b"
This reads:
1. Starting at a word boundary, match any number of word characters, but don't be greedy.
2. Matching "reat" or "ood" ensures that only words containing one of them are matched.
3. Match any number of word characters following "reat" or "ood" until the next word boundary is reached.
This will match:
"goodness", "good", "ood" (if a complete word)
It can be read as: Give me all complete words that contain "ood" or "reat".
Is that what you are looking for?
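Here is a small sketch of the "whole words containing a substring" idea. It assumes \w* (zero or more) after the group, so that a word ending in the substring, such as "good", also qualifies; the sample sentence is invented for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// \w* on both sides lets the substring sit at the start, middle or end of the word.
var wholeWord = new Regex(@"\b\w*?(reat|ood)\w*\b");

// Hypothetical sample sentence.
string sample = "A good story about greatness, food and bread.";
var words = wholeWord.Matches(sample).Cast<Match>().Select(m => m.Value).ToList();

Console.WriteLine(string.Join(", ", words));  // good, greatness, food
```

Note that "bread" is skipped because it contains "rea" but not "reat".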
Answer by KOGI
I'm not entirely sure that regex alone offers a solution for what you're trying to do. You could, however, use the following code to create a regular expression for a given word, although the resulting pattern has the potential to become very long and slow:
// Returns every contiguous substring of $word that is at least $minLength characters long.
function wordPermutations( $word, $minLength = 2 )
{
    $perms = array( );
    for ($start = 0; $start < strlen( $word ); $start++)
    {
        for ($end = strlen( $word ); $end > $start; $end--)
        {
            $perm = substr( $word, $start, ($end - $start));
            if (strlen( $perm ) >= $minLength)
            {
                $perms[] = $perm;
            }
        }
    }
    return $perms;
}
Test Code:
$perms = wordPermutations( 'great', 3 ); // get all permutations of "great" that are 3 or more chars in length
var_dump( $perms );
echo ( '/\b('.implode( '|', $perms ).')\b/' );
Example Output:
array
  0 => string 'great' (length=5)
  1 => string 'grea' (length=4)
  2 => string 'gre' (length=3)
  3 => string 'reat' (length=4)
  4 => string 'rea' (length=3)
  5 => string 'eat' (length=3)
/\b(great|grea|gre|reat|rea|eat)\b/
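Since the question is about C#, the same idea could look roughly like this. It is a sketch, not a drop-in port: WordSubstrings is a name I made up, and the Regex.Escape call is an extra precaution that the PHP version does not have.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Collects every contiguous substring of word with at least minLength characters,
// longest first for each start position (the same order as the PHP version).
static List<string> WordSubstrings(string word, int minLength = 2)
{
    var result = new List<string>();
    for (int start = 0; start < word.Length; start++)
    {
        for (int end = word.Length; end - start >= minLength; end--)
        {
            result.Add(word.Substring(start, end - start));
        }
    }
    return result;
}

List<string> parts = WordSubstrings("great", 3);
string pattern = @"\b(" + string.Join("|", parts.ConvertAll(p => Regex.Escape(p))) + @")\b";
Console.WriteLine(pattern);  // \b(great|grea|gre|reat|rea|eat)\b
```

A word of length n produces on the order of n² substrings, which is why the pattern can get long quickly.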
Answer by Tomer W
I think you are asking for something you don't really mean. If you want to search for any part of the word, you are literally searching for letters.
e.g. searching for {Jack, Jim} in "John and Shelly are cool"
is searching for all the letters in the names {J,a,c,k,i,m}:
*J*ohn *a*nd Shelly *a*re
And for that you don't need regex :)
In my opinion, a suffix tree can help you with that:
http://en.wikipedia.org/wiki/Suffix_tree#Functionality
Enjoy.
Answer by user2125311
Just check the boolean that Regex.IsMatch() returns.
if (Regex.IsMatch(line, "condition") && Regex.IsMatch(line, "condition2"))
If the condition is true, the line matches both regexes.
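For illustration, a minimal sketch with concrete values. The line and the patterns are made up, standing in for the placeholder conditions above. Keep in mind that && requires both patterns to match; if either one is enough, which is what the question describes, a single alternation does the job.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input line, borrowed from the question's sample text.
string line = "This is a really great story.";

// && is true only when both patterns are found somewhere in the line.
bool hasBoth = Regex.IsMatch(line, "eall") && Regex.IsMatch(line, "reat");

// An alternation is true when at least one of the alternatives is found.
bool hasEither = Regex.IsMatch(line, "good|reat");

Console.WriteLine(hasBoth);    // True
Console.WriteLine(hasEither);  // True
```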