Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/2191186/

Date: 2020-10-29 19:57:24  Source: igfitidea

Java regex to exclude specific strings from a larger one

Tags: java, regex, regexbuddy, matcher

Asked by nvrs

I have been banging my head against this for some time now: I want to capture all [a-z]+[0-9]? character sequences, excluding strings such as sin|cos|tan etc. Having done my regex homework, I expected the following regex to work:

(?:(?!(sin|cos|tan)))\b[a-z]+[0-9]?

As you can see, I am using negative lookahead along with alternation. The \b after the non-capturing group's closing parenthesis is critical to avoid matching the "in" of "sin" etc. The regex makes sense, and as a matter of fact I have tried it in RegexBuddy with Java as the target implementation and got the wanted result, but it doesn't work using Java Matcher and Pattern objects! Any thoughts?

cheers

Answered by bobince

The \b is in the wrong place. It would be looking for a word boundary that didn't have sin/cos/tan before it. But a boundary just after any of those would have a letter at the end, so it would have to be an end-of-word boundary, which it can't be if the next character is a-z.

Also, the negative lookahead would (if it worked) exclude strings like cost, which I'm not sure you want if you're just filtering out keywords.

I suggest:

\b(?!sin\b|cos\b|tan\b)[a-z]+[0-9]?\b
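To see the difference on concrete input, here is a minimal sketch that runs the pattern from the question and the pattern suggested here side by side. The class name LookaheadComparison, the helper findAll, and the sample input are made up for illustration. The backslashes are doubled because the patterns are written as Java string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadComparison {
    // Pattern from the question, with the backslash doubled for a Java string literal.
    static final Pattern ORIGINAL =
            Pattern.compile("(?:(?!(sin|cos|tan)))\\b[a-z]+[0-9]?");

    // Pattern suggested in this answer: a keyword is rejected only when it
    // ends at a word boundary, so longer words such as "cost" survive.
    static final Pattern SUGGESTED =
            Pattern.compile("\\b(?!sin\\b|cos\\b|tan\\b)[a-z]+[0-9]?\\b");

    // Collects every match of the pattern in the input, in order of appearance.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "sin x1 cost tan foo"; // hypothetical sample input
        System.out.println(findAll(ORIGINAL, input));  // prints [x1, foo]
        System.out.println(findAll(SUGGESTED, input)); // prints [x1, cost, foo]
    }
}
```

On this input the question's pattern drops cost along with the keywords, while the suggested pattern keeps it and still skips sin and tan.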

Or, more simply, you could just match \b[a-z]+[0-9]?\b and filter out the strings in the keyword list afterwards. You don't always have to do everything in regex.
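A rough sketch of that match-then-filter approach follows. The class name MatchThenFilter, the method identifiers, and the sample expression are hypothetical. It assumes that "filtering" means dropping tokens that are exactly equal to a keyword, so a token like cos2 would be kept; adjust the check if that is not what you want.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchThenFilter {
    // The plain identifier pattern, with no lookahead at all.
    static final Pattern IDENTIFIER = Pattern.compile("\\b[a-z]+[0-9]?\\b");

    // Keyword list taken from the question.
    static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("sin", "cos", "tan"));

    // Finds every identifier-shaped token, then discards exact keyword hits.
    static List<String> identifiers(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = IDENTIFIER.matcher(input);
        while (m.find()) {
            String token = m.group();
            if (!KEYWORDS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input; prints [x1, cost, foo]
        System.out.println(identifiers("sin(x1) + cost * tan(foo)"));
    }
}
```

Keeping the keyword list in a Set means new keywords can be added without touching the regex.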

Answered by Tomalak

So you want [a-z]+[0-9]? (a sequence of at least one letter, optionally followed by a digit), unless that letter sequence resembles one of sin, cos, tan?

\b(?!(sin|cos|tan)(?=\d|\b))[a-z]+\d?\b

results:

cos   - no match
cosy  - full match
cos1  - no match
cosy1 - full match
bla9  - full match
bla99 - no match
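The results listed above can be checked with a few lines of Java. This is only a sketch: it assumes "full match" means that Matcher.matches() succeeds on the whole candidate string, and the class name TomalakPatternCheck and the helper isFullMatch are invented for the example.

```java
import java.util.regex.Pattern;

class TomalakPatternCheck {
    // The pattern from this answer, with backslashes doubled for a Java string literal.
    static final Pattern PATTERN =
            Pattern.compile("\\b(?!(sin|cos|tan)(?=\\d|\\b))[a-z]+\\d?\\b");

    // True when the entire candidate string is matched by the pattern.
    static boolean isFullMatch(String candidate) {
        return PATTERN.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"cos", "cosy", "cos1", "cosy1", "bla9", "bla99"};
        for (String s : samples) {
            System.out.println(s + " - " + (isFullMatch(s) ? "full match" : "no match"));
        }
    }
}
```

The inner lookahead (?=\d|\b) is what rejects cos1: a keyword followed by a digit, or followed by a word boundary, fails the negative lookahead, while cosy passes because the keyword is followed by another letter.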

Answered by nvrs

I forgot to escape the \b for Java, so \b should be \\b, and it now works. Cheers
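For anyone hitting the same thing, here is a small sketch of what goes wrong, assuming the pattern was typed directly into a Java string literal. In Java source, "\b" is the backspace character (U+0008), so the regex engine never sees a word-boundary token. The class name BackslashEscaping, the helper finds, and the shortened pattern are illustrative only.

```java
import java.util.regex.Pattern;

class BackslashEscaping {
    // "\b" in a Java string literal is a single backspace character (U+0008),
    // so this pattern requires a literal backspace before the letters.
    static final Pattern UNESCAPED = Pattern.compile("\b[a-z]+[0-9]?");

    // "\\b" yields the two characters \ and b, which the regex engine
    // interprets as a word boundary.
    static final Pattern ESCAPED = Pattern.compile("\\b[a-z]+[0-9]?");

    // True when the pattern matches anywhere in the input.
    static boolean finds(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println("\b".length());           // prints 1
        System.out.println("\\b".length());          // prints 2
        System.out.println(finds(UNESCAPED, "x1"));  // prints false
        System.out.println(finds(ESCAPED, "x1"));    // prints true
    }
}
```

This would also explain why the same expression behaved correctly in RegexBuddy: a regex tool takes the pattern text as typed, without the extra layer of Java string-literal escaping.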
