Java: How would you use a regular expression to ignore strings that contain a specific substring?

Notice: this page is a translated copy of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/530441/

Time: 2020-10-29 12:46:00  Source: igfitidea

How would you use a regular expression to ignore strings that contain a specific substring?

Tags: java, regex, regex-negation

Asked by Matt Cummings

How would I go about using a negative lookbehind (or any other method) in a regular expression to ignore strings that contain a specific substring?

I've read two previous stackoverflow questions:
java-regexp-for-file-filtering
regex-to-match-against-something-that-is-not-a-specific-substring

They are nearly what I want... my problem is that the string doesn't end with what I want to ignore. If it did, this would not be a problem.

I have a feeling this has to do with the fact that lookarounds are zero-width and something is matching on the second pass through the string... but, I'm none too sure of the internals.

Anyway, if anyone is willing to take the time to explain it, I will greatly appreciate it.

Here is an example of an input string that I want to ignore:

192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] "GET /FOO/BAR/ HTTP/1.1" 200 2246

Here is an example of an input string that I want to keep for further evaluation:

192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] "GET /FOO/BAR/content.js HTTP/1.1" 200 2246

The key for me is that I want to ignore any HTTP GET that requests a directory's default page, that is, a path ending in a slash.

Following is my little test harness and the best RegEx I've come up with so far.

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// The class name RegexTest is a placeholder; the original post showed only main().
public class RegexTest {
    public static void main(String[] args) {
        String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/1.1\" 200 2246";
        //String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246";
        //String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/"; // This works
        //String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/"; // This works

        // Best attempt so far; it still matches the first line, which should be ignored.
        String inRegEx = "^.*(?:GET).*$(?<!.?/ HTTP/)";
        try {
            Pattern pattern = Pattern.compile(inRegEx);
            Matcher matcher = pattern.matcher(inString);

            if (matcher.find()) {
                System.out.printf("I found the text \"%s\" starting at "
                        + "index %d and ending at index %d.%n",
                        matcher.group(), matcher.start(), matcher.end());
            } else {
                System.out.printf("No match found.%n");
            }
        } catch (PatternSyntaxException pse) {
            System.out.println("Invalid RegEx: " + inRegEx);
            pse.printStackTrace();
        }
    }
}

Accepted answer by Zach Scrivena

Could you just match any path that doesn't end with a /?

String inRegEx = "^.* \"GET (.*[^/]) HTTP/.*$";


This can also be done using a negative lookbehind:

String inRegEx = "^.* \"GET (.+)(?<!/) HTTP/.*$";

Here, (?<!/) says "the preceding sequence must not match /", so the character immediately before " HTTP/" cannot be a slash.
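
As a rough check, the two patterns from this answer can be run against the two sample log lines from the question. This is only a sketch: the class name TrailingSlashFilter and the helper keep() are made up for illustration, and matches() is used because both patterns are anchored with ^ and $.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are not from the original answer.
final class TrailingSlashFilter {
    // Variant 1: the path must end in a character that is not '/'.
    static final Pattern CHAR_CLASS =
            Pattern.compile("^.* \"GET (.*[^/]) HTTP/.*$");
    // Variant 2: a negative lookbehind rejects a '/' right before " HTTP/".
    static final Pattern LOOKBEHIND =
            Pattern.compile("^.* \"GET (.+)(?<!/) HTTP/.*$");

    // Returns true when the whole log line matches, i.e. it should be kept.
    static boolean keep(Pattern pattern, String line) {
        return pattern.matcher(line).matches();
    }

    public static void main(String[] args) {
        String dirRequest = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] "
                + "\"GET /FOO/BAR/ HTTP/1.1\" 200 2246";
        String fileRequest = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] "
                + "\"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246";

        for (Pattern p : new Pattern[] {CHAR_CLASS, LOOKBEHIND}) {
            System.out.println(p.pattern());
            System.out.println("  directory request kept? " + keep(p, dirRequest));
            System.out.println("  file request kept?      " + keep(p, fileRequest));
        }
    }
}
```

Both variants reject the /FOO/BAR/ line and keep the content.js line. The lookbehind works here because it sits directly before " HTTP/", so it inspects the last character of the path rather than the end of the whole input.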

Answer by Fabian Steeg

Maybe I'm missing something here, but couldn't you just go without any regular expression and ignore anything for which this is true:

string.contains("/ HTTP")

This works because a file path will never end with a slash.
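
A minimal sketch of this regex-free approach, using the sample lines from the question. The class name ContainsFilter and the helper isDirectoryRequest() are hypothetical names chosen for illustration.

```java
// Hypothetical demo class for the String.contains approach.
final class ContainsFilter {
    // True when the log line contains "/ HTTP", i.e. the requested
    // path ends in a slash and the line should be ignored.
    static boolean isDirectoryRequest(String line) {
        return line.contains("/ HTTP");
    }

    public static void main(String[] args) {
        String dirRequest = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] "
                + "\"GET /FOO/BAR/ HTTP/1.1\" 200 2246";
        String fileRequest = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] "
                + "\"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246";

        System.out.println("ignore directory request? " + isDirectoryRequest(dirRequest));
        System.out.println("ignore file request?      " + isDirectoryRequest(fileRequest));
    }
}
```

One caveat, stated as an assumption about log formats beyond the two samples: if a log line carries extra fields, such as a referrer, that happen to contain "/ HTTP", this check would flag that line as well.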

Answer by Gumbo

I would use something like this:

"\"GET /FOO/BAR/[^ ]+ HTTP/1\\.[01]\""

This matches every path that's not just /FOO/BAR/.
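
A small sketch of how this pattern might be applied. Because the pattern is not anchored, it is used with find() rather than matches(). The class name PrefixFilter and the helper keep() are illustrative assumptions, and the dot is written as \\. so that it is a valid escape inside a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the fixed-prefix pattern.
final class PrefixFilter {
    // Requires at least one non-space character after /FOO/BAR/.
    static final Pattern REQUEST =
            Pattern.compile("\"GET /FOO/BAR/[^ ]+ HTTP/1\\.[01]\"");

    // True when the line contains a GET for something below /FOO/BAR/.
    static boolean keep(String line) {
        return REQUEST.matcher(line).find();
    }

    public static void main(String[] args) {
        String dirRequest = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] "
                + "\"GET /FOO/BAR/ HTTP/1.1\" 200 2246";
        String fileRequest = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] "
                + "\"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246";

        System.out.println("directory request kept? " + keep(dirRequest));
        System.out.println("file request kept?      " + keep(fileRequest));
    }
}
```

Since the prefix /FOO/BAR/ is hard-coded, requests under any other directory are dropped too, and a nested directory request such as /FOO/BAR/sub/ would still be kept because [^ ]+ also accepts slashes.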

Answer by WolfmanDragon

If you are writing Regex this complex, I would recommend building a library of resources outside of StackOverflow.
