Java: How would you use a regular expression to ignore strings that contain a specific substring?
Source: http://stackoverflow.com/questions/530441/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by Matt Cummings
How would I go about using a negative lookbehind (or any other method) regular expression to ignore strings that contain a specific substring?
I've read two previous stackoverflow questions:
java-regexp-for-file-filtering
regex-to-match-against-something-that-is-not-a-specific-substring
They are nearly what I want... my problem is that the string doesn't end with what I want to ignore. If it did, this would not be a problem.
I have a feeling this has to do with the fact that lookarounds are zero-width and something is matching on the second pass through the string... but, I'm none too sure of the internals.
Anyway, if anyone is willing to take the time and explain it I will greatly appreciate it.
Here is an example of an input string that I want to ignore:
192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] "GET /FOO/BAR/ HTTP/1.1" 200 2246
Here is an example of an input string that I want to keep for further evaluation:
192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] "GET /FOO/BAR/content.js HTTP/1.1" 200 2246
The key for me is that I want to ignore any HTTP GET that is going after a document root default page.
Following is my little test harness and the best RegEx I've come up with so far.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Wrapper class added so the harness compiles; the name RegexHarness is arbitrary.
public class RegexHarness {
    public static void main(String[] args) {
        String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/1.1\" 200 2246";
        //String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246";
        //String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/"; // This works
        //String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/"; // This works
        String inRegEx = "^.*(?:GET).*$(?<!.?/ HTTP/)";
        try {
            Pattern pattern = Pattern.compile(inRegEx);
            Matcher matcher = pattern.matcher(inString);
            if (matcher.find()) {
                System.out.printf("I found the text \"%s\" starting at " +
                        "index %d and ending at index %d.%n",
                        matcher.group(), matcher.start(), matcher.end());
            } else {
                System.out.printf("No match found.%n");
            }
        } catch (PatternSyntaxException pse) {
            System.out.println("Invalid RegEx: " + inRegEx);
            pse.printStackTrace();
        }
    }
}
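One possible reading of why the pattern above only behaves on the truncated inputs: the lookbehind is placed after `$`, so it is evaluated at the end of the input and inspects the characters immediately before that point, not the characters before ` HTTP/` in the middle of the line. The sketch below reuses the pattern unchanged; the class name `LookbehindAtEndDemo` and the helper `found` are made up for illustration.

```java
import java.util.regex.Pattern;

public class LookbehindAtEndDemo {
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("^.*(?:GET).*$(?<!.?/ HTTP/)");

    // Hypothetical helper: true if the pattern finds a match in the line.
    static boolean found(String line) {
        return ORIGINAL.matcher(line).find();
    }

    public static void main(String[] args) {
        String prefix = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/";

        // Full log line: the text just before the end is "2246",
        // so the negative lookbehind passes and the line is matched.
        System.out.println(found(prefix + " HTTP/1.1\" 200 2246")); // prints true

        // Truncated line: the text just before the end is "/ HTTP/",
        // so the negative lookbehind rejects the match.
        System.out.println(found(prefix + " HTTP/")); // prints false
    }
}
```

Under this reading, the lookbehind has to sit directly before ` HTTP/` in the pattern rather than after `$` for the check to apply to the request path.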
Accepted answer by Zach Scrivena
Could you just match any path that doesn't end with a /?
String inRegEx = "^.* \"GET (.*[^/]) HTTP/.*$";
This can also be done using negative lookbehind:
String inRegEx = "^.* \"GET (.+)(?<!/) HTTP/.*$";
Here, (?<!/) says "the preceding sequence must not match /".
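As a quick check, the two patterns above can be run against the sample lines from the question. This is only a sketch: the class name `AcceptedAnswerDemo` and the helper `keptPath` are hypothetical, and returning `null` for an ignored line is just one convenient convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AcceptedAnswerDemo {
    // Character-class variant: the path must end in something other than '/'.
    static final Pattern CHAR_CLASS = Pattern.compile("^.* \"GET (.*[^/]) HTTP/.*$");
    // Lookbehind variant: the position just before " HTTP/" must not follow a '/'.
    static final Pattern LOOKBEHIND = Pattern.compile("^.* \"GET (.+)(?<!/) HTTP/.*$");

    // Hypothetical helper: returns the captured path if the line is kept, or null if ignored.
    static String keptPath(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String ignore = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/1.1\" 200 2246";
        String keep = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246";

        for (Pattern p : new Pattern[] {CHAR_CLASS, LOOKBEHIND}) {
            System.out.println(keptPath(p, ignore)); // prints null
            System.out.println(keptPath(p, keep));   // prints /FOO/BAR/content.js
        }
    }
}
```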
Answer by Fabian Steeg
Maybe I'm missing something here, but couldn't you just go without any regular expression and ignore anything for which this is true:
string.contains("/ HTTP")
Because a file path will never end with a slash.
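A minimal sketch of that regex-free check, applied to the two sample lines from the question. The class name `ContainsDemo` and the method `isDirectoryRequest` are made up for illustration, and the check assumes the substring "/ HTTP" appears nowhere else in a log line (for example in a referrer field).

```java
public class ContainsDemo {
    // Hypothetical helper: true when the requested path ends with a slash,
    // that is, when the line should be ignored.
    static boolean isDirectoryRequest(String line) {
        return line.contains("/ HTTP");
    }

    public static void main(String[] args) {
        String ignore = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/1.1\" 200 2246";
        String keep = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246";

        System.out.println(isDirectoryRequest(ignore)); // prints true
        System.out.println(isDirectoryRequest(keep));   // prints false
    }
}
```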
Answer by Gumbo
I would use something like this:
"\"GET /FOO/BAR/[^ ]+ HTTP/1\\.[01]\""
This matches every path that's not just /FOO/BAR/.
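A rough sketch of how this pattern could be applied in Java. Two assumptions: the dot is escaped as `\\.` because a lone `\.` is not a legal escape in a Java string literal, and `find()` is used rather than `matches()` because the pattern is not anchored to the whole line. The class name `GumboDemo` and the helper `keep` are hypothetical.

```java
import java.util.regex.Pattern;

public class GumboDemo {
    // Requires at least one non-space character after /FOO/BAR/ in the request line.
    static final Pattern REQUEST = Pattern.compile("\"GET /FOO/BAR/[^ ]+ HTTP/1\\.[01]\"");

    // Hypothetical helper: true if the line requests something below /FOO/BAR/.
    static boolean keep(String line) {
        return REQUEST.matcher(line).find();
    }

    public static void main(String[] args) {
        String ignore = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/1.1\" 200 2246";
        String keepLine = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246";

        System.out.println(keep(ignore));   // prints false
        System.out.println(keep(keepLine)); // prints true
    }
}
```

Note that the directory prefix is hard-coded here, so requests outside /FOO/BAR/ are never matched by this pattern.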
Answer by WolfmanDragon
If you are writing Regex this complex, I would recommend building a library of resources outside of StackOverflow.

