java.util.regex - importance of Pattern.compile()?
Original URL: http://stackoverflow.com/questions/1720191/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): Stack Overflow
Question by Sidharth
What is the importance of the Pattern.compile() method? Why do I need to compile the regex string before getting the Matcher object?
For example:
String regex = "((\\S+)\\s*some\\s*)";
Pattern pattern = Pattern.compile(regex); // why do I need to compile?
Matcher matcher = pattern.matcher(text);
Accepted answer by Alan Moore
The compile() method is always called at some point; it's the only way to create a Pattern object. So the question is really: why should you call it explicitly? One reason is that you need a reference to the Matcher object so you can use its methods, like group(int), to retrieve the contents of capturing groups. The only way to get hold of the Matcher object is through the Pattern object's matcher() method, and the only way to get hold of the Pattern object is through the compile() method. Then there's the find() method which, unlike matches(), is not duplicated in the String or Pattern classes.
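To make the first reason concrete, here is a minimal sketch that collects a capturing group from every match using find() and group(int), neither of which is reachable through String or the static Pattern.matches(). The GroupExtractor name and the regex (a simplified form of the question's, since the question does not say what it should capture) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupExtractor {
    // Group 1 captures the non-whitespace token that precedes "some"
    private static final Pattern WORD_BEFORE_SOME = Pattern.compile("(\\S+)\\s*some");

    // Returns the contents of group 1 for every match found in the text
    static List<String> wordsBeforeSome(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = WORD_BEFORE_SOME.matcher(text);
        while (matcher.find()) {           // find() scans for the next match, unlike matches()
            result.add(matcher.group(1));  // group(int) is only available on a Matcher
        }
        return result;
    }
}
```

For example, wordsBeforeSome("take some, leave some") should return the list [take, leave].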
The other reason is to avoid creating the same Pattern object over and over. Every time you use one of the regex-powered methods in String (or the static matches() method in Pattern), it creates a new Pattern and a new Matcher. So this code snippet:
for (String s : myStringList) {
    if (s.matches("\\d+")) {
        doSomething();
    }
}
...is exactly equivalent to this:
for (String s : myStringList) {
    if (Pattern.compile("\\d+").matcher(s).matches()) {
        doSomething();
    }
}
Obviously, that's doing a lot of unnecessary work. In fact, it can easily take longer to compile the regex and instantiate the Pattern object, than it does to perform an actual match. So it usually makes sense to pull that step out of the loop. You can create the Matcher ahead of time as well, though they're not nearly so expensive:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("");
for (String s : myStringList) {
    if (m.reset(s).matches()) {
        doSomething();
    }
}
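A self-contained sketch that puts the three loop variants side by side. The DigitCounter class, its method names, and counting in place of doSomething() are assumptions for illustration. All three should count the same strings; only the amount of work done per iteration differs.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DigitCounter {
    // Variant 1: String.matches() builds a new Pattern and Matcher on every call
    static int countWithStringMatches(List<String> items) {
        int count = 0;
        for (String s : items) {
            if (s.matches("\\d+")) count++;
        }
        return count;
    }

    // Variant 2: the explicit equivalent of variant 1, still recompiling per item
    static int countWithInlineCompile(List<String> items) {
        int count = 0;
        for (String s : items) {
            if (Pattern.compile("\\d+").matcher(s).matches()) count++;
        }
        return count;
    }

    // Variant 3: compile once, then reuse one Matcher via reset(CharSequence)
    static int countWithReusedMatcher(List<String> items) {
        Pattern p = Pattern.compile("\\d+");
        Matcher m = p.matcher("");
        int count = 0;
        for (String s : items) {
            if (m.reset(s).matches()) count++;
        }
        return count;
    }
}
```

Variant 3 keeps the reused Matcher local to the method, which matters because Matcher instances are not safe for concurrent use.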
If you're familiar with .NET regexes, you may be wondering if Java's compile() method is related to .NET's RegexOptions.Compiled modifier; the answer is no. Java's Pattern.compile() method is merely equivalent to .NET's Regex constructor. When you specify the Compiled option:
Regex r = new Regex(@"\d+", RegexOptions.Compiled);
...it compiles the regex directly to CIL byte code, allowing it to perform much faster, but at a significant cost in up-front processing and memory use; think of it as steroids for regexes. Java has no equivalent; there's no difference between a Pattern that's created behind the scenes by String#matches(String) and one you create explicitly with Pattern#compile(String).
(EDIT: I originally said that all .NET Regex objects are cached, which is incorrect. Since .NET 2.0, automatic caching occurs only with static methods like Regex.Matches(), not when you call a Regex constructor directly. ref)
Answer by jjnguy
When you compile the Pattern, Java does some computation to make finding matches in Strings faster (it builds an in-memory representation of the regex).
If you are going to reuse the Pattern multiple times, you will see a large performance increase over creating a new Pattern every time.
In the case of only using the Pattern once, the compiling step just seems like an extra line of code, but, in fact, it can be very helpful in the general case.
Answer by Thomas Jung
Compiling parses the regular expression and builds an in-memory representation. The overhead of compiling is significant compared to a match. If you're using a pattern repeatedly, caching the compiled pattern will gain you some performance.
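When the regex is only known at runtime, one way to apply this advice is a small cache keyed by the regex string. This is a sketch under stated assumptions: the PatternCache name is hypothetical, and the map is unbounded, so it only suits a limited set of distinct regexes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.regex.Pattern;

final class PatternCache {
    // Unbounded, thread-safe cache from regex source text to compiled Pattern
    private static final ConcurrentMap<String, Pattern> CACHE = new ConcurrentHashMap<>();

    // Compiles the regex on first use; later calls return the same Pattern instance
    static Pattern get(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    // Convenience wrapper: full-string match using the cached Pattern
    static boolean matches(String regex, CharSequence input) {
        return get(regex).matcher(input).matches();
    }
}
```

A bounded cache (for example, an LRU map) would be the safer choice if the set of regexes can grow without limit.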
Answer by DragonBorn
Pre-compiling the regex increases the speed. Reusing the Matcher gives you another slight speedup. If the method gets called frequently, for example within a loop, the overall performance will certainly go up.
Answer by Alireza Fattahi
It is a matter of performance and memory usage: compile and keep the compiled pattern if you need to use it a lot. A typical use of regexes is to validate user input (format) and to format output data for users. In these classes, saving the compiled pattern seems quite logical, as they are usually called a lot.
Below is a sample validator, which is really called a lot :)
import java.util.regex.Pattern;

public class AmountValidator {
    // Accepts 123 - 123,456 - 123,345.34
    private static final String AMOUNT_REGEX = "\\d{1,3}(,\\d{3})*(\\.\\d{1,4})?|\\.\\d{1,4}";
    // Compile once and keep the pattern for reuse
    private static final Pattern AMOUNT_PATTERN = Pattern.compile(AMOUNT_REGEX);

    public boolean validate(String amount) {
        // matches() requires the whole string to match the pattern
        return AMOUNT_PATTERN.matcher(amount).matches();
    }
}
As @Alan Moore mentioned, if you have a reusable regex in your code (before a loop, for example), you must compile it and save the Pattern for reuse.
Answer by Devashish Priyadarshi
Similar to 'Pattern.compile' there is 'RECompiler.compile' [from com.sun.org.apache.regexp.internal] where:
1. compiled code for pattern [a-z] has 'az' in it
2. compiled code for pattern [0-9] has '09' in it
3. compiled code for pattern [abc] has 'aabbcc' in it.
Thus compiled code is a great way to generalize multiple cases: instead of having different code to handle situations 1, 2 and 3, the problem reduces to comparing a character's ASCII value with the current and next elements of the compiled code, hence the pairs. Thus:
a. anything with an ASCII value between 'a' and 'z' is in [a-z]
b. anything with an ASCII value between 'a' and 'a' is definitely 'a'
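The range-pair idea can be illustrated with a small sketch. The RangePairClass name and its tiny parser are hypothetical: they handle only plain characters and x-y ranges (no escapes, no negation) and are not the com.sun.org.apache.regexp.internal API. The point is that one low/high comparison covers both [a-z] and [abc].

```java
// Hypothetical illustration of storing a character class as (low, high) pairs.
final class RangePairClass {
    private final char[] pairs; // consecutive (low, high) bounds

    private RangePairClass(char[] pairs) {
        this.pairs = pairs;
    }

    // Compiles a simple class body such as "a-z", "0-9" or "abc" into pairs;
    // a range x-y becomes (x, y) and a single char c becomes (c, c).
    static RangePairClass compile(String body) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char low = body.charAt(i);
            if (i + 2 < body.length() && body.charAt(i + 1) == '-') {
                sb.append(low).append(body.charAt(i + 2)); // range such as a-z
                i += 3;
            } else {
                sb.append(low).append(low);                // single character such as a
                i += 1;
            }
        }
        return new RangePairClass(sb.toString().toCharArray());
    }

    // The same comparison handles ranges and single characters alike
    boolean matches(char c) {
        for (int k = 0; k < pairs.length; k += 2) {
            if (c >= pairs[k] && c <= pairs[k + 1]) return true;
        }
        return false;
    }

    // Exposes the pair encoding, e.g. "az" for [a-z]
    String encoded() {
        return new String(pairs);
    }
}
```

With this encoding, compile("a-z") stores "az", compile("0-9") stores "09" and compile("abc") stores "aabbcc", mirroring the three cases listed in the answer.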
Answer by vkstream
The Pattern class is the entry point of the regex engine. You can use it through Pattern.matches() and Pattern.compile(). The difference between the two:
matches() - quickly checks whether a text (String) matches a given regular expression.
compile() - creates a reusable Pattern reference, so you can match the regular expression against multiple texts many times.
For reference:
public static void main(String[] args) {
    // single-time use
    String text = "The Moon is far away from the Earth";
    String pattern = ".*is.*";
    boolean matches = Pattern.matches(pattern, text);
    System.out.println("Matches::" + matches);

    // multiple-time use
    Pattern p = Pattern.compile("ab");
    Matcher m = p.matcher("abaaaba");
    while (m.find()) {
        System.out.println(m.start() + " ");
    }
}
Answer by apflieger
Pattern.compile() allows you to reuse a regex multiple times (a Pattern is thread-safe, although the Matcher objects it creates are not). The performance benefit can be quite significant.
I did a quick benchmark:
@Test
public void recompile() {
    var before = Instant.now();
    for (int i = 0; i < 1_000_000; i++) {
        Pattern.compile("ab").matcher("abcde").matches();
    }
    System.out.println("recompile " + Duration.between(before, Instant.now()));
}

@Test
public void compileOnce() {
    var pattern = Pattern.compile("ab");
    var before = Instant.now();
    for (int i = 0; i < 1_000_000; i++) {
        pattern.matcher("abcde").matches();
    }
    System.out.println("compile once " + Duration.between(before, Instant.now()));
}
compileOnce was between 3x and 4x faster.
I guess it depends heavily on the regex itself, but for a regex that is used often, I go for a static Pattern pattern = Pattern.compile(...)
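A minimal sketch of the static-Pattern approach under concurrent use, assuming a hypothetical SharedPatternDemo class and an arbitrary pool of four threads. The Pattern Javadoc states that Pattern instances are safe for use by multiple concurrent threads while Matcher instances are not, so the static Pattern is shared by all tasks and every call creates its own Matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Pattern;

final class SharedPatternDemo {
    // Compiled once; immutable and safe to share between threads
    private static final Pattern AB = Pattern.compile("ab");

    // Each call creates its own Matcher, because Matcher is not thread-safe
    static boolean containsAb(String s) {
        return AB.matcher(s).find();
    }

    // Runs the checks on a thread pool to show the shared Pattern in concurrent use
    static long countConcurrently(List<String> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String s : inputs) {
                Future<Boolean> f = pool.submit(() -> containsAb(s));
                results.add(f);
            }
            long count = 0;
            for (Future<Boolean> f : results) {
                if (f.get()) count++;
            }
            return count;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Sharing a single Matcher across these tasks instead would be a bug, since its internal state would be overwritten by concurrent calls.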