Is Java Regex Thread Safe?
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/1360113/
Asked by jmq
I have a function that uses Pattern#compile and a Matcher to search a list of strings for a pattern.
This function is used in multiple threads. Each thread has a unique pattern passed to Pattern#compile when the thread is created. The number of threads and patterns is dynamic, meaning that I can add more Patterns and threads during configuration.
Do I need to make this function synchronized if it uses regex? Is regex in Java thread safe?
Accepted answer by Vineet Reynolds
Yes, from the Java API documentation for the Pattern class:
Instances of this (Pattern) class are immutable and are safe for use by multiple concurrent threads. Instances of the Matcher class are not safe for such use.
If your code is performance-sensitive, consider reusing a Matcher instance within a single thread by calling its reset() method instead of creating a new instance each time. This resets the state of the Matcher instance, making it usable for the next regex operation. In fact, it is the state maintained in the Matcher instance that makes it unsafe for concurrent access.
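As a rough sketch of the reset() advice above, the following reuses one Matcher for a whole list of inputs. The class name MatcherReuse and the method countFullMatches are hypothetical names chosen for illustration and do not appear in the answer. The Matcher stays in a local variable, so it is never visible to another thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; names are assumptions, not from the answer.
final class MatcherReuse {

    // Counts how many inputs fully match the pattern, reusing a single Matcher.
    static int countFullMatches(Pattern pattern, List<String> inputs) {
        // Local variable: this Matcher is confined to the calling thread.
        Matcher matcher = pattern.matcher("");
        int count = 0;
        for (String input : inputs) {
            // reset(CharSequence) discards the previous state and targets the new input.
            if (matcher.reset(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("a*b");
        // "aaaaab" and "b" match in full, "abc" does not, so this prints 2.
        System.out.println(countFullMatches(p, Arrays.asList("aaaaab", "b", "abc")));
    }
}
```

Whether this is measurably faster than calling pattern.matcher(input) each time depends on the workload, so it is worth profiling before adopting it.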
Answer by Bob Cross
While you need to remember that thread safety has to take into account the surrounding code as well, you appear to be in luck. The fact that Matchers are created using the Pattern's matcher factory method and lack public constructors is a positive sign. Likewise, you use the compile static method to create the encompassing Pattern.
So, in short, if you do something like this example:
Pattern p = Pattern.compile("a*b");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();
you should be doing pretty well.
Follow-up to the code example, for clarity: note that this example strongly implies that the Matcher thus created is local to the same thread as the Pattern and the test. That is, you should not expose the Matcher thus created to any other threads.
Frankly, that's the risk of any thread-safety question. The reality is that any code can be made thread-unsafe if you try hard enough. Fortunately, there are wonderful books that teach us a whole bunch of ways that we could ruin our code. If we stay away from those mistakes, we greatly reduce our own probability of threading problems.
Answer by adatapost
Thread-safety with regular expressions in Java
SUMMARY:
The Java regular expression API has been designed to allow a single compiled pattern to be shared across multiple match operations.
You can safely call Pattern.matcher() on the same pattern from different threads and safely use the resulting matchers concurrently. It is safe to construct matchers with Pattern.matcher() without synchronization. Although the method isn't synchronized, internal to the Pattern class, a volatile variable called compiled is always set after constructing a pattern and read at the start of the call to matcher(). This forces any thread referring to the Pattern to correctly "see" the contents of that object.
On the other hand, you shouldn't share a Matcher between different threads. Or at least, if you ever did, you should use explicit synchronization.
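The summary above can be illustrated with a minimal sketch in which several pool threads share one compiled Pattern while each task creates its own Matcher. The class name SharedPatternDemo, the method matchAll, and the pool size of 4 are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Pattern;

// Hypothetical demo class; names and pool size are assumptions for illustration.
final class SharedPatternDemo {

    // One compiled Pattern shared by every task; Pattern instances are immutable.
    private static final Pattern SHARED = Pattern.compile("a*b");

    // Submits one task per input; each task builds its own Matcher from SHARED.
    static List<Boolean> matchAll(List<String> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (String input : inputs) {
                // The Matcher exists only inside the task and is never shared.
                Callable<Boolean> task = () -> SHARED.matcher(input).matches();
                futures.add(pool.submit(task));
            }
            List<Boolean> results = new ArrayList<>();
            for (Future<Boolean> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Results come back in submission order: [true, true, false]
        System.out.println(matchAll(Arrays.asList("aaaaab", "b", "abc")));
    }
}
```

No synchronization appears anywhere in the sketch: the only object the threads have in common is the Pattern, which is the case the summary describes as safe.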
Answer by akf
A quick look at the code for Matcher.java shows a number of member variables, including the text being matched, arrays for groups, a few indexes for maintaining location, and a few booleans for other state. This all points to a stateful Matcher that would not behave well if accessed by multiple Threads. So does the JavaDoc:
Instances of this class are not safe for use by multiple concurrent threads.
This is only an issue if, as @Bob Cross points out, you go out of your way to allow use of your Matcher in separate Threads. If you need to do this, and you think that synchronization will be an issue for your code, one option is to use a ThreadLocal storage object to maintain a Matcher per working thread.
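A minimal sketch of the ThreadLocal option might look like the following. The class name PerThreadMatcher, the method matches, and the thread names are hypothetical and chosen for illustration; ThreadLocal.withInitial requires Java 8 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are assumptions for illustration only.
final class PerThreadMatcher {

    // The immutable Pattern is shared by all threads.
    private static final Pattern PATTERN = Pattern.compile("a*b");

    // Each thread lazily gets its own Matcher the first time it calls get().
    private static final ThreadLocal<Matcher> MATCHER =
            ThreadLocal.withInitial(() -> PATTERN.matcher(""));

    // Resets the calling thread's Matcher to the new input, then matches.
    static boolean matches(CharSequence input) {
        return MATCHER.get().reset(input).matches();
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> System.out.println(
                Thread.currentThread().getName() + ": " + matches("aaaaab"));
        Thread t1 = new Thread(work, "worker-1");
        Thread t2 = new Thread(work, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

One design caveat: each thread's Matcher keeps a reference to the last input it was reset to, and in long-lived thread pools the ThreadLocal value lives as long as the thread does.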
Answer by George Birbilis
To sum up, you can reuse the compiled Pattern(s) (for example, by keeping them in static variables) and ask them for new Matchers whenever you need to validate those regex patterns against some string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Validation helpers
 */
public final class Validators {

    // Backslashes are doubled so the regex engine receives "\." (a literal dot).
    private static final String EMAIL_PATTERN = "^[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$";

    // Compiled once; Pattern is immutable and safe to share between threads.
    private static final Pattern email_pattern = Pattern.compile(EMAIL_PATTERN);

    /**
     * Check if e-mail is valid
     */
    public static boolean isValidEmail(String email) {
        // A new Matcher per call keeps the mutable matching state local to the caller.
        Matcher matcher = email_pattern.matcher(email);
        return matcher.matches();
    }
}
See http://zoomicon.wordpress.com/2012/06/01/validating-e-mails-using-regular-expressions-in-java/ (near the end) regarding the regex pattern used above for validating e-mails, in case the pattern as posted here doesn't fit your needs for e-mail validation.