Original URL: http://stackoverflow.com/questions/5733304/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Accent in Regular Expression in Java
Question by Rafael
I'd like to use Hibernate Validator to validate some columns. As I understand it, the problem is that \w in Java doesn't match accented letters: by default it is equivalent to [a-zA-Z_0-9].
Is there any way to write the regexp so that words like Relatório are accepted? I wouldn't want to list every accented letter inside brackets, because I expect to use this regexp on a lot of columns.
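A minimal sketch of the problem described in the question, assuming the default `Pattern` flags. The class and method names are illustrative only. Because Java's `\w` is ASCII-only, `\w+` rejects Relatório but accepts the unaccented Relatorio.

```java
import java.util.regex.Pattern;

public class WordCharDemo {
    // With default flags, Java's \w is ASCII-only, i.e. equivalent to [a-zA-Z_0-9]
    static final Pattern WORD = Pattern.compile("\\w+");

    // Illustrative helper: true only if the whole string consists of \w characters
    static boolean isWord(String s) {
        return WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        // U+00F3 is written as an escape so the demo does not depend on source-file encoding
        System.out.println(isWord("Relat\u00f3rio")); // false: U+00F3 is not in [a-zA-Z_0-9]
        System.out.println(isWord("Relatorio"));      // true
    }
}
```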
Answer by Rachel Shallit
The Java regex documentation has a section on Unicode categories (search for "Classes for Unicode blocks and categories"). If you're just looking for letters, I think \p{L} (any letter in any script, accented or not) is the category you want.
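Here is a short sketch of the `\p{L}` approach. The class and method names are hypothetical. It also shows one caveat: if the input arrives in decomposed (NFD) form, the accent is a separate combining mark in category `M`. In that case `\p{L}+` alone rejects the word, and `[\p{L}\p{M}]+` is the safer pattern.

```java
import java.util.regex.Pattern;

public class LetterCategoryDemo {
    // \p{L} matches any code point in the Unicode "Letter" category, accented or not
    static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    // Also accepts combining marks (\p{M}), for input that arrives in decomposed form
    static final Pattern LETTERS_AND_MARKS = Pattern.compile("[\\p{L}\\p{M}]+");

    static boolean isLetters(String s) {
        return LETTERS.matcher(s).matches();
    }

    static boolean isLettersOrMarks(String s) {
        return LETTERS_AND_MARKS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String composed = "Relat\u00f3rio";    // accented o as one code point, U+00F3
        String decomposed = "Relato\u0301rio"; // plain o followed by U+0301 COMBINING ACUTE ACCENT
        System.out.println(isLetters(composed));          // true
        System.out.println(isLetters(decomposed));        // false: U+0301 is category Mn, not L
        System.out.println(isLettersOrMarks(decomposed)); // true
    }
}
```

With Bean Validation, the same pattern string should work in the standard `@Pattern(regexp = "\\p{L}+")` constraint, since its `regexp` follows `java.util.regex.Pattern` syntax. Note that `\p{L}+` also rejects spaces and digits, so multi-word columns would need those added to the character class.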
Answer by Havnar
I had more luck with:
\p{InCombiningDiacriticalMarks}+
In Java, I use the following method, which decomposes accented characters (NFD) and then strips the combining marks:
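import java.text.Normalizer;
import java.text.Normalizer.Form;

public static String removeAccents(String text) {
    // NFD splits a character such as "ó" into "o" + U+0301; the marks are then removed.
    // The backslash must be doubled ("\\p") inside a Java string literal.
    return text == null ? null :
        Normalizer.normalize(text, Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}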
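This sketch shows how the accent-stripping method could serve the original validation goal. It assumes it is acceptable to check a normalized copy of the value rather than the value itself. The class name and the `isWordIgnoringAccents` helper are hypothetical. The approach is to strip the marks first and then apply the ASCII-only `\w+`.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class AccentStrippingDemo {
    // Same approach as removeAccents above: decompose to NFD, then delete combining marks
    static String removeAccents(String text) {
        return text == null ? null
            : Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Hypothetical helper: validate with the ASCII-only \w after stripping accents
    static boolean isWordIgnoringAccents(String s) {
        return s != null && Pattern.matches("\\w+", removeAccents(s));
    }

    public static void main(String[] args) {
        System.out.println(removeAccents("Relat\u00f3rio"));          // Relatorio
        System.out.println(isWordIgnoringAccents("Relat\u00f3rio"));  // true
        System.out.println(isWordIgnoringAccents("Relat\u00f3rio!")); // false: "!" is not \w
    }
}
```

This check is lossy by design and only covers letters that decompose under NFD. For example, ó becomes o, but letters such as ø (U+00F8) or ß have no canonical decomposition. They stay in the string and still fail `\w+`.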