Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/5733304/

Time: 2020-10-30 12:31:36  Source: igfitidea

Accent in Regular Expression in Java

Tags: java, regex, hibernate-validator

Asked by Rafael

I'd like to use Hibernate Validator to validate some columns. The problem, as I understand it, is that the \w marker in Java doesn't accept letters with accents on them.

Is there any way to write the regexp so that words like Relatório can be validated? I wouldn't want to list every accented letter between brackets, because I expect to use this regexp on a lot of columns.

Answered by Rachel Shallit

The Java regex documentation has a section on Unicode categories (search for "Classes for Unicode blocks and categories"). If you're just looking for letters, I think \p{L} is the category you want.
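
To illustrate the difference, here is a minimal sketch using only java.util.regex. The class and method names (UnicodeLetterDemo, isLetters, isAsciiWord) are made up for this example and are not part of the question or of Hibernate Validator. Note that the backslash has to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

public class UnicodeLetterDemo {
    // \p{L} matches any Unicode letter, accented or not.
    static final Pattern LETTERS = Pattern.compile("\\p{L}+");
    // Without extra flags, \w only covers ASCII: [a-zA-Z_0-9].
    static final Pattern ASCII_WORD = Pattern.compile("\\w+");

    // Hypothetical helper: true if the whole string consists of Unicode letters.
    static boolean isLetters(String s) {
        return LETTERS.matcher(s).matches();
    }

    // Hypothetical helper: true if the whole string consists of ASCII word characters.
    static boolean isAsciiWord(String s) {
        return ASCII_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        // "Relatorio" spelled with a precomposed o-acute (U+00F3).
        String word = "Relat\u00f3rio";
        System.out.println(isLetters(word));   // true
        System.out.println(isAsciiWord(word)); // false
    }
}
```

The same pattern string should be usable wherever a regexp is expected, for example in a validation constraint, though that part is an assumption and is not tested here. One caveat: if the input arrives in decomposed form (a plain o followed by a combining accent), the accent is a mark rather than a letter, so \p{L}+ alone would reject it.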

Answered by Havnar

I had more luck with:

\p{InCombiningDiacriticalMarks}+

In Java I use the following method:

import java.text.Normalizer;
import java.text.Normalizer.Form;

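// Decompose accented characters (NFD), then strip the combining marks.
public static String removeAccents(String text) {
    return text == null ? null :
        Normalizer.normalize(text, Form.NFD)
            // The backslash must be doubled inside a Java string literal.
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}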
}
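
As a usage sketch, the method above can be wrapped in a class and run on the word from the question. The class name AccentStripDemo is invented for this example. NFD normalization splits the accented letter into a base letter plus a combining mark, and the replaceAll call then drops the mark, so the result passes a plain \w+ check.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class AccentStripDemo {
    // Same approach as the method above: decompose, then drop combining marks.
    static String removeAccents(String text) {
        return text == null ? null :
            Normalizer.normalize(text, Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // "Relatorio" spelled with a precomposed o-acute (U+00F3).
        String stripped = removeAccents("Relat\u00f3rio");
        System.out.println(stripped);                 // Relatorio
        System.out.println(stripped.matches("\\w+")); // true
    }
}
```

Keep in mind that this changes the text rather than validating it as typed, so it suits cases where the unaccented form is acceptable for the check.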