How to match Unicode characters in Java

Question

Asked by ankimal

I'm trying to match Unicode characters in Java.

Input string: informa

String to match: informátion

So far I've tried this:

Pattern p = Pattern.compile("informa[\u0000-\uffff].*", Pattern.UNICODE_CASE | Pattern.CANON_EQ | Pattern.CASE_INSENSITIVE);
String s = "informátion";
Matcher m = p.matcher(s);
if (m.matches()) {
    System.out.println("Match!");
} else {
    System.out.println("No match");
}

It comes out as "No match". Any ideas?

Answer 1

Answered by BalusC

The term "Unicode characters" is not specific enough. It would match every character in the Unicode range, and thus also "normal" characters. The term is, however, very often used when one actually means "characters which are not in the printable ASCII range".

In regex terms that would be [^\x20-\x7E].

// The backslashes must be doubled inside a Java string literal.
boolean containsNonPrintableASCIIChars = string.matches(".*[^\\x20-\\x7E].*");
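
As a self-contained illustration of the check above, here is a minimal sketch. The class and method names are hypothetical, and it swaps `String.matches` for `Matcher.find()`, which is a variation rather than the answer's exact code: `find()` needs no surrounding `.*`, so it also behaves predictably when the input contains line breaks.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class NonAsciiCheck {
    // Matches any single character outside the printable ASCII range 0x20-0x7E.
    private static final Pattern OUTSIDE_PRINTABLE_ASCII = Pattern.compile("[^\\x20-\\x7E]");

    static boolean containsNonPrintableAscii(String s) {
        // find() scans for the character class anywhere in the input.
        return OUTSIDE_PRINTABLE_ASCII.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsNonPrintableAscii("information"));      // false
        System.out.println(containsNonPrintableAscii("inform\u00e1tion")); // true
    }
}
```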

Depending on what you'd like to do with this information, here are some useful follow-up answers:

Answer 2

Answered by Austin Fitzpatrick

Is it because informa isn't a substring of informátion at all?

How would your code work if you removed the last a from informa in your regex?
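
To make the point concrete, the sketch below compares the two prefixes. It assumes the á in the input is the precomposed character U+00E1, and it leaves out `Pattern.CANON_EQ` to keep the comparison simple; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class illustrating the suggestion above.
class PrefixMatchDemo {
    static boolean matchesWhole(String regex, String input) {
        int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        // matches() requires the whole input to match the regex.
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String s = "inform\u00e1tion"; // assumed precomposed a-acute (U+00E1)

        // The literal 'a' in the regex is compared against U+00E1 and fails.
        System.out.println(matchesWhole("informa.*", s)); // false

        // Without the trailing 'a', the accented character is consumed by ".*".
        System.out.println(matchesWhole("inform.*", s));  // true
    }
}
```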

Answer 3

Answered by james.garriss

It sounds like you want to match letters while ignoring diacritical marks. If that's right, then normalize your strings to NFD form, strip out the diacritical marks, and then do your search.

String normalized = java.text.Normalizer.normalize(textToSearch, java.text.Normalizer.Form.NFD);
// The backslash must be doubled inside a Java string literal.
String withoutDiacritical = normalized.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
// Search code goes here...
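
One way the search step might look, as a minimal sketch: both the text and the search term are normalized and stripped before a plain `contains` check. The class and method names are hypothetical, and a simple substring search is only one possible choice for the "search code".

```java
import java.text.Normalizer;

// Hypothetical helper class building on the snippet above.
class DiacriticInsensitiveSearch {
    static String stripDiacritics(String text) {
        // NFD splits a-acute into 'a' followed by the combining mark U+0301.
        String normalized = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Remove characters from the Combining Diacritical Marks block.
        return normalized.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    static boolean containsIgnoringDiacritics(String textToSearch, String term) {
        return stripDiacritics(textToSearch).contains(stripDiacritics(term));
    }

    public static void main(String[] args) {
        System.out.println(stripDiacritics("inform\u00e1tion"));                       // information
        System.out.println(containsIgnoringDiacritics("inform\u00e1tion", "informa")); // true
    }
}
```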

To learn more about NFD:
