Java: detecting Chinese characters in Java

Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/26357938/

Time: 2020-11-02 09:47:44  Source: igfitidea

Detect Chinese character in java

java unicode encoding utf-8

Asked by Ran Deloun

Using Java, how can I detect whether a String contains Chinese characters?

String chineseStr = "已下架";

// isChineseString is the method this question asks how to write
if (isChineseString(chineseStr)) {
    System.out.println("The string contains Chinese characters");
} else {
    System.out.println("The string does not contain Chinese characters");
}

Can you please help me solve this problem?

Answered by Joop Eggen

Character.isIdeographic(int codepoint) now tells whether the code point is a CJKV (Chinese, Japanese, Korean and Vietnamese) ideograph.

A closer match for this question is checking the Unicode script against Character.UnicodeScript.HAN.

So:

System.out.println(containsHanScript("xxx已下架xxx"));

public static boolean containsHanScript(String s) {
    for (int i = 0; i < s.length(); ) {
        int codepoint = s.codePointAt(i);
        i += Character.charCount(codepoint);
        if (Character.UnicodeScript.of(codepoint) == Character.UnicodeScript.HAN) {
            return true;
        }
    }
    return false;
}

Or in Java 8:

public static boolean containsHanScript(String s) {
    return s.codePoints().anyMatch(
            codepoint ->
            Character.UnicodeScript.of(codepoint) == Character.UnicodeScript.HAN);
}
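To see how the check behaves on different inputs, here is a minimal, self-contained sketch. The class name HanScriptDemo and the helper countHan are hypothetical names introduced only for illustration; containsHanScript repeats the stream version above so the example compiles on its own. It assumes Java 8 or later.

```java
final class HanScriptDemo {

    // Same check as the stream version above, repeated so this sketch is self-contained.
    static boolean containsHanScript(String s) {
        return s.codePoints().anyMatch(
                cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    // Hypothetical helper (not part of the answer): counts code points in the Han script.
    static long countHan(String s) {
        return s.codePoints()
                .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
                .count();
    }

    public static void main(String[] args) {
        // U+20000 lies outside the Basic Multilingual Plane (CJK Extension B),
        // so it occupies two Java chars but is a single code point.
        String supplementary = new String(Character.toChars(0x20000));

        System.out.println(containsHanScript("xxx已下架xxx"));  // true
        System.out.println(containsHanScript("hello"));          // false
        System.out.println(containsHanScript("こんにちは"));      // false: hiragana only
        System.out.println(containsHanScript(supplementary));    // true
        System.out.println(countHan("xxx已下架xxx"));            // 3
    }
}
```

Iterating by code point rather than by char is what lets the supplementary character be classified correctly. Note that a Han-script hit does not by itself prove the text is Chinese, since Japanese kanji and Korean hanja belong to the same script.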

Answered by Ruchira Gayan Ranaweera

You can try the Google API or the Language Detection API.

The Language Detection API includes a simple demo. You can try that first.

Answered by ccpizza

A more literal approach, which checks that the whole string consists of characters in the basic CJK range U+4E00 to U+9FA5:

if ("粽子".matches("[\u4E00-\u9FA5]+")) {
    System.out.println("is Chinese");
}

If you also need to catch rarely used and exotic characters, you'll need to add all the ranges; see the StackOverflow question "What's the complete range for Chinese characters in Unicode?"
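As an alternative to listing every range by hand, Java's regex engine (since Java 7) accepts the script property \p{IsHan}, which covers the Han script as defined by the Unicode version bundled with the JDK. The sketch below is only an illustration under that assumption; the class name HanRegexDemo and the method names containsHan and isAllHan are hypothetical.

```java
import java.util.regex.Pattern;

final class HanRegexDemo {

    // \p{IsHan} matches any code point whose Unicode script is Han.
    private static final Pattern HAN = Pattern.compile("\\p{IsHan}");

    // find() succeeds if a Han character occurs anywhere in the string.
    static boolean containsHan(String s) {
        return HAN.matcher(s).find();
    }

    // String.matches() must match the entire string, so this is true
    // only when every character belongs to the Han script.
    static boolean isAllHan(String s) {
        return s.matches("\\p{IsHan}+");
    }

    public static void main(String[] args) {
        System.out.println(containsHan("abc粽子"));  // true
        System.out.println(isAllHan("abc粽子"));     // false
        System.out.println(isAllHan("粽子"));        // true
        System.out.println(containsHan("abc"));      // false
    }
}
```

The difference between find() and matches() is what separates "contains Chinese characters" from "consists only of Chinese characters", so pick the one that fits the question being asked.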