Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute the content to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/26357938/
Detect Chinese characters in Java
Asked by Ran Deloun
Using Java, how can I detect whether a String contains Chinese characters?
String chineseStr = "已下架";
if (isChineseString(chineseStr)) {
    System.out.println("The string contains Chinese characters");
} else {
    System.out.println("The string does not contain Chinese characters");
}
Can you please help me solve this problem?
Answered by Joop Eggen
Since Java 7, Character.isIdeographic(int codePoint) tells whether the code point is a CJKV (Chinese, Japanese, Korean and Vietnamese) ideograph.
A closer fit is to test for Character.UnicodeScript.HAN.
So:
System.out.println(containsHanScript("xxx已下架xxx"));

public static boolean containsHanScript(String s) {
    // Walk the string by code point so supplementary characters (surrogate pairs) are handled.
    for (int i = 0; i < s.length(); ) {
        int codepoint = s.codePointAt(i);
        i += Character.charCount(codepoint);
        if (Character.UnicodeScript.of(codepoint) == Character.UnicodeScript.HAN) {
            return true;
        }
    }
    return false;
}
Or in Java 8:
public static boolean containsHanScript(String s) {
    return s.codePoints().anyMatch(
            codepoint ->
                    Character.UnicodeScript.of(codepoint) == Character.UnicodeScript.HAN);
}
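To make the difference between the two checks mentioned in this answer easier to explore, here is a minimal sketch that runs both on a few sample strings. The class name IdeographVsHan, the helper containsIdeograph, and the sample strings are illustrative assumptions, not part of the original answer. The two checks are not guaranteed to agree on every code point, so it is worth trying them on your own data.

```java
public class IdeographVsHan {
    // True if any code point is a CJKV ideograph according to Character.isIdeographic.
    static boolean containsIdeograph(String s) {
        return s.codePoints().anyMatch(Character::isIdeographic);
    }

    // True if any code point belongs to the Han script.
    static boolean containsHanScript(String s) {
        return s.codePoints().anyMatch(
                cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs: Chinese mixed with Latin, plain Latin, Hiragana, Hangul.
        String[] samples = {"xxx已下架xxx", "hello", "ひらがな", "한국어"};
        for (String s : samples) {
            System.out.println(s
                    + " ideographic=" + containsIdeograph(s)
                    + " han=" + containsHanScript(s));
        }
    }
}
```

Note that Han script covers characters shared by Chinese, Japanese kanji and Korean hanja, so neither check can tell which language a string is written in.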
Answered by Ruchira Gayan Ranaweera
You can try the Google API or the Language Detection API.
The Language Detection API includes a simple demo; you can try that first.
Answered by ccpizza
A more literal approach:
// matches() must match the whole string, so this is true only if every character is in the range.
if ("粽子".matches("[\u4E00-\u9FA5]+")) {
    System.out.println("is Chinese");
}
If you also need to catch rarely used and exotic characters, you'll need to add all the ranges; see the Stack Overflow question "What's the complete range for Chinese characters in Unicode?"
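As an alternative to listing every range by hand, java.util.regex (Java 7 and later) supports the script property \p{IsHan}. The sketch below is one possible way to use it; the class name HanRegex and the helpers containsHan and isAllHan are illustrative assumptions, not part of the original answer.

```java
import java.util.regex.Pattern;

public class HanRegex {
    // \p{IsHan} matches a single code point whose Unicode script is Han.
    private static final Pattern ANY_HAN = Pattern.compile("\\p{IsHan}");
    private static final Pattern ALL_HAN = Pattern.compile("\\p{IsHan}+");

    // True if at least one Han character appears anywhere in the string.
    static boolean containsHan(String s) {
        return ANY_HAN.matcher(s).find();
    }

    // True only if the string is non-empty and consists entirely of Han characters.
    static boolean isAllHan(String s) {
        return ALL_HAN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsHan("xxx已下架xxx"));
        System.out.println(isAllHan("粽子"));
        System.out.println(isAllHan("粽子abc"));
    }
}
```

Using find() answers the original question (does the string contain any Chinese character), while matches() reproduces the stricter "whole string" behavior of the literal range check.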