Java: How to check if a String contains only ASCII?
Original URL: http://stackoverflow.com/questions/3585053/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by TambourineMan
The call Character.isLetter(c) returns true if the character is a letter. But is there a way to quickly find out whether a String contains only the base ASCII characters?
Accepted answer by ColinD
From Guava 19.0 onward, you may use:
boolean isAscii = CharMatcher.ascii().matchesAllOf(someString);
This uses the matchesAllOf(someString) method, which relies on the factory method ascii() rather than the now-deprecated ASCII singleton.
Here, ASCII includes all ASCII characters, including the non-printable ones: those below 0x20 (space), such as tab, line feed and carriage return, but also BEL (code 0x07), plus DEL (code 0x7F).
This code incorrectly uses chars rather than code points, even though the comments in earlier versions mention code points. Fortunately, a code point of U+10000 or above is encoded as two surrogate chars (in the range 0xD800 to 0xDFFF), both of which lie outside the ASCII range. So the method still tests for ASCII correctly, even for strings containing emoji.
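To see the surrogate argument concretely, here is a small plain-JDK sketch (the class and method names are illustrative, not part of Guava). It prints the UTF-16 chars of a string containing U+1F600 and shows that a char-by-char check and a code-point check reach the same verdict, because both surrogate halves (0xD83D and 0xDE00) are far above 0x7F.

```java
class SurrogateDemo {
    // Returns true if every UTF-16 char in s is in the ASCII range 0x00..0x7F.
    static boolean allCharsAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String emoji = "Hi \uD83D\uDE00"; // "Hi " followed by U+1F600 as a surrogate pair
        for (char c : emoji.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // the last two lines are U+D83D and U+DE00
        }
        System.out.println(allCharsAscii(emoji));                          // false
        System.out.println(emoji.codePoints().allMatch(cp -> cp <= 0x7F)); // false
    }
}
```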
For earlier Guava versions without the ascii() method, you may write:
boolean isAscii = CharMatcher.ASCII.matchesAllOf(someString);
Answer by Thorbjørn Ravn Andersen
Iterate through the string and use charAt() to get each char. Then treat it as an int and check whether its Unicode value (Unicode is a superset of ASCII) is one you accept.
Stop at the first one you don't accept.
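A minimal sketch of that loop, assuming you pass in whatever rule you consider acceptable as an IntPredicate (allCharsMatch is a hypothetical helper name). The two predicates show the difference between "any ASCII" (0x00 to 0x7F) and "printable ASCII" (0x20 to 0x7E).

```java
import java.util.function.IntPredicate;

class CharScan {
    // Scans s char by char and stops at the first char the predicate rejects.
    static boolean allCharsMatch(String s, IntPredicate accept) {
        for (int i = 0; i < s.length(); i++) {
            int value = s.charAt(i); // char widened to its int (UTF-16 code unit) value
            if (!accept.test(value)) {
                return false; // stop at the first char you don't accept
            }
        }
        return true;
    }

    public static void main(String[] args) {
        IntPredicate anyAscii = c -> c <= 0x7F;
        IntPredicate printableAscii = c -> c >= 0x20 && c <= 0x7E;
        System.out.println(allCharsMatch("Real", anyAscii));             // true
        System.out.println(allCharsMatch("R\u00E9al", anyAscii));        // false
        System.out.println(allCharsMatch("tab\there", anyAscii));        // true
        System.out.println(allCharsMatch("tab\there", printableAscii));  // false
    }
}
```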
Answer by RealHowTo
You can do it with java.nio.charset.Charset.
import java.nio.charset.Charset;

public class StringUtils {
    public static boolean isPureAscii(String v) {
        // or "ISO-8859-1" for ISO Latin 1
        // or StandardCharsets.US_ASCII with JDK 1.7+
        return Charset.forName("US-ASCII").newEncoder().canEncode(v);
    }

    public static void main(String[] args) {
        String test = "Réal";
        System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
        test = "Real";
        System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
        /*
         * output :
         * Réal isPureAscii() : false
         * Real isPureAscii() : true
         */
    }
}
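Since JDK 7 the charset can be taken from StandardCharsets.US_ASCII, as the comment in that code suggests, which avoids the lookup by name. A sketch of that variant (the class name is illustrative): it creates a new encoder on every call, because CharsetEncoder instances are not safe for use by multiple concurrent threads, so caching one in a static field is only safe if it is never shared across threads.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class AsciiEncoderCheck {
    // A fresh encoder per call: CharsetEncoder instances are not thread-safe.
    static boolean isPureAscii(CharSequence v) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        return encoder.canEncode(v);
    }

    public static void main(String[] args) {
        System.out.println(isPureAscii("Real"));      // true
        System.out.println(isPureAscii("R\u00E9al")); // false
    }
}
```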
Answer by Arne Deutsch
Here is another way not depending on a library but using a regex.
You can use this single line:
text.matches("\\A\\p{ASCII}*\\z")
Whole example program:
public class Main {
    public static void main(String[] args) {
        char nonAscii = 0x00FF;
        String asciiText = "Hello";
        String nonAsciiText = "Buy: " + nonAscii;
        // backslashes must be doubled inside a Java string literal
        System.out.println(asciiText.matches("\\A\\p{ASCII}*\\z"));    // true
        System.out.println(nonAsciiText.matches("\\A\\p{ASCII}*\\z")); // false
    }
}
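If you check many strings, compiling the pattern once may be cheaper than calling String.matches, which compiles the regex again on every call. A sketch (names are illustrative): Pattern objects are immutable and can be shared between threads, and Matcher.matches() already requires the whole input to match, so the \A and \z anchors can be dropped.

```java
import java.util.regex.Pattern;

class AsciiRegex {
    // Compiled once and reused; matches() anchors the match to the whole input.
    private static final Pattern ASCII_ONLY = Pattern.compile("\\p{ASCII}*");

    static boolean isAscii(String s) {
        return ASCII_ONLY.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Hello"));       // true
        System.out.println(isAscii("Buy: \u00FF")); // false
    }
}
```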
Answer by JeremyP
Iterate through the string and make sure all the characters have a value less than 128.
Java Strings are conceptually encoded as UTF-16. In UTF-16, the ASCII character set is encoded as the values 0 to 127, and the encoding of any non-ASCII character (which may consist of more than one Java char) is guaranteed not to include the values 0 to 127.
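On Java 8 or later, the same check can be written with String.chars(), which streams the UTF-16 chars as int values. A sketch (the class name is illustrative); by the argument above, every char belonging to a non-ASCII character, including both halves of a surrogate pair, fails the c < 128 test.

```java
class AsciiStream {
    // String.chars() yields each UTF-16 char as an int in 0..0xFFFF.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Hello"));      // true
        System.out.println(isAscii("na\u00EFve")); // false
        System.out.println(isAscii(""));           // true: allMatch on an empty stream
    }
}
```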
Answer by Zarathustra
Or you can copy the code from the IDN class (java.net.IDN).
// to check if a string only contains US-ASCII code points
private static boolean isAllASCII(String input) {
    boolean isASCII = true;
    for (int i = 0; i < input.length(); i++) {
        int c = input.charAt(i);
        if (c > 0x7F) {
            isASCII = false;
            break;
        }
    }
    return isASCII;
}
Answer by pforyogurt
Try this:
static boolean isAscii(String string) {
    for (char c : string.toCharArray()) {
        if (((int) c) > 127) { // anything above 127 (0x7F) is outside ASCII
            return false;
        }
    }
    return true;
}
Answer by user3614583
It is possible. Nice problem.
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodingTest {
    static CharsetEncoder asciiEncoder = Charset.forName("US-ASCII").newEncoder();

    public static void main(String[] args) {
        String testStr = "¤Eàs?W°ê?ú?i?T¤¤¤?3?ó?i?T?U2~~KITEC 3/F Rotunda 2";
        // split into the part before "~~" and the plain ASCII part after it
        String[] strArr = testStr.split("~~", 2);
        int count = 0;
        boolean encodeFlag = false;
        do {
            encodeFlag = asciiEncoderTest(strArr[count]);
            System.out.println(encodeFlag);
            count++;
        } while (count < strArr.length);
    }

    public static boolean asciiEncoderTest(String test) {
        boolean encodeFlag = false;
        try {
            // take the raw bytes back out via ISO-8859-1, re-decode them as BIG5,
            // then ask the US-ASCII encoder whether the result is pure ASCII
            encodeFlag = asciiEncoder.canEncode(new String(test.getBytes("ISO8859_1"), "BIG5"));
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        return encodeFlag;
    }
}
Answer by Lukas Greblikas
// returns true if c is an ASCII uppercase (A-Z) or lowercase (a-z) letter
public boolean isASCIILetter(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
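Note that this answers a narrower question than the one asked: it tests for ASCII letters, whereas Character.isLetter from the question accepts letters from any script. A short sketch of the difference, applying the same letter test to a whole string (isAsciiLetters is a hypothetical helper name):

```java
class AsciiLetters {
    // Same rule as isASCIILetter above, made static for this demo.
    static boolean isASCIILetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // True if s is non-empty and consists only of ASCII letters.
    static boolean isAsciiLetters(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!isASCIILetter(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(Character.isLetter('\u00E9')); // true: U+00E9 is a Unicode letter
        System.out.println(isASCIILetter('\u00E9'));      // false: U+00E9 is not ASCII
        System.out.println(isAsciiLetters("Real"));       // true
        System.out.println(isAsciiLetters("Real 2"));     // false: space and digit
    }
}
```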
Answer by fjkjava
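commons-lang3 from Apache contains valuable utility and convenience methods for all kinds of "problems", including this one. Note that StringUtils.isAsciiPrintable only accepts printable ASCII characters (32 to 126), so strings containing tabs or line breaks are rejected.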
import org.apache.commons.lang3.StringUtils;
System.out.println(StringUtils.isAsciiPrintable("!@£$%^&!@£$%^")); // false: '£' (U+00A3) is not ASCII
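If you would rather not add the dependency, here is a sketch of the same printable-ASCII rule in plain Java (the class is illustrative, not Commons Lang's own code). It accepts only chars from 32 (space) to 126 (~), so unlike Guava's CharMatcher.ascii() or the \p{ASCII} regex, it rejects tabs and line breaks.

```java
class PrintableAscii {
    // Hypothetical plain-JDK equivalent: every char must be in 32..126.
    static boolean isAsciiPrintable(CharSequence cs) {
        if (cs == null) {
            return false; // null is treated as not printable
        }
        for (int i = 0; i < cs.length(); i++) {
            char ch = cs.charAt(i);
            if (ch < 32 || ch > 126) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAsciiPrintable("!@\u00A3$%^&!@\u00A3$%^")); // false: contains U+00A3
        System.out.println(isAsciiPrintable("!@#$%^&"));                 // true
        System.out.println(isAsciiPrintable("line\nbreak"));             // false: line feed is 10
    }
}
```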