Notice: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original question on Stack Overflow: http://stackoverflow.com/questions/3585053/

Time: 2020-08-14 02:11:20  Source: igfitidea

How to check if a String contains only ASCII?

Tags: java, string, character-encoding, ascii

Asked by TambourineMan

The call Character.isLetter(c) returns true if the character is a letter. But is there a way to quickly find out whether a String contains only the base characters of ASCII?

Accepted answer by ColinD

From Guava 19.0 onward, you may use:

boolean isAscii = CharMatcher.ascii().matchesAllOf(someString);

This uses the matchesAllOf(someString) method, which relies on the factory method ascii() rather than the now deprecated ASCII singleton.

Here ASCII includes all ASCII characters, including the non-printable characters below 0x20 (space) such as tab, line feed and carriage return, but also BEL with code 0x07 and DEL with code 0x7F.

This code works on chars (UTF-16 code units) rather than code points, even though the comments of earlier versions refer to code points. Fortunately, a code point with a value of U+010000 or above is represented by two surrogate chars, whose values lie outside the ASCII range. So the method still tests for ASCII correctly, even for strings containing emoji.
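
The surrogate argument above can be checked without Guava. The following is only a sketch in plain Java; the class and method names (SurrogateDemo, isAsciiByChar, isAsciiByCodePoint) are made up for illustration and are not part of Guava. It compares a per-char test with a per-code-point test on a string containing an emoji.

```java
final class SurrogateDemo {
    // Returns true when every UTF-16 code unit (Java char) is at most 0x7F.
    static boolean isAsciiByChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    // Returns true when every Unicode code point is at most 0x7F.
    static boolean isAsciiByCodePoint(String s) {
        return s.codePoints().allMatch(cp -> cp <= 0x7F);
    }

    public static void main(String[] args) {
        // U+1F600 written as its surrogate pair (hypothetical sample input)
        String emoji = "ok \uD83D\uDE00";
        // Print each code unit to show that both surrogates lie above 0x7F
        for (int i = 0; i < emoji.length(); i++) {
            System.out.printf("U+%04X%n", (int) emoji.charAt(i));
        }
        // Both tests agree on this input
        System.out.println(isAsciiByChar(emoji) == isAsciiByCodePoint(emoji));
    }
}
```

Both surrogate halves fall in the range 0xD800 to 0xDFFF, so a per-char test can never mistake part of a supplementary character for an ASCII character.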

For earlier Guava versions without the ascii() method you may write:

boolean isAscii = CharMatcher.ASCII.matchesAllOf(someString);

Answered by Thorbjørn Ravn Andersen

Iterate through the string, and use charAt() to get each char. Then treat it as an int, and see whether its Unicode value (Unicode is a superset of ASCII) is one you accept.

Break at the first character you don't like.
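
As a rough sketch of that loop (the class and method names are hypothetical, and the threshold 0x7F assumes "like" means "is ASCII"), one could return the position of the first offending character:

```java
final class FirstNonAscii {
    // Returns the index of the first char above 0x7F, or -1 if there is none.
    static int indexOfFirstNonAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            int value = s.charAt(i); // widen the char to an int
            if (value > 0x7F) {
                return i; // stop at the first character we don't like
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfFirstNonAscii("Real"));      // all ASCII
        System.out.println(indexOfFirstNonAscii("R\u00E9al")); // e-acute at index 1
    }
}
```

Returning an index rather than a boolean is a design choice here: it also tells the caller where the string stops being ASCII.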

Answered by RealHowTo

You can do it with java.nio.charset.Charset.

import java.nio.charset.Charset;

public class StringUtils {

  public static boolean isPureAscii(String v) {
    return Charset.forName("US-ASCII").newEncoder().canEncode(v);
    // or "ISO-8859-1" for ISO Latin 1
    // or StandardCharsets.US_ASCII with JDK1.7+
  }

  public static void main (String args[])
    throws Exception {

     String test = "Réal";
     System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
     test = "Real";
     System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));

     /*
      * output :
      *   Réal isPureAscii() : false
      *   Real isPureAscii() : true
      */
  }
}
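
The comment in the code above mentions StandardCharsets.US_ASCII for JDK 1.7+. A minimal sketch of that variant might look as follows; the class name AsciiEncoderCheck is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class AsciiEncoderCheck {
    static boolean isPureAscii(String v) {
        // A fresh encoder per call, because CharsetEncoder instances
        // are not safe to share between threads.
        return StandardCharsets.US_ASCII.newEncoder().canEncode(v);
    }

    public static void main(String[] args) {
        System.out.println(isPureAscii("Real"));
        System.out.println(isPureAscii("R\u00E9al"));
    }
}
```

Using the StandardCharsets constant avoids the string lookup done by Charset.forName and the possibility of a misspelled charset name.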

Detect non-ASCII character in a String

Answered by Arne Deutsch

Here is another way not depending on a library but using a regex.

You can use this single line:

text.matches("\\A\\p{ASCII}*\\z")

Whole example program:

public class Main {
    public static void main(String[] args) {
        char nonAscii = 0x00FF;
        String asciiText = "Hello";
        String nonAsciiText = "Buy: " + nonAscii;
        // Backslashes must be doubled inside a Java string literal
        System.out.println(asciiText.matches("\\A\\p{ASCII}*\\z"));
        System.out.println(nonAsciiText.matches("\\A\\p{ASCII}*\\z"));
    }
}
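
If the check runs often, the pattern could be compiled once and reused. This is only a sketch; the names AsciiRegex and ASCII_ONLY are assumptions for illustration. Because Matcher.matches() must consume the whole input, the \A and \z anchors are left out here.

```java
import java.util.regex.Pattern;

final class AsciiRegex {
    // Compiled once; \p{ASCII} covers the code units 0x00 through 0x7F.
    private static final Pattern ASCII_ONLY = Pattern.compile("\\p{ASCII}*");

    static boolean isAscii(String text) {
        // matches() succeeds only if the entire string fits the pattern
        return ASCII_ONLY.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Hello"));
        System.out.println(isAscii("Buy: \u00FF"));
    }
}
```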

Answered by JeremyP

Iterate through the string and make sure all the characters have a value less than 128.

Java Strings are conceptually encoded as UTF-16. In UTF-16, the ASCII character set is encoded as the values 0 - 127, and the encoding of any non-ASCII character (which may consist of more than one Java char) is guaranteed not to include the values 0 - 127.
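
On Java 8 or later, the same iteration can be written as a stream over the string's chars. This is a sketch under that assumption, with a hypothetical class name:

```java
final class AsciiStream {
    static boolean isAscii(String s) {
        // chars() yields each UTF-16 code unit as an int;
        // allMatch stops at the first value that fails the test
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Hello"));
        System.out.println(isAscii("H\u00E9llo"));
    }
}
```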

Answered by Zarathustra

Or you can copy the code from the IDN class.

// to check if a string only contains US-ASCII code point
//
private static boolean isAllASCII(String input) {
    boolean isASCII = true;
    for (int i = 0; i < input.length(); i++) {
        int c = input.charAt(i);
        if (c > 0x7F) {
            isASCII = false;
            break;
        }
    }
    return isASCII;
}

Answered by pforyogurt

Try this:

static boolean isAscii(String string) {
  for (char c : string.toCharArray()) {
    // any char above 127 is outside the ASCII range
    if (c > 127) {
      return false;
    }
  }
  return true;
}

Answered by user3614583

It is possible. Nice problem.

import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodingTest {

    static CharsetEncoder asciiEncoder = Charset.forName("US-ASCII")
            .newEncoder();

    public static void main(String[] args) {

        String testStr = "¤Eàs?W°ê?ú?i?T¤¤¤?3?ó?i?T?U2~~KITEC 3/F Rotunda 2";
        String[] strArr = testStr.split("~~", 2);
        int count = 0;
        boolean encodeFlag = false;

        do {
            encodeFlag = asciiEncoderTest(strArr[count]);
            System.out.println(encodeFlag);
            count++;
        } while (count < strArr.length);
    }

    public static boolean asciiEncoderTest(String test) {
        boolean encodeFlag = false;
        try {
            encodeFlag = asciiEncoder.canEncode(new String(test
                    .getBytes("ISO8859_1"), "BIG5"));
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        return encodeFlag;
    }
}

Answered by Lukas Greblikas

// returns true if c is an uppercase or lowercase ASCII letter
public boolean isASCIILetter(char c) {
  return (c > 64 && c < 91) || (c > 96 && c < 123);
}

Answered by fjkjava

commons-lang3 from Apache contains valuable utility/convenience methods for all kinds of 'problems', including this one.

System.out.println(StringUtils.isAsciiPrintable("!@£$%^&!@£$%^"));
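
Note that isAsciiPrintable is stricter than an "ASCII only" test: as I understand the Commons Lang documentation, it accepts only the characters 32 (space) through 126 (~), so a tab or line feed makes it return false. The sketch below approximates that behaviour in plain Java for comparison; the class PrintableAscii and both method bodies are illustrative assumptions, not the library's source.

```java
final class PrintableAscii {
    // Approximates the printable check: every char must be in 32..126.
    // Treating null as "not printable" is an assumption based on the docs.
    static boolean isAsciiPrintable(String s) {
        if (s == null) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 32 || c > 126) {
                return false;
            }
        }
        return true;
    }

    // Plain ASCII check for comparison: every char must be in 0..127.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String withTab = "a\tb";
        System.out.println(isAscii(withTab));          // tab is ASCII
        System.out.println(isAsciiPrintable(withTab)); // tab is not printable
    }
}
```

Which of the two checks is appropriate depends on whether control characters should count as acceptable input.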