A cleaner way to check whether a string is an ISO country or ISO language in Java

Question

Asked by mat_boy

Suppose you have a two-character String that should represent an ISO 3166 country code or an ISO 639 language code.

As you know, the Locale class has two methods, getISOLanguages and getISOCountries, that return an array of String with all the ISO languages and ISO countries, respectively.

To check whether a specific String is a valid ISO language or ISO country, I have to look inside those arrays for a matching String. OK, I can do that with a binary search (e.g. Arrays.binarySearch, which needs a sorted array) or with a linear scan such as the Apache Commons ArrayUtils.contains.
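
The array-based lookup described above could be sketched roughly as follows. Arrays.binarySearch only gives meaningful results on a sorted array, and this sketch does not assume that Locale returns its codes in any particular order, so it sorts its own copies first. The class and method names (ArrayIsoLookup, isIsoCountry, isIsoLanguage) are made up for illustration and do not come from any library.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the JDK or any library.
final class ArrayIsoLookup {
    // Each array is obtained once and sorted so that binarySearch is valid.
    private static final String[] COUNTRIES = sorted(Locale.getISOCountries());
    private static final String[] LANGUAGES = sorted(Locale.getISOLanguages());

    private ArrayIsoLookup() {}

    // Sorts the given array in place and returns it.
    private static String[] sorted(String[] codes) {
        Arrays.sort(codes);
        return codes;
    }

    static boolean isIsoCountry(String s) {
        // Guard against null: binarySearch would throw on a null key.
        return s != null && Arrays.binarySearch(COUNTRIES, s) >= 0;
    }

    static boolean isIsoLanguage(String s) {
        return s != null && Arrays.binarySearch(LANGUAGES, s) >= 0;
    }
}
```

The lookup is case-sensitive: country codes are upper case and language codes are lower case in the arrays returned by Locale, so "US" and "en" match while "us" does not.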

The question is: is there any utility (e.g. from the Guava or Apache Commons libraries) that provides a cleaner way, e.g. a function that returns a boolean to validate a String as a valid ISO 639 language or ISO 3166 country?

For instance:

public static boolean isValidISOLanguage(String s)
public static boolean isValidISOCountry(String s)

Answer 1

Answered by Jon Skeet

I wouldn't bother using either a binary search or any third-party libraries - HashSet is fine for this:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public final class IsoUtil {
    private static final Set<String> ISO_LANGUAGES = new HashSet<String>
        (Arrays.asList(Locale.getISOLanguages()));
    private static final Set<String> ISO_COUNTRIES = new HashSet<String>
        (Arrays.asList(Locale.getISOCountries()));

    private IsoUtil() {}

    public static boolean isValidISOLanguage(String s) {
        return ISO_LANGUAGES.contains(s);
    }

    public static boolean isValidISOCountry(String s) {
        return ISO_COUNTRIES.contains(s);
    }
}

You could check for the string length first, but I'm not sure I'd bother - at least not unless you want to protect yourself against performance attacks where you're given enormous strings that would take a long time to hash.
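
If you did want that length guard, a minimal sketch might look like the one below. It assumes every valid code is exactly two characters long, which matches the two-letter lists discussed here but is worth confirming on your JDK. The class name GuardedIsoUtil is made up for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical variant of IsoUtil with a cheap length check up front.
final class GuardedIsoUtil {
    private static final Set<String> ISO_LANGUAGES =
        new HashSet<>(Arrays.asList(Locale.getISOLanguages()));
    private static final Set<String> ISO_COUNTRIES =
        new HashSet<>(Arrays.asList(Locale.getISOCountries()));

    private GuardedIsoUtil() {}

    static boolean isValidISOLanguage(String s) {
        // Reject null and wrong-length input before hashing anything.
        return s != null && s.length() == 2 && ISO_LANGUAGES.contains(s);
    }

    static boolean isValidISOCountry(String s) {
        return s != null && s.length() == 2 && ISO_COUNTRIES.contains(s);
    }
}
```

The guard costs one comparison and means a very long input is rejected without its hash ever being computed.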

EDIT: If you do want to use a third-party library, ICU4J is the most likely contender - but it may well have a more up-to-date list than the one supported by Locale, so you would probably want to move to using ICU4J everywhere.

Answer 2

Answered by Sergey Ponomarev

As far as I know, there is no such method in any library, but you can at least declare it yourself like this:

import static java.util.Arrays.binarySearch;
import java.util.Locale;

/**
 * Validator for country codes.
 * Uses binary search over a sorted array of country codes.
 * A country code consists of two ASCII letters, so we need at least two bytes to represent it.
 * Two bytes are represented in Java by the short type, which lets us use Arrays.binarySearch(short[] a, short key).
 * Each country code is converted to a short via the countryCodeNeedle() function.
 *
 * Average throughput of this method is 246.058 ops/ms, roughly half that of a HashSet lookup (523.678 ops/ms).
 * Complexity is O(log(N)) instead of O(1) for HashSet.
 * But it needs only 520 bytes of RAM to hold the list of country codes, instead of 22064 bytes (> 21 KB) for a HashSet of country codes.
 */
public class CountryValidator {
  /** Sorted array of country codes converted to short */
  private static final short[] COUNTRIES_SHORT = initShortArray(Locale.getISOCountries());

  public static boolean isValidCountryCode(String countryCode) {
    if (countryCode == null || countryCode.length() != 2 || countryCodeIsNotAlphaUppercase(countryCode)) {
      return false;
    }
    short needle = countryCodeNeedle(countryCode);
    return binarySearch(COUNTRIES_SHORT, needle) >= 0;
  }

  private static boolean countryCodeIsNotAlphaUppercase(String countryCode) {
    char c1 = countryCode.charAt(0);
    if (c1 < 'A' || c1 > 'Z') {
      return true;
    }
    char c2 = countryCode.charAt(1);
    return c2 < 'A' || c2 > 'Z';
  }

  /**
   * A country code consists of two ASCII letters, so we need at least two bytes to represent it.
   * Two bytes are represented in Java by the short type, so we need to pack the two letters into a short.
   * We could use something like:
   * short val = (short)((hi << 8) | lo);
   * But very similar logic already runs inside String.hashCode().
   * More importantly, each String object caches its hash code once it has been computed,
   * so converting a two-letter country code to a short can be nearly free.
   * We can rely on String's hash code because its formula is specified in the String.hashCode() documentation.
   */
  private static short countryCodeNeedle(String countryCode) {
    return (short) countryCode.hashCode();
  }

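  private static short[] initShortArray(String[] isoCountries) {
    short[] countriesShortArray = new short[isoCountries.length];
    for (int i = 0; i < isoCountries.length; i++) {
      String isoCountry = isoCountries[i];
      countriesShortArray[i] = countryCodeNeedle(isoCountry);
    }
    // binarySearch requires a sorted array; sort explicitly rather than rely on the order returned by Locale.
    java.util.Arrays.sort(countriesShortArray);
    return countriesShortArray;
  }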
}

Locale.getISOCountries() always creates a new array, so we should store the result in a static field to avoid unnecessary allocations. At the same time, a HashSet or TreeSet consumes a lot of memory, so this validator uses a binary search on an array instead. This is a trade-off between speed and memory.
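
To see why the cast to short in countryCodeNeedle is lossless, it may help to spell out the arithmetic. For a two-character string s, String.hashCode() is documented as 31 * s.charAt(0) + s.charAt(1). With both characters in 'A'..'Z' (65 to 90), the result lies between 2080 and 2880, well inside the short range, and because the second character spans only 26 values (fewer than 31), the mapping also preserves alphabetical order. The class below is only an illustrative check of that reasoning; its name and methods are made up.

```java
// Illustrative check only; not part of the validator above.
final class TwoLetterHashCheck {
    private TwoLetterHashCheck() {}

    // The documented String.hashCode() formula, specialised to a length-2 string.
    static int manualHash(char c1, char c2) {
        return 31 * c1 + c2;
    }

    // Returns true if, for every pair AA..ZZ, the hash matches the formula,
    // survives a cast to short unchanged, and strictly increases in alphabetical order.
    static boolean holdsForAllUppercasePairs() {
        int previous = -1;
        for (char c1 = 'A'; c1 <= 'Z'; c1++) {
            for (char c2 = 'A'; c2 <= 'Z'; c2++) {
                String code = "" + c1 + c2;
                int hash = code.hashCode();
                if (hash != manualHash(c1, c2)) return false;
                if ((short) hash != hash) return false;
                if (hash <= previous) return false;
                previous = hash;
            }
        }
        return true;
    }
}
```

This only holds because the validator rejects anything that is not two upper-case ASCII letters before computing the needle; for arbitrary strings the cast would discard bits and distinct inputs could collide.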
