How do I determine whether a character is a letter in Java?

Date: 2020-03-06 14:22:09  Source: igfitidea

How do I check whether a one-character string is a letter, including any letter with an accent?

I had to solve this recently, so after a recent VB6 question reminded me of it, I am answering it myself.

Solution

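Just checking whether the letter is in A-Z is not enough, because that range does not include letters with accents or letters from other alphabets.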
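To make the limitation concrete, here is a minimal sketch that compares a range check with `Character.isLetter()`. The class name `AsciiRangeCheck` and the helper `isAsciiLetter` are hypothetical names introduced for illustration, and the helper also accepts a-z, which is an assumption beyond the A-Z range mentioned above. Non-ASCII characters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
public class AsciiRangeCheck {
    // Hypothetical helper: a range check that only accepts unaccented Latin letters.
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        char plain = 'e';
        char accented = '\u00e9'; // Latin small letter e with acute accent
        char greek = '\u03a9';    // Greek capital letter omega

        System.out.println(isAsciiLetter(plain));    // true
        System.out.println(isAsciiLetter(accented)); // false
        System.out.println(isAsciiLetter(greek));    // false

        // Character.isLetter() consults Unicode character data instead of a fixed range.
        System.out.println(Character.isLetter(plain));    // true
        System.out.println(Character.isLetter(accented)); // true
        System.out.println(Character.isLetter(greek));    // true
    }
}
```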

I found that you can use the regular expression class for "Unicode letter", or one of its case-sensitive variations:

string.matches("\\p{L}"); // Unicode letter
string.matches("\\p{Lu}"); // Unicode upper-case letter
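As a rough sketch of how these classes behave, the wrapper below exercises them on a few inputs. The class and method names are hypothetical. Two details are easy to miss: inside a Java string literal the backslash must be doubled, and `String.matches()` only succeeds when the whole string matches, so a `+` quantifier is needed to accept more than one letter.

```java
public class UnicodeLetterRegex {
    // Exactly one Unicode letter (general category L).
    static boolean isSingleLetter(String s) {
        return s.matches("\\p{L}");
    }

    // Exactly one Unicode upper-case letter (general category Lu).
    static boolean isSingleUpperCaseLetter(String s) {
        return s.matches("\\p{Lu}");
    }

    // One or more Unicode letters and nothing else.
    static boolean isAllLetters(String s) {
        return s.matches("\\p{L}+");
    }

    public static void main(String[] args) {
        System.out.println(isSingleLetter("\u00e9"));          // true: e with acute accent
        System.out.println(isSingleLetter("7"));               // false: a digit
        System.out.println(isSingleLetter("ab"));              // false: more than one character
        System.out.println(isSingleUpperCaseLetter("\u00c9")); // true: E with acute accent
        System.out.println(isSingleUpperCaseLetter("\u00e9")); // false: lower case
        System.out.println(isAllLetters("na\u00efve"));        // true
        System.out.println(isAllLetters("abc1"));              // false
    }
}
```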

You can also do this with the Character class:

Character.isLetter(character);

But this is less convenient if you need to check more than one letter.
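If several letters do need checking with the Character class, one possible approach (an illustration, not something the answer itself proposes) is to stream the string's code points, assuming Java 8 or later. The class name `AllLettersCheck` and the method `isAllLetters` are hypothetical. Working on code points rather than `char` values also handles letters outside the Basic Multilingual Plane, which occupy two `char` values.

```java
public class AllLettersCheck {
    // True when the string is non-empty and every code point is a Unicode letter.
    // The explicit isEmpty() guard is needed because allMatch() is true for an empty stream.
    static boolean isAllLetters(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        System.out.println(isAllLetters("na\u00efve")); // true
        System.out.println(isAllLetters("abc1"));       // false
        System.out.println(isAllLetters(""));           // false
    }
}
```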

Character.isLetter() is much faster than string.matches(), because string.matches() compiles a new Pattern every time. Even with a cached Pattern, I think isLetter() would still beat it.

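EDIT: I ran across this question again and thought I would try to come up with some actual numbers. Here is my attempt at a benchmark that checks all three methods (matches() with and without a cached Pattern, and Character.isLetter()). I also made sure that both valid and invalid characters were checked, so as not to skew the results.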

import java.util.regex.*;

class TestLetter {
    private static final Pattern ONE_CHAR_PATTERN = Pattern.compile("\\p{L}");
    private static final int NUM_TESTS = 10000000;

    public static void main(String[] args) {
        long start = System.nanoTime();
        int counter = 0;
        for (int i = 0; i < NUM_TESTS; i++) {
            if (testMatches(Character.toString((char) (i % 128))))
                counter++;
        }
        System.out.println(NUM_TESTS + " tests of Pattern.matches() took " +
                (System.nanoTime()-start) + " ns.");
        System.out.println("There were " + counter + "/" + NUM_TESTS +
                " valid characters");
        /*********************************/
        start = System.nanoTime();
        counter = 0;
        for (int i = 0; i < NUM_TESTS; i++) {
            if (testCharacter(Character.toString((char) (i % 128))))
                counter++;
        }
        System.out.println(NUM_TESTS + " tests of isLetter() took " +
                (System.nanoTime()-start) + " ns.");
        System.out.println("There were " + counter + "/" + NUM_TESTS +
                " valid characters");
        /*********************************/
        start = System.nanoTime();
        counter = 0;
        for (int i = 0; i < NUM_TESTS; i++) {
            if (testMatchesNoCache(Character.toString((char) (i % 128))))
                counter++;
        }
        System.out.println(NUM_TESTS + " tests of String.matches() took " +
                (System.nanoTime()-start) + " ns.");
        System.out.println("There were " + counter + "/" + NUM_TESTS +
                " valid characters");
    }

    private static boolean testMatches(final String c) {
        return ONE_CHAR_PATTERN.matcher(c).matches();
    }
    private static boolean testMatchesNoCache(final String c) {
        return c.matches("\\p{L}");
    }
    private static boolean testCharacter(final String c) {
        return Character.isLetter(c.charAt(0));
    }
}

And my output:

10000000 tests of Pattern.matches() took 4325146672 ns.
There were 4062500/10000000 valid characters
10000000 tests of isLetter() took 546031201 ns.
There were 4062500/10000000 valid characters
10000000 tests of String.matches() took 11900205444 ns.
There were 4062500/10000000 valid characters

So isLetter() is almost 8x faster, even compared with a cached Pattern. (And the uncached version is nearly 3x slower than the cached one.)