How to determine whether a character is a letter in Java?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/93976/

Date: 2020-08-11 08:13:23

java unicode

Asked by Peter Hilton

How do you check if a one-character String is a letter - including any letters with accents?

I had to work this out recently, so I'll answer it myself, after the recent VB6 question reminded me.

Accepted answer by Peter Hilton

Just checking whether a letter is in A-Z is not enough, because that excludes letters with accents and letters in other alphabets.

I found out that you can use the regular expression class for 'Unicode letter', or one of its case-sensitive variations:

string.matches("\\p{L}"); // Unicode letter
string.matches("\\p{Lu}"); // Unicode upper-case letter
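
As a rough illustration of the two calls above, here is a minimal sketch that runs them against a few sample characters. The class name `UnicodeLetterDemo`, the helper method names, and the sample inputs are made up for this example and are not part of the original answer. Note that the backslash has to be doubled inside a Java string literal.

```java
class UnicodeLetterDemo {
    // True if s consists of exactly one Unicode letter (general category L).
    static boolean isLetter(String s) {
        return s.matches("\\p{L}");
    }

    // True if s consists of exactly one upper-case Unicode letter (category Lu).
    static boolean isUpperCaseLetter(String s) {
        return s.matches("\\p{Lu}");
    }

    public static void main(String[] args) {
        // Sample inputs (hypothetical): "a", e-acute, Cyrillic capital Zhe, "7", "_".
        String[] samples = {"a", "\u00e9", "\u0416", "7", "_"};
        for (String s : samples) {
            System.out.println(s + ": letter=" + isLetter(s)
                    + ", upperCase=" + isUpperCaseLetter(s));
        }
    }
}
```

The accented and Cyrillic samples should be reported as letters, while the digit and the underscore should not.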

You can also do this with the Character class:

Character.isLetter(character);

but that is less convenient if you need to check more than one letter.
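
To make the convenience trade-off concrete, here is a small sketch, under the assumption that "more than one letter" means checking that a whole string consists only of letters. The class `AllLettersDemo` and both method names are hypothetical. With a regex, the change is a single `+` quantifier, whereas `Character.isLetter()` needs an explicit loop.

```java
import java.util.regex.Pattern;

class AllLettersDemo {
    // Compiled once and reused; "+" requires one or more Unicode letters.
    private static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    // Regex version: the whole string must consist of letters.
    static boolean allLettersRegex(String s) {
        return LETTERS.matcher(s).matches();
    }

    // Character version: loops char by char, so it only handles
    // characters in the Basic Multilingual Plane correctly.
    static boolean allLettersLoop(String s) {
        if (s.isEmpty()) {
            return false; // mirror the regex, which rejects the empty string
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetter(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Sample inputs (hypothetical): two accented words, one with a digit, one empty.
        String[] samples = {"caf\u00e9", "na\u00efve", "abc1", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\": regex=" + allLettersRegex(s)
                    + ", loop=" + allLettersLoop(s));
        }
    }
}
```

Both versions should agree on these inputs; which one reads better is largely a matter of taste.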

Answer by Michael Myers

Character.isLetter() is much faster than string.matches(), because string.matches() compiles a new Pattern every time. Even caching the pattern, I think isLetter() would still beat it.

EDIT: Just ran across this again and thought I'd try to come up with some actual numbers. Here's my attempt at a benchmark, checking all three methods (matches() with and without caching the Pattern, and Character.isLetter()). I also made sure that both valid and invalid characters were checked, so as not to skew things.

import java.util.regex.*;

class TestLetter {
    private static final Pattern ONE_CHAR_PATTERN = Pattern.compile("\\p{L}");
    private static final int NUM_TESTS = 10000000;

    public static void main(String[] args) {
        long start = System.nanoTime();
        int counter = 0;
        for (int i = 0; i < NUM_TESTS; i++) {
            if (testMatches(Character.toString((char) (i % 128))))
                counter++;
        }
        System.out.println(NUM_TESTS + " tests of Pattern.matches() took " +
                (System.nanoTime()-start) + " ns.");
        System.out.println("There were " + counter + "/" + NUM_TESTS +
                " valid characters");
        /*********************************/
        start = System.nanoTime();
        counter = 0;
        for (int i = 0; i < NUM_TESTS; i++) {
            if (testCharacter(Character.toString((char) (i % 128))))
                counter++;
        }
        System.out.println(NUM_TESTS + " tests of isLetter() took " +
                (System.nanoTime()-start) + " ns.");
        System.out.println("There were " + counter + "/" + NUM_TESTS +
                " valid characters");
        /*********************************/
        start = System.nanoTime();
        counter = 0;
        for (int i = 0; i < NUM_TESTS; i++) {
            if (testMatchesNoCache(Character.toString((char) (i % 128))))
                counter++;
        }
        System.out.println(NUM_TESTS + " tests of String.matches() took " +
                (System.nanoTime()-start) + " ns.");
        System.out.println("There were " + counter + "/" + NUM_TESTS +
                " valid characters");
    }

    private static boolean testMatches(final String c) {
        return ONE_CHAR_PATTERN.matcher(c).matches();
    }
    private static boolean testMatchesNoCache(final String c) {
        return c.matches("\\p{L}");
    }
    private static boolean testCharacter(final String c) {
        return Character.isLetter(c.charAt(0));
    }
}

And my output:

10000000 tests of Pattern.matches() took 4325146672 ns.
There were 4062500/10000000 valid characters
10000000 tests of isLetter() took 546031201 ns.
There were 4062500/10000000 valid characters
10000000 tests of String.matches() took 11900205444 ns.
There were 4062500/10000000 valid characters

So isLetter() is almost 8x faster, even compared with a cached Pattern. (And the uncached String.matches() is nearly 3x slower than the cached Pattern.)
