Get char value in Java
Original URL: http://stackoverflow.com/questions/4329275/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by Nick
How can I get the UTF-8 code of a char in Java? I have the char 'a' and I want the value 97; I have the char 'é' and I want the value 233.
Here is a table for more values.
I tried Character.getNumericValue(a), but for 'a' it gives me 10 and not 97. Any idea why?
This seems very basic but any help would be appreciated!
Accepted answer by Michael Borgwardt
char is actually a numeric type containing the Unicode value of the character (UTF-16, to be exact: you need two chars to represent characters outside the BMP, the Basic Multilingual Plane). You can do everything with it that you can do with an int.
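To make the two points above concrete, here is a small sketch (the class and method names are made up for illustration). It shows a char taking part in ordinary int arithmetic, and a character outside the BMP (U+1F600, chosen only as an example) occupying two chars, a surrogate pair, inside a Java String.

```java
public class CharIsNumeric {
    // A char widens to int automatically, so ordinary arithmetic works on it.
    static int nextCode(char c) {
        return c + 1; // char + int is evaluated as an int
    }

    // Builds a String holding U+1F600, a code point outside the BMP.
    static String grinningFace() {
        return new String(Character.toChars(0x1F600));
    }

    public static void main(String[] args) {
        System.out.println(nextCode('a'));          // 98
        System.out.println((char) nextCode('a'));   // b

        String s = grinningFace();
        System.out.println(s.length());             // 2: one code point, two UTF-16 chars
        System.out.printf("%X %X%n", (int) s.charAt(0), (int) s.charAt(1)); // D83D DE00
        System.out.printf("%X%n", s.codePointAt(0)); // 1F600
    }
}
```

Casting either half of the pair to int gives a surrogate value (0xD83D or 0xDE00), not the character's code point; that is the case where a single char is not enough.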
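Character.getNumericValue() tries to interpret the character as a digit. Letters count as digits with values 10 to 35 ('a' and 'A' are 10, 'z' and 'Z' are 35), which is why you get 10 for 'a'.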
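A quick side-by-side sketch of the difference (helper names are illustrative only): getNumericValue reads the char as a digit, while widening it to int gives its UTF-16 code value, which is what the question asks for.

```java
public class NumericValueVsCode {
    // Character.getNumericValue treats the char as a digit (letters count as 10..35).
    static int digitValue(char c) {
        return Character.getNumericValue(c);
    }

    // Widening to int gives the char's UTF-16 code value instead.
    static int codeValue(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(digitValue('7') + " vs " + codeValue('7')); // 7 vs 55
        System.out.println(digitValue('a') + " vs " + codeValue('a')); // 10 vs 97
        System.out.println(digitValue('z') + " vs " + codeValue('z')); // 35 vs 122
        // e-acute is not a digit at all, so getNumericValue returns -1 for it.
        System.out.println(digitValue('\u00E9') + " vs " + codeValue('\u00E9')); // -1 vs 233
    }
}
```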
Answer by Robertas
This produces the expected result:
int a = 'a';
System.out.println(a); // outputs 97
Likewise:
System.out.println((int)'é');
prints out 233.
Note that both examples work the same way: the char is widened to an int, so they give the right value for any character that fits in a single char (the BMP), not just ASCII. You can achieve the same result by multiplying the char by 1:
System.out.println(1 * 'é');
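The forms used in this answer (assigning to an int, an explicit (int) cast, and multiplying by 1) all convert the char to an int. This small sketch, with illustrative method names, checks that they agree for both 'a' and 'é'.

```java
public class WideningForms {
    static int byAssignment(char c) {
        int i = c;      // implicit widening conversion
        return i;
    }

    static int byCast(char c) {
        return (int) c; // explicit cast
    }

    static int byMultiplying(char c) {
        return 1 * c;   // binary numeric promotion turns the char into an int
    }

    public static void main(String[] args) {
        char[] samples = {'a', '\u00E9'}; // 'a' and e-acute
        for (char c : samples) {
            System.out.println(byAssignment(c) + " " + byCast(c) + " " + byMultiplying(c));
        }
        // Output:
        // 97 97 97
        // 233 233 233
    }
}
```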
Answer by Jon Skeet
Those "UTF-8" codes are no such thing. They're actually just Unicode values, as per the Unicode code charts.
So an 'é' is actually U+00E9; in UTF-8 it would be represented by two bytes { 0xc3, 0xa9 }.
Now, to get the Unicode value (or, more precisely, the UTF-16 value, as that's what Java uses internally), you just need to convert the value to an integer:
char c = '\u00e9'; // c is now e-acute
int i = c; // i is now 233
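The 233 and the U+00E9 from the code charts are the same number written in decimal and hexadecimal. A minimal sketch (the toUPlus helper is hypothetical) that prints a char in the U+XXXX notation:

```java
public class UnicodeNotation {
    // Formats a char's value in the U+XXXX notation used by the code charts.
    static String toUPlus(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        char c = '\u00e9';                // c is now e-acute
        int i = c;                        // i is now 233, which is 0xE9
        System.out.println(i);            // 233
        System.out.println(toUPlus(c));   // U+00E9
        System.out.println(toUPlus('a')); // U+0061
    }
}
```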
Answer by Anon
Your question is unclear. Do you want the Unicode codepoint for a particular character (which is the example you gave), or do you want to translate a Unicode codepoint into a UTF-8 byte sequence?
If the former, then I recommend the code charts at http://www.unicode.org/
If the latter, then the following program will do it:
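import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;

public class Foo
{
    public static void main(String[] argv)
        throws Exception
    {
        char c = '\u00E9'; // e-acute, U+00E9
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Encode the char as UTF-8 into the in-memory byte buffer
        OutputStreamWriter out = new OutputStreamWriter(bos, "UTF-8");
        out.write(c);
        out.flush();
        byte[] bytes = bos.toByteArray();
        // Java bytes are signed; mask with 0xFF to print each as 0..255
        for (int ii = 0 ; ii < bytes.length ; ii++)
            System.out.println(bytes[ii] & 0xFF);
    }
}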
(there's also an online Unicode to UTF8 page, but I don't have the URL on this machine)
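On Java 7 and later, a shorter way to get the same UTF-8 bytes is String.getBytes with StandardCharsets.UTF_8. This is a sketch with a hypothetical utf8Of helper, and it assumes the char is a complete BMP character (a lone surrogate half cannot be encoded meaningfully on its own).

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Encodes a single char as UTF-8 and returns the bytes as unsigned ints.
    static int[] utf8Of(char c) {
        byte[] bytes = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
        int[] unsigned = new int[bytes.length];
        for (int k = 0; k < bytes.length; k++) {
            unsigned[k] = bytes[k] & 0xFF; // Java bytes are signed; mask to 0..255
        }
        return unsigned;
    }

    public static void main(String[] args) {
        for (int b : utf8Of('\u00E9')) {
            System.out.printf("0x%02x%n", b); // prints 0xc3, then 0xa9
        }
    }
}
```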
Answer by Kaitsu
You can use the codePointAt(int index) method of java.lang.String for that. Here's an example:
"a".codePointAt(0) --> 97
"é".codePointAt(0) --> 233
If you want to avoid creating strings unnecessarily, the following works as well and can be used for char arrays:
Character.codePointAt(new char[] {'a'},0)
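The advantage of codePointAt over a plain int conversion shows up for characters outside the BMP: it combines a surrogate pair into one code point. A sketch (the codePointsOf helper is illustrative; String.codePoints() needs Java 8+):

```java
import java.util.Arrays;

public class CodePoints {
    // Returns every Unicode code point in s; surrogate pairs become one value.
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        System.out.println("a".codePointAt(0));      // 97
        System.out.println("\u00E9".codePointAt(0)); // 233

        // U+1F600 is stored as two chars, but codePointAt combines them.
        String face = "\uD83D\uDE00";
        System.out.println((int) face.charAt(0));    // 55357 (0xD83D, a high surrogate)
        System.out.println(face.codePointAt(0));     // 128512 (0x1F600)

        // Character.codePointAt also works on a char[] without building a String.
        System.out.println(Character.codePointAt(new char[] {'a'}, 0)); // 97

        System.out.println(Arrays.toString(codePointsOf("h\u00E9" + face))); // [104, 233, 128512]
    }
}
```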
Answer by Ksiądz Pistolet
My method to do it is something like this:
char c = 'c';
int i = Character.codePointAt(String.valueOf(c), 0);
// testing
System.out.println(String.format("%c -> %d", c, i)); // c -> 99
Answer by Michael Gantman
There is an open source library, MgntUtils, that has a utility class StringUnicodeEncoderDecoder. That class provides static methods that convert any String into a Unicode sequence and vice versa. Very simple and useful. To convert a String, you just do:
String codes = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(myString);
For example a String "Hello World" will be converted into
"\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064"
It works with any language. Here is the link to the article that explains all the details about the library: MgntUtils. Look for the subtitle "String Unicode converter". The article links to Maven Central, where you can get the artifacts, and to GitHub, where you can get the project itself. The library comes with well-written javadoc and source code.
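If you only need the encoding direction and would rather not add a dependency, the same kind of output can be produced with the standard library. This is a rough sketch, not the MgntUtils implementation; the UnicodeEscaper class and toUnicodeSequence method are hypothetical names. It escapes each UTF-16 char separately, so a character outside the BMP becomes two escapes.

```java
public class UnicodeEscaper {
    // Hypothetical helper: writes each UTF-16 char of s as a backslash, 'u',
    // and four lowercase hex digits.
    static String toUnicodeSequence(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 6);
        for (char ch : s.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) ch));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUnicodeSequence("Hello World"));
    }
}
```

For "Hello World" this prints the same escape sequence as the example in this answer.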
Answer by connelblaze
You can create a simple loop to print characters next to their numeric values, like this (this one covers the values 12 to 999):
public class UTF8Characters {
public static void main(String[] args) {
for (int i = 12; i <= 999; i++) {
System.out.println(i +" - "+ (char)i);
}
}
}
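Going from a number back to a character with (char) i only works up to 0xFFFF; above that the cast silently keeps the low 16 bits. A sketch of the general case using Character.toChars, plus Character.getName (Java 7+) to show the official character name; the fromCodePoint helper name is illustrative.

```java
public class FromCodePoint {
    // Turns any code point, including ones above U+FFFF, back into a String.
    static String fromCodePoint(int cp) {
        return new String(Character.toChars(cp));
    }

    public static void main(String[] args) {
        int[] samples = {97, 233, 0x1F600};
        for (int cp : samples) {
            String s = fromCodePoint(cp);
            // Print the value, how many chars it needs, and its Unicode name.
            System.out.println(cp + " - " + s.length() + " char(s) - " + Character.getName(cp));
        }
        // A plain (char) cast keeps only the low 16 bits, so 0x1F600 becomes 0xF600.
        System.out.println(Integer.toHexString((char) 0x1F600)); // f600
    }
}
```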