Java read unicode with \u

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/19047616/

Date: 2020-08-12 13:39:25  Source: igfitidea

java text

Asked by Terrence

My Java program reads Unicode text from a text file, e.g. \uffff. It displays fine in the Java GUI, but when I try to print it out, all the words are overwritten. Is this because of \u, or is there some way to avoid the words being overwritten?

Sorry about my broken English. Thanks.

Accepted answer by Joop Eggen

The notation \uXXXX primarily occurs only in .java and .properties files. There it is read as a Unicode code point (strictly, a UTF-16 code unit). Unicode text (that is, text using all kinds of special characters) often uses the UTF-8 format, though UTF-16LE and UTF-16BE are sometimes used as well.

This text is read as:

BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), "UTF-8"));

And (for completeness) it is written with one of:

new OutputStreamWriter(new FileOutputStream(file), "UTF-8")
new PrintWriter(file, "UTF-8")

In particular, do not use FileReader and FileWriter: these old utility classes use the platform's default encoding.
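To make the read and write snippets above concrete, here is a minimal round-trip sketch. The class name Utf8RoundTrip, its method names, and the temporary file are assumptions made for illustration; only the stream classes and the UTF-8 charset come from the answer.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Utf8RoundTrip {

    // Writes the text as UTF-8, regardless of the platform default encoding.
    static void write(Path file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    // Reads the first line of the file, decoding its bytes as UTF-8.
    static String readFirstLine(Path file) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(file.toFile()), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temporary file, used only for this demonstration.
        Path file = Files.createTempFile("utf8-demo", ".txt");
        String text = "price: \u20AC5"; // contains the euro sign
        write(file, text);
        System.out.println(text.equals(readFirstLine(file)));
        Files.delete(file);
    }
}
```

Passing the charset explicitly on both sides is the point of the sketch: the file content then does not depend on the machine the program runs on.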

If the text file itself contained \u20AC, that would be irregular, and it would be printed literally (backslash, u, 20AC).
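If the file really does contain such literal escape sequences, one possible approach (not part of the original answer) is to decode them after reading. The sketch below assumes every escape is a backslash, a lowercase u, and exactly four hex digits; the class name UnicodeEscapes and the method decode are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeEscapes {

    // Matches a literal backslash, 'u', and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces each literal backslash-u escape in the text with the char it names.
    static String decode(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and backslash from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash stands for the single backslash stored in the file.
        String fromFile = "price: \\u20AC5";
        System.out.println(decode(fromFile).equals("price: \u20AC5"));
    }
}
```

Text without any escapes passes through unchanged, so the method can be applied to every line that is read.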

Now, if you mean there are problems with Unicode characters outside the normal ASCII range, such as the euro symbol €, then it might be a matter of the font, or of a needed conversion, say to Windows Latin 1: "Windows-1252".
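As a rough illustration of why such a conversion matters, the sketch below encodes the euro sign in two charsets and compares the resulting bytes. The class name EuroEncodings and its encode helper are assumptions; "windows-1252" is not one of the charsets every Java platform is required to provide, so the code checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EuroEncodings {

    // Encodes the text with the named charset and returns the raw bytes.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String euro = "\u20AC";

        // UTF-8 needs three bytes for the euro sign.
        System.out.println(euro.getBytes(StandardCharsets.UTF_8).length);

        // Windows-1252 stores it in a single byte, when the charset is available.
        if (Charset.isSupported("windows-1252")) {
            byte[] cp1252 = encode(euro, "windows-1252");
            System.out.println(cp1252.length + " byte, value 0x"
                    + Integer.toHexString(cp1252[0] & 0xFF));
        }
    }
}
```

A printer or console that expects one of these encodings will show garbage if it is fed the other, which is one way text that looks fine in a GUI can come out wrong elsewhere.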

Answer by santu

As you already know, '\u', also known as a Unicode escape, is used to represent an international character. If you can't enter such a character from the keyboard, you can use the Unicode escape sequence to produce it.

However, if such international characters are already present in a text file, then of course you can read them. Java provides the class Charset; please refer to the API at http://docs.oracle.com/javase/1.4.2/docs/api/java/nio/charset/Charset.html

You should use the Reader/Writer API in Java to deal with such characters, because it works with 16-bit chars, which cover the characters of many languages beyond ASCII, whereas InputStream/OutputStream work only with 8-bit bytes.
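The difference between the two APIs can be seen by reading the same data both ways. This is only a sketch: the class name BytesVersusChars and its two counting methods are invented for the example, and an in-memory byte array stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class BytesVersusChars {

    // Counts the raw bytes delivered by an InputStream.
    static int countBytes(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        }
    }

    // Counts the chars delivered by a Reader that decodes the bytes as UTF-8.
    static int countChars(byte[] data) throws IOException {
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "\u20AC".getBytes(StandardCharsets.UTF_8); // the euro sign
        System.out.println(countBytes(data) + " bytes, " + countChars(data) + " char");
    }
}
```

The stream sees three separate bytes, while the reader combines them into the single character they encode.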

So to read such characters you can use:

BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), "UTF-8"));

Here UTF-8 is the charset.

Similarly, you can print the data. But wherever you print it, the destination (for example your editor) must support the Unicode characters.

You can also refer to the link below for more replies from different people: Read unicode text files with java
