Original question: http://stackoverflow.com/questions/16435525/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to read a UTF-8 encoded file in Java with Turkish characters
Asked by Juned Ahsan
I am trying to read a UTF-8 encoded txt file that contains some Turkish characters. I have written an Axis-based web service that reads this file and sends the output back as a string. Somehow I am not able to read the characters properly. The code is very simple:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class TurkishWebService {

    public String generateTurkishString() throws IOException {
        // Load turkish.txt from the root of the classpath and decode it as UTF-8.
        InputStream isr = this.getClass().getResourceAsStream("/" + "turkish.txt");
        BufferedReader in = new BufferedReader(new InputStreamReader(isr, "UTF8"));
        String str;
        while ((str = in.readLine()) != null) {
            System.out.println(str);
        }
        in.close();
        // Note: str is always null here, because the loop only ends when readLine() returns null.
        return str;
    }

    public String normalString() {
        System.out.println("webService normal text");
        return "webService normal text";
    }

    public static void main(String args[]) throws IOException {
        new TurkishWebService().generateTurkishString();
    }
}
Here are the contents of turkish.txt, just one line:
Assal??????????
The output I get on stdout is:
Assal?τ????÷÷??
Please suggest what I am doing wrong here.
Answered by Arnaud
Make sure the console you use to display the output is also set to UTF-8. In Eclipse, for example, you need to go to Run Configuration > Common to do this.
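Before changing console settings, it can help to see which encoding the JVM itself falls back to. The following is a minimal sketch; the class name EncodingReport is hypothetical and not part of the question. Treat the output as a hint only, since newer JDKs may choose the console encoding separately from the default charset.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic helper, not part of the original question.
class EncodingReport {

    // Builds a short report of the encodings the JVM uses by default.
    static String describe() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the reported charset differs from what the console is configured to display, non-ASCII characters are likely to be garbled on output.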
Answered by McDowell
You appear to be correctly decoding the file data from UTF-8 to UTF-16 strings.
System.out performs transcoding operations from UTF-16 strings to the default JRE character encoding. If this does not match the encoding used by the device receiving the character data, the data is corrupted. So the console should be set to the default character encoding, or data corruption occurs. How this is done is device-dependent.
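One way to take the default encoding out of the picture is to wrap the output stream in a PrintStream with an explicit charset. This is a sketch under the assumption that the receiving console really expects UTF-8; the class Utf8Out and its utf8 method are hypothetical names introduced for illustration.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper, not part of the original answer.
class Utf8Out {

    // Wraps a byte stream in an auto-flushing PrintStream that always encodes text as UTF-8.
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // Sample Turkish letters written as Unicode escapes so the source file encoding does not matter.
        out.println("\u011F\u00FC\u015F\u0131\u00F6\u00E7");
    }
}
```

This only fixes the bytes that Java writes. If the console interprets them with a different encoding, the text still appears garbled.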
If you are using a terminal, the Console does a better job of determining the device encoding.
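As a rough illustration, assuming "the Console" here refers to java.io.Console obtained through System.console(), output could be routed as follows. System.console() can return null (for example when output is redirected or inside some IDEs), so the sketch falls back to System.out. The class ConsoleEcho is a hypothetical name.

```java
import java.io.Console;

// Hypothetical helper, not part of the original answer.
class ConsoleEcho {

    // Prints through java.io.Console when one is attached, otherwise through System.out.
    // Returns true when the Console was used.
    static boolean print(String text) {
        Console console = System.console();
        if (console != null) {
            console.printf("%s%n", text);
            return true;
        }
        System.out.println(text);
        return false;
    }

    public static void main(String[] args) {
        boolean usedConsole = print("\u011F\u00FC\u015F\u0131\u00F6\u00E7");
        System.out.println("printed via Console: " + usedConsole);
    }
}
```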
Note: it is better to use try-with-resources, or at least try-finally, to close streams; use the standard encoding constants if available.
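A minimal sketch of that advice, assuming Java 7 or later: try-with-resources closes the reader even if reading fails, and StandardCharsets.UTF_8 replaces the "UTF8" string. The class Utf8TextReader and its readAll method are hypothetical names; the sample bytes in main stand in for the contents of a file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
class Utf8TextReader {

    // Reads the whole stream as UTF-8 text, joining lines with '\n'.
    static String readAll(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        // The reader (and the wrapped stream) is closed automatically when the block exits.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    text.append('\n');
                }
                text.append(line);
                first = false;
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Sample input: Turkish letters encoded as UTF-8 bytes.
        byte[] sample = "\u011F\u00FC\u015F".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(sample));
        System.out.println("decoded " + text.length() + " characters");
    }
}
```

Unlike the method in the question, this returns the accumulated text rather than the loop variable, which is null once reading has finished.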
Answered by Evgeniy Dorofeev
The code looks good. The problem is most likely in the console output, which cannot print Turkish. To be sure, make a temporary test in your program: take the string Assal?τ????÷÷?? that you read from the file and do this:
System.out.println(str.charAt(6) == '?');