Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/16435525/

How to read a UTF-8 encoded file in Java with Turkish characters

Tags: java, utf-8

Asked by Juned Ahsan

I am trying to read a UTF-8 encoded txt file that contains some Turkish characters. I have written an Axis-based web service that reads this file and sends the output back as a string. Somehow I am not able to read the characters properly. The code is very simple:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class TurkishWebService {

    public String generateTurkishString() throws IOException {
        InputStream isr = this.getClass().getResourceAsStream(
                "/" + "turkish.txt");

        BufferedReader in = new BufferedReader(new InputStreamReader(isr,
                "UTF8"));
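        StringBuilder result = new StringBuilder();
        String str;

        while ((str = in.readLine()) != null) {
            System.out.println(str);
            result.append(str).append('\n');
        }

        in.close();
        // Return the accumulated text; str itself is always null once the loop ends.
        return result.toString();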
    }

    public String normalString() {
        System.out.println("webService normal text");
        return "webService normal text";
    }

    public static void main(String args[]) throws IOException {
        new TurkishWebService().generateTurkishString();
    }
}

Here are the contents of turkish.txt, just one line

Assal??????????

I am getting the stdout as

Assal?τ????÷÷??

Please suggest what I am doing wrong here.

Answered by Arnaud

Make sure the console you use to display the output is also set to UTF-8. In Eclipse, for example, you need to go to Run Configuration > Common to do this.

Answered by McDowell

You appear to be correctly decoding the file data from UTF-8 to UTF-16 strings.

System.out performs transcoding operations from UTF-16 strings to the default JRE character encoding. If this does not match the encoding used by the device receiving the character data, the output is corrupted. So the console should be set to use the default character encoding, or data corruption occurs. How this is done is device-dependent.
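
To make the transcoding step concrete, here is a minimal sketch. The class name, the roundTrip helper, and the sample character are illustrative assumptions, not part of the original question. It encodes one Turkish letter with a charset that cannot represent it and with UTF-8, then wraps System.out in a PrintStream that always writes UTF-8. Whether the last line displays correctly still depends on the terminal being configured for UTF-8.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question.
class ConsoleEncodingDemo {

    // Encodes the text with the given charset and decodes it again,
    // mimicking a stream that writes to a device using that charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String sample = "\u015F"; // LATIN SMALL LETTER S WITH CEDILLA

        // The encoding System.out uses unless told otherwise.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // ISO-8859-1 has no mapping for U+015F, so the encoder substitutes '?'.
        System.out.println("Lossy in ISO-8859-1: "
                + "?".equals(roundTrip(sample, StandardCharsets.ISO_8859_1)));

        // UTF-8 can represent the character, so the round trip is lossless.
        System.out.println("Lossless in UTF-8: "
                + sample.equals(roundTrip(sample, StandardCharsets.UTF_8)));

        // Force UTF-8 output regardless of the JRE default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(sample);
    }
}
```

If the default charset printed on the first line differs from what the console expects, that mismatch is the likely source of the garbled output.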

If you are using a terminal, the Console class (java.io.Console) does a better job of determining the device encoding.
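
As a rough sketch of that suggestion: System.console() may return null (for example when output is redirected), so a fallback is needed. The class and method names here are hypothetical, and falling back to a UTF-8 writer on System.out is an assumption rather than something the answer prescribes.

```java
import java.io.Console;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
class ConsoleWriterDemo {

    // Prefers the Console's writer; falls back to stdout encoded as UTF-8.
    static PrintWriter consoleOrStdout() {
        Console console = System.console(); // may be null without an interactive terminal
        if (console != null) {
            return console.writer();
        }
        return new PrintWriter(
                new OutputStreamWriter(System.out, StandardCharsets.UTF_8), true);
    }

    public static void main(String[] args) {
        PrintWriter out = consoleOrStdout();
        out.println("\u015F\u00E7\u011F"); // three sample Turkish letters
        out.flush();
    }
}
```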

Note: it is better to use try-with-resources, or at least try-finally, to close streams; use the standard encoding constants (such as StandardCharsets.UTF_8) if available.
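
A minimal sketch of what that note suggests, assuming Java 7 or later. The class name and the readAll helper are hypothetical; the main method feeds it an in-memory stream instead of the turkish.txt resource so that it runs on its own.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
class Utf8ReaderDemo {

    // Reads the whole stream as UTF-8 text, one line at a time.
    static String readAll(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream)
        // even if readLine() throws.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file contents: "Assal" plus three Turkish letters.
        byte[] utf8 = "Assal\u015F\u00E7\u011F".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(utf8));
        System.out.println(text.equals("Assal\u015F\u00E7\u011F\n"));
    }
}
```

Using StandardCharsets.UTF_8 instead of the string "UTF8" also avoids the checked UnsupportedEncodingException.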

Answered by Evgeniy Dorofeev

The code looks good. The problem is most likely that the console cannot print Turkish characters. To be sure, make a temporary test in your program: take the string that prints as Assal?τ????÷÷?? after reading it from the file and do this:

 System.out.println(str.charAt(6) == '?');
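
A variation on the same check, as a sketch: printing each character's code point in plain ASCII sidesteps the console encoding entirely. The class and method names are hypothetical. If the dump shows the expected Turkish code points (such as U+015F), the string was decoded correctly and only the display is at fault; U+003F (a literal '?') or U+FFFD (the replacement character) would instead point to a problem with the file or its decoding.

```java
// Hypothetical demo class, not part of the original answer.
class CodePointDump {

    // Formats every code point of the text as U+XXXX, separated by spaces.
    static String dump(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the string read from the file.
        System.out.println(dump("Assal\u015F"));
    }
}
```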