Java: how to convert a file to UTF-8

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/3018653/

Time: 2020-08-13 15:37:21  Source: igfitidea

Tags: java, utf-8

Asked by Enrique San Martín

I have a file that contains some non-UTF-8 characters (it is in an encoding such as ISO-8859-1), so I want to convert that file to UTF-8, or at least read it correctly. How can I do it?

The code looks like this:

File file = new File("some_file_with_non_utf8_characters.txt");

/* some code to convert the file to an utf8 file */

...

Edit: added an encoding example.

Accepted answer by leonbloy

  String charset = "ISO-8859-1"; // or what corresponds
  BufferedReader in = new BufferedReader( 
      new InputStreamReader (new FileInputStream(file), charset));
  String line;
  while( (line = in.readLine()) != null) { 
    ....
  }

At that point the text is decoded. You can write it back out with the symmetric Writer/OutputStream classes, using the encoding you prefer (e.g. UTF-8).
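
The "symmetric" write side could look roughly like the sketch below. The class name LineByLineConverter and the method name convert are made up for illustration, and the source encoding is assumed to be ISO-8859-1 as in the snippet above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

// Hypothetical helper, not part of the original answer.
public class LineByLineConverter {
    // Reads `source` as ISO-8859-1 (assumed) and writes the same text to `target` as UTF-8.
    public static void convert(File source, File target) throws IOException {
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(new FileInputStream(source), "ISO-8859-1"));
             BufferedWriter out = new BufferedWriter(
                 new OutputStreamWriter(new FileOutputStream(target), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine(); // readLine() strips the line terminator, so write one back
            }
        }
    }
}
```

Note that newLine() writes the platform line separator, so this variant normalizes line endings and always ends the output with one. If the file must be preserved character for character, copying with a char buffer instead of readLine() avoids that.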

Answer by Ismael

Do you only want to read it as UTF-8? What I did recently for a similar problem was to start the JVM with -Dfile.encoding=UTF-8 and then read and print as normal. I don't know if that is applicable in your case.

With that option:

System.out.println("á é í ó ú");

prints the characters correctly. Otherwise it prints a ? symbol.
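
If changing the JVM-wide default is not an option, one possible alternative is to name the charset explicitly on the stream that does the printing. The sketch below is only an illustration: DefaultCharsetDemo and utf8Out are hypothetical names, and the terminal still has to interpret the bytes as UTF-8 for the characters to display correctly.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the original answer.
public class DefaultCharsetDemo {
    // Wraps `target` in a PrintStream that always encodes text as UTF-8,
    // regardless of the JVM default charset.
    public static PrintStream utf8Out(OutputStream target) throws UnsupportedEncodingException {
        return new PrintStream(target, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Shows which default the JVM picked up (affected by -Dfile.encoding).
        System.out.println("Default charset: " + Charset.defaultCharset());
        // Unicode escapes keep this source file independent of the compiler's -encoding setting.
        utf8Out(System.out).println("\u00e1 \u00e9 \u00ed \u00f3 \u00fa");
    }
}
```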

Answer by ZZ Coder

You need to know the encoding of the input file. For example, if the file is in Latin-1, you would do something like this:

        FileInputStream fis = new FileInputStream("test.in");
        InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
        Reader in = new BufferedReader(isr);
        FileOutputStream fos = new FileOutputStream("test.out");
        OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
        Writer out = new BufferedWriter(osw);

        int ch;
        while ((ch = in.read()) > -1) {
            out.write(ch);
        }

        out.close();
        in.close();
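
To see why the input encoding matters, it may help to look at the bytes themselves. The small sketch below (EncodingBytesDemo and hex are illustrative names, not from the answer) encodes the single character é in both charsets: Latin-1 uses one byte, UTF-8 uses two, so the reader has to be told which one it is looking at.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, added for illustration only.
public class EncodingBytesDemo {
    // Formats bytes as space-separated uppercase hex, e.g. "C3 A9".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // the character é
        System.out.println(hex(text.getBytes(StandardCharsets.ISO_8859_1))); // prints "E9"
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_8)));      // prints "C3 A9"
    }
}
```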

Answer by Eyal Schneider

The following code converts a file from srcEncoding to tgtEncoding:

public static void transform(File source, String srcEncoding, File target, String tgtEncoding) throws IOException {
    BufferedReader br = null;
    BufferedWriter bw = null;
    try{
        br = new BufferedReader(new InputStreamReader(new FileInputStream(source),srcEncoding));
        bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), tgtEncoding));
        char[] buffer = new char[16384];
        int read;
        while ((read = br.read(buffer)) != -1)
            bw.write(buffer, 0, read);
    } finally {
        try {
            if (br != null)
                br.close();
        } finally {
            if (bw != null)
                bw.close();
        }
    }
}

--EDIT--

Using Try-with-resources (Java 7):

public static void transform(File source, String srcEncoding, File target, String tgtEncoding) throws IOException {
    try (
      BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(source), srcEncoding));
      BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), tgtEncoding)); ) {
          char[] buffer = new char[16384];
          int read;
          while ((read = br.read(buffer)) != -1)
              bw.write(buffer, 0, read);
    } 
}
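
For comparison, the same buffer-copy loop can also be sketched with the java.nio.file API (Java 7 and later). This is a variant, not the answerer's code: the class name NioTranscoder is hypothetical, and it takes Path and Charset arguments instead of File and String.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical NIO variant of the transform method above.
public class NioTranscoder {
    // Decodes `source` with srcCharset and re-encodes the text into `target` with tgtCharset.
    public static void transform(Path source, Charset srcCharset,
                                 Path target, Charset tgtCharset) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(source, srcCharset);
             BufferedWriter bw = Files.newBufferedWriter(target, tgtCharset)) {
            char[] buffer = new char[16384];
            int read;
            // Copy decoded characters in chunks; line endings are preserved as-is.
            while ((read = br.read(buffer)) != -1) {
                bw.write(buffer, 0, read);
            }
        }
    }
}
```

A call might look like NioTranscoder.transform(Paths.get("test.in"), StandardCharsets.ISO_8859_1, Paths.get("test.out"), StandardCharsets.UTF_8). One difference worth knowing: readers created by Files.newBufferedReader report malformed input with an exception rather than silently substituting a replacement character, which can be useful when the source encoding was guessed wrong.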