Java file encoding conversion from ANSI to UTF8
Original question: http://stackoverflow.com/questions/15353671/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Ashish
I need to change the encoding of a file from ANSI (windows-1252) to UTF-8. I wrote the program below to do this in Java. It converts the characters to UTF-8, but when I open the file in Notepad++, the encoding is displayed as "ANSI as UTF8". This causes an error when I import the file into an Access database. I need a file that is encoded as UTF-8 only. The conversion must also happen without opening the file in any editor.
import java.io.*;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class ConvertFromAnsiToUtf8 {
    private static final char BYTE_ORDER_MARK = '\uFEFF';
    private static final String ANSI_CODE = "windows-1252";
    private static final String UTF_CODE = "UTF8";
    private static final Charset ANSI_CHARSET = Charset.forName(ANSI_CODE);

    public static void main(String[] args) {
        List<File> fileList;
        File inputFolder = new File(args[0]);
        if (!inputFolder.isDirectory()) {
            return;
        }
        // Output goes to a sibling folder named <input>_converted
        File parentDir = new File(inputFolder.getParent() + "\\"
                + inputFolder.getName() + "_converted");
        if (parentDir.exists()) {
            return;
        }
        if (parentDir.mkdir()) {
        } else {
            return;
        }
        fileList = new ArrayList<File>();
        for (final File fileEntry : inputFolder.listFiles()) {
            fileList.add(fileEntry);
        }
        InputStream in;
        Reader reader = null;
        Writer writer = null;
        try {
            for (File file : fileList) {
                in = new FileInputStream(file.getAbsoluteFile());
                reader = new InputStreamReader(in, ANSI_CHARSET);
                OutputStream out = new FileOutputStream(
                        parentDir.getAbsoluteFile() + "\\"
                                + file.getName());
                writer = new OutputStreamWriter(out, UTF_CODE);
                writer.write(BYTE_ORDER_MARK);
                char[] buffer = new char[10];
                int read;
                while ((read = reader.read(buffer)) != -1) {
                    System.out.println(read);
                    writer.write(buffer, 0, read);
                }
            }
            // Closes only the reader and writer of the last file
            reader.close();
            writer.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Any pointers will be helpful.
Thanks, Ashish
Answered by McDowell
The posted code correctly transcodes from windows-1252 to UTF-8.
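To make the transcoding step concrete, here is a minimal sketch that decodes windows-1252 bytes into a String and re-encodes that String as UTF-8. The class name `TranscodeDemo` and the helpers `transcode` and `toHex` are illustrative names, not part of the original post.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeDemo {
    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Decode windows-1252 bytes to a String, then encode that String as UTF-8.
    public static byte[] transcode(byte[] ansiBytes) {
        String text = new String(ansiBytes, WINDOWS_1252);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated upper-case hex pairs.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // In windows-1252, 0xE9 is 'é' and 0x80 is the euro sign.
        byte[] ansi = {(byte) 0xE9, (byte) 0x80};
        System.out.println(toHex(transcode(ansi))); // prints "C3 A9 E2 82 AC"
    }
}
```

Single bytes above 0x7F become two or three bytes in UTF-8, which is why a converted file is usually slightly larger than the original.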
The Notepad++ message is confusing because "ANSI as UTF-8" has no obvious meaning; it appears to be an open defect in Notepad++. I believe Notepad++ means UTF-8 without BOM (see the Encoding menu).
Microsoft Access, being a Windows program, probably expects UTF-8 files to start with a byte-order-mark (BOM).
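For illustration, encoding U+FEFF as UTF-8 yields the three bytes EF BB BF at the start of the file. The sketch below checks whether a file begins with those bytes, which may help confirm that a converted file carries the BOM before it is imported. The class name `BomCheck` and method `startsWithUtf8Bom` are hypothetical, and the code does not verify whether Access actually requires the BOM.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class BomCheck {
    // Returns true if the file begins with the UTF-8 encoding of U+FEFF (EF BB BF).
    public static boolean startsWithUtf8Bom(Path file) throws IOException {
        byte[] bom = "\uFEFF".getBytes(StandardCharsets.UTF_8);
        byte[] head = new byte[bom.length];
        try (InputStream in = Files.newInputStream(file)) {
            int total = 0;
            // read() may return fewer bytes than requested, so loop until full.
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n == -1) {
                    return false; // file is shorter than the BOM
                }
                total += n;
            }
        }
        return Arrays.equals(head, bom);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(startsWithUtf8Bom(Paths.get(args[0])));
    }
}
```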
You can inject a BOM into the document by writing the code point U+FEFF at the start of the file:
import java.io.*;
import java.nio.charset.*;

public class Ansi1252ToUtf8 {
    private static final char BYTE_ORDER_MARK = '\uFEFF';

    public static void main(String[] args) throws IOException {
        Charset windows1252 = Charset.forName("windows-1252");
        try (InputStream in = new FileInputStream(args[0]);
             Reader reader = new InputStreamReader(in, windows1252);
             OutputStream out = new FileOutputStream(args[1]);
             Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(BYTE_ORDER_MARK);
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
    }
}
Answered by ScaledLizard
On Windows 7 (64-bit), running Java 8, I had to close every file; otherwise, files were truncated to multiples of 4 kB. Closing only the last set of files is not enough: I had to close every file to get the desired result. Here is my adapted version, which adds error messages:
import java.io.*;
import java.nio.charset.*;
import java.util.ArrayList;

public class ConvertFromAnsiToUtf8 {
    private static final char BYTE_ORDER_MARK = '\uFEFF';
    private static final String ANSI_CODE = "windows-1252";
    private static final String UTF_CODE = "UTF8";
    private static final Charset ANSI_CHARSET = Charset.forName(ANSI_CODE);
    private static final String PATH_SEP = "\\";
    private static final boolean WRITE_BOM = false;

    public static void main(String[] args)
    {
        if (args.length != 2) {
            System.out.println("Please name a source and a target directory");
            return;
        }
        File inputFolder = new File(args[0]);
        if (!inputFolder.isDirectory()) {
            System.out.println("Input folder " + inputFolder + " does not exist");
            return;
        }
        File outputFolder = new File(args[1]);
        if (outputFolder.exists()) {
            System.out.println("Folder " + outputFolder + " exists - aborting");
            return;
        }
        if (outputFolder.mkdir()) {
            System.out.println("Placing converted files in " + outputFolder);
        } else {
            System.out.println("Output folder " + outputFolder + " could not be created - aborting");
            return;
        }
        ArrayList<File> fileList = new ArrayList<File>();
        for (final File fileEntry : inputFolder.listFiles()) {
            fileList.add(fileEntry);
        }
        InputStream in;
        Reader reader = null;
        Writer writer = null;
        int converted = 0;
        try {
            for (File file : fileList) {
                try {
                    in = new FileInputStream(file.getAbsoluteFile());
                    reader = new InputStreamReader(in, ANSI_CHARSET);
                    OutputStream out = new FileOutputStream(outputFolder.getAbsoluteFile() + PATH_SEP + file.getName());
                    writer = new OutputStreamWriter(out, UTF_CODE);
                    if (WRITE_BOM)
                        writer.write(BYTE_ORDER_MARK);
                    char[] buffer = new char[1024];
                    int read;
                    while ((read = reader.read(buffer)) != -1) {
                        writer.write(buffer, 0, read);
                    }
                    ++converted;
                } finally {
                    // Close the streams after every file so buffered output is flushed;
                    // the null checks avoid a NullPointerException if opening failed
                    if (reader != null)
                        reader.close();
                    if (writer != null)
                        writer.close();
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(converted + " files converted");
    }
}
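The per-file close can also be expressed with try-with-resources, so each reader and writer is closed (and the writer flushed) as soon as its file is done. The following is only a sketch under assumptions: the class name `ConvertFolderSketch`, the methods `convertFile` and `convertFolder`, and the choice to skip subdirectories are not from the original answers. It uses `java.nio.file` paths, which avoids hard-coding a path separator.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ConvertFolderSketch {
    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Transcodes one file; both streams are closed when the try block exits.
    public static void convertFile(Path source, Path target, boolean writeBom) throws IOException {
        try (Reader reader = new InputStreamReader(Files.newInputStream(source), WINDOWS_1252);
             Writer writer = new OutputStreamWriter(Files.newOutputStream(target), StandardCharsets.UTF_8)) {
            if (writeBom) {
                writer.write('\uFEFF');
            }
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
    }

    // Converts every regular file directly inside sourceDir (assumption: subdirectories
    // are skipped) and returns the number of files converted.
    public static int convertFolder(Path sourceDir, Path targetDir, boolean writeBom) throws IOException {
        Files.createDirectories(targetDir);
        int converted = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(sourceDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    convertFile(entry, targetDir.resolve(entry.getFileName()), writeBom);
                    converted++;
                }
            }
        }
        return converted;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("Please name a source and a target directory");
            return;
        }
        int converted = convertFolder(Paths.get(args[0]), Paths.get(args[1]), false);
        System.out.println(converted + " files converted");
    }
}
```

Unlike the version above, this sketch does not abort when the target folder already exists; it overwrites files of the same name, so add a check if that matters.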