Eclipse: convert a file with a known encoding to UTF-8
Original question: http://stackoverflow.com/questions/4383504/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Convert File with known encoding to UTF-8
Asked by HymanBauer
I need to read a text file into a String and then pass it as the input parameter (of type InputStream) to IFile.create (Eclipse). I have been looking for an example of how to do that, but I still cannot figure it out. I need your help!
Just for testing, I tried to convert the original text file to a UTF-8 encoded file with this code:
FileInputStream fis = new FileInputStream(FilePath);
InputStreamReader isr = new InputStreamReader(fis);
Reader in = new BufferedReader(isr);
StringBuffer buffer = new StringBuffer();
int ch;
while ((ch = in.read()) > -1) {
    buffer.append((char) ch);
}
in.close();
FileOutputStream fos = new FileOutputStream(FilePath+".test.txt");
Writer out = new OutputStreamWriter(fos, "UTF8");
out.write(buffer.toString());
out.close();
But even though the final *.test.txt file is UTF-8 encoded, the characters inside it are corrupted.
Answered by Matt Ball
You need to specify the encoding of the InputStreamReader using the Charset parameter.
// ↓ whatever the input's encoding is
Charset inputCharset = Charset.forName("ISO-8859-1");
InputStreamReader isr = new InputStreamReader(fis, inputCharset);
This also works:
InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
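Putting the pieces together, the following is a minimal sketch of the whole round trip, assuming the input really is ISO-8859-1. The class and method names (KnownEncodingToUtf8, readAll, toUtf8Bytes) are made up for illustration, and an in-memory byte array stands in for the FileInputStream from the question so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class KnownEncodingToUtf8 {

    // Decodes the whole stream using the charset the caller knows it was written in.
    static String readAll(InputStream in, Charset inputCharset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, inputCharset))) {
            char[] chunk = new char[4096];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                text.append(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Encodes the already-decoded text as UTF-8 bytes.
    static byte[] toUtf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "café" in ISO-8859-1: the "é" is the single byte 0xE9 (assumed sample input).
        byte[] latin1 = {99, 97, 102, (byte) 0xE9};
        String text = readAll(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        byte[] utf8 = toUtf8Bytes(text);

        // A stream like this one is what could be handed to an API expecting an InputStream.
        InputStream utf8Stream = new ByteArrayInputStream(utf8);

        System.out.println(text.length() + " chars, " + utf8.length + " UTF-8 bytes");
        // prints "4 chars, 5 UTF-8 bytes"
    }
}
```

For the original use case, the ByteArrayInputStream built from the UTF-8 bytes is the kind of InputStream that IFile.create expects as its source. That call is left out here because it needs the Eclipse workspace API on the classpath.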
See also:
- InputStreamReader(InputStream in, Charset cs)
- Charset.forName(String charsetName)
- Java: How to determine the correct charset encoding of a stream
- How to reliably guess the encoding between MacRoman, CP1252, Latin1, UTF-8, and ASCII
- GuessEncoding - only works for UTF-8, UTF-16LE, UTF-16BE, and UTF-32?
- ICU Charset Detector
- cpdetector, free Java codepage detection
- JCharDet (Java port of Mozilla charset detector); ironically, that page does not render the apostrophe in "Mozilla's" correctly
SO search where I found all these links: https://stackoverflow.com/search?q=java+detect+encoding
You can get the default charset, which comes from the system the JVM is running on, at runtime via Charset.defaultCharset().
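This matters because an InputStreamReader created without a charset argument, as in the question's code, decodes with that default. The following small sketch compares the default against the file's known encoding; the class name DefaultCharsetDemo, the helper defaultMatches, and the choice of ISO-8859-1 as the file's encoding are assumptions for illustration.

```java
import java.nio.charset.Charset;

class DefaultCharsetDemo {

    // True when a reader created without an explicit charset would use `expected`.
    static boolean defaultMatches(Charset expected) {
        return Charset.defaultCharset().equals(expected);
    }

    public static void main(String[] args) {
        // Assumed encoding of the input file; replace with the real one.
        Charset fileCharset = Charset.forName("ISO-8859-1");

        System.out.println("JVM default charset: " + Charset.defaultCharset());
        if (!defaultMatches(fileCharset)) {
            System.out.println("Pass " + fileCharset + " to InputStreamReader explicitly.");
        }
    }
}
```

The output depends on the machine and JVM settings, so relying on the default makes the conversion non-portable. Passing the charset explicitly avoids that.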