Reading a file with accented characters in Java
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/5844845/
Question
I came across two special characters that do not seem to be covered by the ISO-8859-1 character set, i.e. they don't make it through to my program.
The German ß and the Norwegian ø.
I'm reading the files as follows:
FileInputStream inputFile = new FileInputStream(corpus[i]);
InputStreamReader ir = new InputStreamReader(inputFile, "ISO-8859-1");
Is there a way to read these characters without resorting to manual replacement as a workaround?
[EDIT]
This is how it looks on screen. Note that I have no problems with other accented characters, e.g. è and the like.
Accepted answer by Thorbjørn Ravn Andersen
Both characters are present in ISO-Latin-1 (check my name to see why I've looked into this).
If the characters are not read in correctly, the most likely cause is that the text in the file is not saved in that encoding, but in something else.
Depending on your operating system and the origin of the file, possible encodings could be UTF-8 or a Windows code page like 850 or 437.
The easiest way is to look at the file with a hex editor and report back what exact values are saved for these two characters.
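To know what to look for in the hex editor, it may help to print the bytes each candidate encoding uses for the two characters. The following is only a sketch: the class name EncodingBytes and the helper hex are made up for illustration, and IBM850 is checked before use because a Java runtime is not required to ship it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Format bytes as space-separated upper-case hex, e.g. "C3 9F".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00DF (sharp s) and U+00F8 (o with stroke), written as escapes
        // so the encoding of this source file does not matter.
        String sample = "\u00DF\u00F8";

        System.out.println("ISO-8859-1: " + hex(sample.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("UTF-8:      " + hex(sample.getBytes(StandardCharsets.UTF_8)));

        // Code page 850 is an optional charset, so only use it if present.
        if (Charset.isSupported("IBM850")) {
            System.out.println("IBM850:     " + hex(sample.getBytes(Charset.forName("IBM850"))));
        }
    }
}
```

In ISO-8859-1 the two characters are the single bytes DF and F8, while in UTF-8 they are the two-byte sequences C3 9F and C3 B8. If the hex editor shows the two-byte forms, the file is UTF-8 and not ISO-8859-1.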
Answer by WhiteFang34
Answer by Matt Ball
ISO-8859-1 covers ß and ø, so the file is probably saved in a different encoding. You should pass the file's actual encoding to new InputStreamReader().
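As a minimal sketch of that suggestion, assuming the file turns out to be UTF-8 (the question does not say which encoding it really uses), the reader could be built like this. The class name ReadWithCharset and the helper readAll are hypothetical; substitute whichever charset the hex dump points to.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadWithCharset {
    // Read the whole file as text, decoding its bytes with the given charset.
    static String readAll(String path, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charset))) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file is UTF-8. Replace with the real encoding,
        // e.g. Charset.forName("IBM850") if the runtime supports it.
        System.out.print(readAll(args[0], StandardCharsets.UTF_8));
    }
}
```

Passing a Charset object instead of a name string avoids the checked UnsupportedEncodingException and catches a misspelled encoding name at compile time for the standard charsets.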