Why is Java BufferedReader() not reading Arabic and Chinese characters correctly?
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/2260325/
Asked by M. A. Kishawy
I'm trying to read a file which contains English and Arabic characters on each line, and another file which contains English and Chinese characters on each line. However, the Arabic and Chinese characters fail to show correctly: they just appear as question marks. Any idea how I can solve this problem?
Here is the code I use for reading:
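try {
    String sCurrentLine;
    // FileReader decodes the file with the platform default charset
    BufferedReader br = new BufferedReader(new FileReader(directionOfTargetFile));
    int counter = 0;
    while ((sCurrentLine = br.readLine()) != null) {
        String lineFixedHolder = converter.fixParsedParagraph(sCurrentLine);
        System.out.println("The line number " + counter
                + " contain : " + sCurrentLine);
        counter++;
    }
} catch (IOException e) {
    // added so the snippet compiles; the original omitted the catch block
    e.printStackTrace();
}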
Edition 01
After reading the line and getting the Arabic and Chinese words, I use a function to translate them by simply searching for the given Arabic text in an ArrayList (which contains all expected words), using the indexOf() method. When the word's index is found, it is used to fetch the English word that has the same index in another ArrayList. However, this search always fails, because it ends up searching for question marks instead of the Arabic and Chinese characters. So my System.out.println output shows nulls, one for each failed translation.
*I'm using the NetBeans 6.8 IDE, Mac version.
Edition 02
Here is the code which searches for the translation:
int testColor = dbColorArb.indexOf(wordToTranslate);
int testBrand = -1;
if ( testColor != -1 ) {
    String result = (String)dbColorEng.get(testColor);
    return result;
} else {
    testBrand = dbBrandArb.indexOf(wordToTranslate);
}
//System.out.println ("The testBrand is : " + testBrand);
if ( testBrand != -1 ) {
    String result = (String)dbBrandEng.get(testBrand);
    return result;
} else {
    //System.out.println ("The first null");
    return null;
}
I'm actually searching two ArrayLists which might contain the desired word to translate. If the word is not found in either ArrayList, null is returned.
Edition 03
When I debugged, I found that the lines being read are stored in my String variable as follows:
"3;0000000000;0000001001;1996-06-22;;2010-01-27;????;;01989;??????;"
Edition 04
The file I'm reading was given to me after it had been modified by another program (about which I know nothing except that it was made in VB); that program made the Arabic letters that were not appearing correctly appear. When I checked the file's encoding in Notepad++, it showed ANSI. However, when I convert it to UTF-8 (which replaces the Arabic letters with other English ones) and then convert it back to ANSI, the Arabic becomes question marks!
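A minimal sketch of how question marks like these can arise, not a diagnosis of this particular file. It assumes the "ANSI" code page involved has no Arabic letters and uses ISO-8859-1 as a stand-in for it; the class and method names are made up for illustration. When Java encodes a String into a charset that cannot represent a character, String.getBytes() silently substitutes '?', and the original letters cannot be recovered afterwards.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the question's code.
class LossyRoundTripDemo {
    // Encodes text into the given charset and decodes it again.
    // Characters the charset cannot represent come back as '?'.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Four Arabic letters, written as escapes so the source file
        // encoding does not matter.
        String sample = "abc \u0639\u0631\u0628\u064A";

        // ISO-8859-1 stands in for an "ANSI" page without Arabic (assumption).
        System.out.println(roundTrip(sample, StandardCharsets.ISO_8859_1)); // prints "abc ????"

        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample)); // prints "true"
    }
}
```

Once the text has been through a lossy step like this, no choice of reader charset brings the Arabic back; the file has to be regenerated from its source.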
Accepted answer by Bozho
The FileReader documentation says: Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.
So:
Reader reader = new InputStreamReader(new FileInputStream(fileName), "utf-8");
BufferedReader br = new BufferedReader(reader);
If this still doesn't work, then perhaps your console is not set to properly display UTF-8 characters. Configuration depends on the IDE used and is rather simple.
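Besides the IDE setting, the output side can also be pinned in code. The sketch below wraps an output stream in a PrintStream that always encodes as UTF-8. It assumes Java 10 or later (for the constructor that takes a Charset), and the class and method names are only illustrative. The terminal or IDE console still has to be able to render UTF-8 for the text to show up correctly.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
class Utf8ConsoleDemo {
    // Wraps any output stream so that printed text is encoded as UTF-8,
    // regardless of the platform default charset.
    static PrintStream utf8(OutputStream target) {
        return new PrintStream(target, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // Arabic and Chinese sample text, written as escapes.
        out.println("Arabic: \u0639\u0631\u0628\u064A, Chinese: \u4E2D\u6587");
    }
}
```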
Update: In the above code replace utf-8 with cp1256. This works fine for me (WinXP, JDK6).
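Since the right charset depends on how the file was produced, it can help to make it a parameter. This is a self-contained sketch of the same InputStreamReader approach, with a small round trip through a temporary file so it can be run as-is. The names readLines and roundTripSample are made up for this example; for the file in the question, the charset argument would be Charset.forName("cp1256") or UTF-8, whichever matches the actual bytes.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original answer.
class ExplicitCharsetRead {
    // Reads all lines of a file, decoding its bytes with the given charset
    // instead of the platform default.
    static List<String> readLines(Path file, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file.toFile()), charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Writes a sample as UTF-8 to a temporary file and reads it back
    // with the same charset, then deletes the file.
    static List<String> roundTripSample(String sample) {
        try {
            Path tmp = Files.createTempFile("charset-demo", ".txt");
            try {
                Files.write(tmp, sample.getBytes(StandardCharsets.UTF_8));
                return readLines(tmp, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // One line with Arabic text, one with Chinese text (as escapes).
        System.out.println(roundTripSample("color;\u0623\u062D\u0645\u0631\nbrand;\u4E2D\u6587"));
    }
}
```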
But I'd recommend that you insist on the file being generated using UTF-8, because cp1256 won't work for Chinese and you'll have similar problems again.
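The point about cp1256 and Chinese can be checked directly. The sketch below asks a charset's encoder whether it can represent a given text. It assumes the JDK in use ships the windows-1256 charset (standard JDK builds do, minimal runtimes may not), and the class name is only illustrative.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the original answer.
class CharsetCoverageDemo {
    // Returns true if every character of text can be encoded in the named charset.
    static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String arabic = "\u0639\u0631\u0628\u064A"; // Arabic letters
        String chinese = "\u4E2D\u6587";            // Chinese characters

        System.out.println("windows-1256, Arabic:  " + canEncode("windows-1256", arabic));
        System.out.println("windows-1256, Chinese: " + canEncode("windows-1256", chinese));
        System.out.println("UTF-8, Arabic:         " + canEncode("UTF-8", arabic));
        System.out.println("UTF-8, Chinese:        " + canEncode("UTF-8", chinese));
    }
}
```

Only UTF-8 reports true for both samples, which is why a single UTF-8 file format avoids having to juggle one code page per language.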
Answer by Paul Wagland
It is most likely reading the information in correctly; however, your output stream is probably not UTF-8, so any character that cannot be shown in your output character set is being replaced with '?'.
You can confirm this by getting each character out and printing the character ordinal.
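One way to do that check, as a small sketch: print the Unicode code point of every character in the line that was read. The class and method names are made up for illustration. If the string really contains Arabic, the dump shows values such as U+0639; if the damage happened before or during reading, it shows U+003F (a literal question mark) or U+FFFD (the replacement character).

```java
// Hypothetical helper, not part of the original answer.
class CodePointDump {
    // Formats each code point of s as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'a', an Arabic letter, and a literal question mark.
        System.out.println(describe("a\u0639?")); // prints "U+0061 U+0639 U+003F"
    }
}
```

Because the output is plain ASCII, it reads the same on any console, whatever its encoding.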
Answer by Ahmad Alhaj Hussein
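import java.io.FileOutputStream;
import java.io.IOException;

// Writes str to fileName, encoded as windows-1256 (Arabic code page).
public void writeTiFile(String fileName, String str) {
    // try-with-resources closes the stream even if write() throws
    try (FileOutputStream out = new FileOutputStream(fileName)) {
        out.write(str.getBytes("windows-1256"));
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}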