Java: Reading a CSV File in UTF-8 Format
Original question: http://stackoverflow.com/questions/19100448/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Read a CSV file in UTF-8 format
Asked by Ricardo
I am reading a CSV file in Java, adding a new column with new information, and exporting it back to a CSV file. I have a problem reading the CSV file in UTF-8 format. I read it line by line and store it in a StringBuilder, but when I print the line I can see that the information I'm reading is not in UTF-8 but in ANSI. I tried both System.out.print and a PrintStream set to UTF-8, and the information still appears in ANSI. This is my code:
BufferedReader br;
try {
    br = new BufferedReader(new InputStreamReader(new FileInputStream(
            "./users.csv"), "UTF8"));
    String line;
    while ((line = br.readLine()) != null) {
        if (line.contains("[email protected]")) {
            continue;
        }
        if (!line.contains("@") && !line.contains("FirstName")) {
            continue;
        }
        PrintStream ps = new PrintStream(System.out, true, "UTF-8");
        ps.print(line + "\n");
        sbusers.append(line);
        sbusers.append("\n");
        sbusers2.append(line);
        sbusers2.append(",");
    }
    br.close();
} catch (IOException e) {
    System.out.println("Failed to read users file.");
} finally {
}
It prints out information like "Professor -P?s". Since the reading isn't being done correctly the output to the new file is also being exported in ANSI.
Answered by Marcelo
In the line:
br = new BufferedReader(new InputStreamReader(new FileInputStream("./users.csv"),"UTF8"));
Your charset should be "UTF-8", not "UTF8".
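As a quick check of the charset-name question, here is a minimal sketch (not part of the original answer; the class and method names are made up for illustration). It asks the JVM whether a given name resolves to its UTF-8 charset. On standard JDKs, "UTF8" is registered as an alias of "UTF-8", so the spelling alone is unlikely to explain the garbled output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class CharsetNames {
    // Returns true when the given name resolves to the JVM's UTF-8 charset.
    static boolean isUtf8(String name) {
        return Charset.forName(name).equals(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 -> " + isUtf8("UTF-8"));
        System.out.println("UTF8  -> " + isUtf8("UTF8"));
    }
}
```

Passing the constant StandardCharsets.UTF_8 to InputStreamReader (Java 7 and later) sidesteps the naming question, because no string lookup is involved.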
Answered by Erwin Smout
Printing to System.out using UTF encoding ????????????
Why would you do that? The encoding that System.out uses is determined at the OS level (it becomes the default charset in the JVM), and that's the only one you want to use on System.out.
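To illustrate where a "?" such as the one in "P?s" can come from, here is a small sketch. It assumes, purely for demonstration, that US-ASCII stands in for a console charset that cannot represent "ó"; the class and method names are hypothetical. Java replaces a character the target charset cannot encode with that charset's replacement byte, which is "?" for US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class ConsoleCharsetDemo {
    // Encodes the text with the given charset and decodes it again,
    // roughly what happens when text passes through a console using that charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "P\u00f3s"; // "Pós", written with an escape to avoid source-encoding issues
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("Via US-ASCII: " + roundTrip(text, StandardCharsets.US_ASCII));
        System.out.println("Via UTF-8:    " + roundTrip(text, StandardCharsets.UTF_8));
    }
}
```

The US-ASCII round trip yields "P?s" while the string held in memory is intact, so a "?" on the console does not by itself prove that the file was read incorrectly.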
Answered by Sam Barnum
Are you sure your CSV is UTF-8 encoded? My guess is that it's not. Try using ISO-8859-1 for reading the file, but keep the output as UTF-8. (UTF8 and UTF-8 both tend to work, but you should use UTF-8 as @Marcelo suggested.)
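The suggestion above (read with one charset, write with another) could be sketched as follows. This is an illustration under assumptions, not the asker's code: the class name, the method name, and the output file name users-utf8.csv are all hypothetical, and ISO-8859-1 is only a guess at the file's real encoding.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only.
class CsvTranscoder {
    // Reads `source` using `sourceCharset` and rewrites it to `target` as UTF-8.
    static void toUtf8(Path source, Charset sourceCharset, Path target) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(source, sourceCharset);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.write('\n'); // same line terminator as the code in the question
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the input is really ISO-8859-1; the output name is made up.
        toUtf8(Paths.get("./users.csv"), StandardCharsets.ISO_8859_1,
                Paths.get("./users-utf8.csv"));
    }
}
```

The try-with-resources block closes both files even when reading fails, which the explicit br.close() call in the question does not guarantee.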
Answered by Anthony Accioly
First, as suggested by @Marcelo, use UTF-8 instead of UTF8:
BufferedReader in = new BufferedReader(
        new InputStreamReader(
                new FileInputStream("./users.csv"), "UTF-8"));
Second, forget about the PrintStream; just use System.out or, better yet, a logging API. You don't need to worry about how Java will output your string to the console. (The number one rule of character encoding: once you have read things in successfully, let Java handle the encoding, and only worry about it again when you write to an external file, database, etc.)
Third, and most important, check that your file is really encoded in UTF-8; this is the cause of 99% of encoding problems.
Make sure that you test with a real UTF-8 file (use tools like iconv to convert to UTF-8 and be sure about it).
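One way to run that check from Java itself is a strict decode, sketched below. The class and method names are hypothetical. A CharsetDecoder configured with CodingErrorAction.REPORT throws on the first invalid byte sequence instead of silently substituting a replacement character.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only.
class Utf8Check {
    // Returns true when the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get("./users.csv"));
        System.out.println("Decodes cleanly as UTF-8: " + isValidUtf8(bytes));
    }
}
```

A false result means the file is definitely not UTF-8. A true result is weaker evidence, since a pure ASCII file also passes.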
Answered by Sondre
Found a potential solution (I had the same problem). Depending on how the file is actually encoded, you need to specify the charset further...
Replace:
br = new BufferedReader(new InputStreamReader(new FileInputStream(
"./users.csv"), "UTF8"));
With:
br = new BufferedReader(new InputStreamReader(new FileInputStream(
"./users.csv"), "ISO_8859_1"));
For further understanding: https://mincong.io/2019/04/07/understanding-iso-8859-1-and-utf-8/