Java: read UTF-16 chars from a file and store them as UTF-8

Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/5104713/

Date: 2020-10-30 09:27:20  Source: igfitidea

Tags: java, file, utf-8

Asked by Argyro Kazaki

I have a Person pojo, with a name attribute which I store in my database within the respective persons table. My db server is MySQL with utf-8 set as the default server encoding, the persons table is an InnoDB table which was also created with utf-8 as the default encoding, and my db connection string specifies utf-8 as the connection encoding.

I am required to create and store new Person pojos, by reading their names from a txt file (persons.txt) which contains a name in every line, but the file encoding is UTF-16.

persons.txt

    John
    Μαρία
    Hélène
    etc..

Here is some sample code:

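import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

PersonDao dao = new PersonDao();
File file = new File("persons.txt");
// Decode the file's bytes as UTF-16; try-with-resources closes the reader when done
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_16))) {
    String line;
    while ((line = reader.readLine()) != null) {
        Person p = new Person();
        p.setName(line.trim());   // one name per line
        dao.save(p);              // persisted through the utf-8 db connection
    }
}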

To sum up, I am reading the characters as UTF-16, storing them in local variables, and persisting them as UTF-8.

I would like to ask: Does any character conversion take place during this procedure? If yes, then at what point does this happen? Is it possible that I may end up storing broken characters due to the utf-16 -> utf-8 workflow?

Accepted answer by axtavt

InputStreamReader converts characters from their external representation in the specified encoding (UTF-16 in your case) to the internal representation (i.e. char, String), which is always UTF-16 too, so effectively there is no conversion at this point in your case.
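
As a rough illustration of that decoding step, the sketch below turns UTF-16 bytes into a String and inspects the result. The class name `Utf16DecodeSketch` and the helper `decode` are made up for this example, and an in-memory byte array stands in for persons.txt.

```java
import java.nio.charset.StandardCharsets;

public class Utf16DecodeSketch {
    // Hypothetical helper: decodes UTF-16 bytes into a String.
    // With StandardCharsets.UTF_16, a leading byte-order mark selects the byte order.
    static String decode(byte[] utf16Bytes) {
        return new String(utf16Bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // "Μαρία" written with escapes so the source file's own encoding does not matter
        String name = "\u039C\u03B1\u03C1\u03AF\u03B1";

        // Stand-in for the file contents: encoding with UTF_16 writes a BOM first
        byte[] fileBytes = name.getBytes(StandardCharsets.UTF_16);

        String decoded = decode(fileBytes);
        System.out.println(decoded.equals(name)); // prints true
        System.out.println(decoded.length());     // prints 5 (UTF-16 code units)
        System.out.println(fileBytes.length);     // prints 12 (2-byte BOM + 5 * 2 bytes)
    }
}
```

The String holds the same 16-bit code units that the file carried, which is the sense in which nothing is really converted here.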

The internal representation of Strings should be converted to the database encoding by your JDBC driver, so you shouldn't need to care about it (though in the case of MySQL you should take care to specify the proper database encoding in the connection string).
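
For the MySQL case, a connection string along these lines is one common way to state the encoding. This is only a sketch: the host `localhost:3306` and database `mydb` are placeholders, the class and method names are invented for the example, and `useUnicode` / `characterEncoding` are MySQL Connector/J connection properties, so other drivers may differ.

```java
public class MysqlUrlSketch {
    // Hypothetical helper: builds a Connector/J URL that asks the driver
    // to exchange text with the server as UTF-8.
    static String utf8Url(String hostAndPort, String database) {
        return "jdbc:mysql://" + hostAndPort + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // Placeholder host and database name, not taken from the question
        System.out.println(utf8Url("localhost:3306", "mydb"));
    }
}
```

One caveat worth checking in your own setup: MySQL's `utf8` character set stores at most three bytes per character, so names containing characters outside the Basic Multilingual Plane need `utf8mb4` on the table side.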

If input encoding and (in the case of MySQL) database encoding are specified correctly, there is no chance of data loss during conversions, since both UTF-8 and UTF-16 are used to represent the same character set.
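
The lossless claim can be checked in plain Java, without a database. The sketch below pushes a few sample names through UTF-16 bytes and then UTF-8 bytes and compares the result with the input; the class and method names are invented for illustration, and the byte arrays merely stand in for the file and the database.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripSketch {
    // Hypothetical helper: String -> UTF-16 bytes -> String -> UTF-8 bytes -> String
    static String viaUtf16ThenUtf8(String s) {
        byte[] utf16 = s.getBytes(StandardCharsets.UTF_16);          // "file" bytes
        String decoded = new String(utf16, StandardCharsets.UTF_16); // what the reader yields
        byte[] utf8 = decoded.getBytes(StandardCharsets.UTF_8);      // "database" bytes
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] names = {
            "John",
            "\u039C\u03B1\u03C1\u03AF\u03B1", // Μαρία
            "H\u00E9l\u00E8ne",               // Hélène
            "\uD83D\uDE00"                    // a character outside the BMP (surrogate pair)
        };
        for (String name : names) {
            // Each line prints true: the text survives both encodings unchanged
            System.out.println(viaUtf16ThenUtf8(name).equals(name));
        }
    }
}
```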

Answer by a CVn

UTF-8 and UTF-16 cover the same range of characters (full Unicode), so if the input data is valid, the output data will be valid too (unless there is a bug in dao.save()).
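
The "if the input data is valid" condition can be tested before saving. One possible check, sketched below with an invented class and method name, asks a UTF-8 `CharsetEncoder` whether it can encode the string; an unpaired surrogate is an example of input that is not valid UTF-16 and therefore has no UTF-8 form.

```java
import java.nio.charset.StandardCharsets;

public class ValidityCheckSketch {
    // Hypothetical helper: true when every code unit sequence in s is legal
    // and so can be written out as UTF-8 without replacement characters.
    static boolean isEncodableAsUtf8(String s) {
        // A fresh encoder per call, since CharsetEncoder is not thread-safe
        return StandardCharsets.UTF_8.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(isEncodableAsUtf8("H\u00E9l\u00E8ne")); // prints true
        System.out.println(isEncodableAsUtf8("\uD800"));           // prints false (lone high surrogate)
    }
}
```

A caller could skip or log names that fail this check instead of passing them to dao.save().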
