Java: read UTF-16 characters from a file and store them as UTF-8

Question

Asked by Argyro Kazaki

I have a Person POJO with a name attribute, which I store in my database in the corresponding persons table. My database server is MySQL with UTF-8 set as the default server encoding, the persons table is an InnoDB table that was also created with UTF-8 as its default encoding, and my database connection string specifies UTF-8 as the connection encoding.

I am required to create and store new Person POJOs by reading their names from a text file (persons.txt) that contains one name per line, but the file encoding is UTF-16.

persons.txt

John

Μαρία

Hélène

etc.

Here is some sample code:

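import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

PersonDao dao = new PersonDao();
File file = new File("persons.txt");
// Decode the file's bytes as UTF-16; try-with-resources closes the reader when done
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_16))) {
    String line;
    while ((line = reader.readLine()) != null) {
        Person p = new Person();
        p.setName(line.trim());   // one name per line
        dao.save(p);              // persisted through the UTF-8 connection
    }
}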

To sum up, I am reading the characters as UTF-16, storing them in local variables, and persisting them as UTF-8.

I would like to ask: does any character conversion take place during this procedure? If so, at what point does it happen? Could I end up storing broken characters because of the UTF-16 -> UTF-8 workflow?

Answer 1

Accepted answer by axtavt

InputStreamReader converts characters from their external representation in the specified encoding (UTF-16 in your case) to Java's internal representation (i.e. char, String), which is always UTF-16 as well, so in your case there is effectively no conversion at this step.
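
As a rough illustration of this decoding step, the sketch below simulates the file with an in-memory byte array and reads it back through the same InputStreamReader setup as in the question. The class name Utf16DecodeDemo and the helper readFirstLine are made up for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class Utf16DecodeDemo {

    // Decodes the first line of a UTF-16 byte stream, the same way the question's reader does.
    static String readFirstLine(byte[] utf16Bytes) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(utf16Bytes), StandardCharsets.UTF_16))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // "Hélène", written with Unicode escapes so the source file encoding does not matter.
        String original = "H\u00e9l\u00e8ne";

        // Stand-in for the file contents: a byte order mark followed by two bytes per char.
        byte[] fileBytes = original.getBytes(StandardCharsets.UTF_16);

        String decoded = readFirstLine(fileBytes);
        System.out.println("bytes in 'file': " + fileBytes.length);
        System.out.println("decoded equals original: " + decoded.equals(original));
    }
}
```

The decoder consumes the byte order mark and assembles the remaining byte pairs into char values, so the String holds the same code units that were in the file.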

The internal representation of Strings is converted to the database encoding by your JDBC driver, so you don't need to worry about it (though in the case of MySQL you do need to specify the proper database encoding in the connection string).
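
For reference, a connection string of the kind mentioned here might be built as follows. This is only a sketch: the host, port, and database name are placeholders, the class MySqlUrlSketch is invented for the example, and characterEncoding is the MySQL Connector/J connection property assumed to be in use (the exact set of supported properties depends on the driver version).

```java
final class MySqlUrlSketch {

    // Builds a Connector/J style URL that asks the driver to talk UTF-8 to the server.
    // host, port and database are placeholders supplied by the caller.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // Hypothetical values; replace with the real server and schema.
        System.out.println(buildUrl("localhost", 3306, "mydb"));
    }
}
```

The resulting string would be passed to DriverManager.getConnection along with the credentials; nothing in the DAO code needs to encode Strings by hand.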

If the input encoding and (in the case of MySQL) the database encoding are specified correctly, there is no chance of data loss during the conversions, since UTF-8 and UTF-16 both represent the same character set (Unicode).
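
The lossless claim can be checked on the Java side with a small round trip. The sketch below is not the driver's code; it only mimics the two encoding steps with the standard String API. The class name Utf16ToUtf8RoundTrip and the method transcode are made up for this example.

```java
import java.nio.charset.StandardCharsets;

final class Utf16ToUtf8RoundTrip {

    // Decodes UTF-16 bytes to a String, then encodes that String as UTF-8.
    static byte[] transcode(byte[] utf16Bytes) {
        String text = new String(utf16Bytes, StandardCharsets.UTF_16);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Μαρία", "Hélène", and U+1F600, which lies outside the BMP
        // and is therefore a surrogate pair in UTF-16.
        String[] names = {
            "\u039c\u03b1\u03c1\u03af\u03b1",
            "H\u00e9l\u00e8ne",
            "\ud83d\ude00"
        };
        for (String name : names) {
            byte[] utf8 = transcode(name.getBytes(StandardCharsets.UTF_16));
            String back = new String(utf8, StandardCharsets.UTF_8);
            System.out.println(utf8.length + " UTF-8 bytes, lossless: " + back.equals(name));
        }
    }
}
```

One caveat outside Java: MySQL's character set named utf8 stores at most three bytes per character, so characters outside the BMP (like the last test value) need utf8mb4 on the database side.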

Answer 2

Answer by a CVn

UTF-8 and UTF-16 cover the same range of characters (full Unicode), so if the input data is valid, the output data will be valid too (unless there is a bug in dao.save()).
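
The "if the input data is valid" condition can be checked explicitly. An InputStreamReader silently substitutes a replacement character for malformed input, so a strict check needs a CharsetDecoder configured to report errors. The sketch below shows one way this might look; the class StrictUtf16Check and the method isValidUtf16 are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf16Check {

    // Returns true if the bytes decode as UTF-16 without any malformed sequence.
    static boolean isValidUtf16(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_16.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "John".getBytes(StandardCharsets.UTF_16);

        // Big-endian byte order mark followed by a lone low surrogate (U+DC00),
        // which is not valid UTF-16 on its own.
        byte[] bad = {(byte) 0xFE, (byte) 0xFF, (byte) 0xDC, (byte) 0x00};

        System.out.println("good input valid: " + isValidUtf16(good));
        System.out.println("bad input valid: " + isValidUtf16(bad));
    }
}
```

Running such a check before saving would reject damaged files up front instead of storing replacement characters.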
