Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/31143923/

Time: 2020-08-11 10:42:29  Source: igfitidea

Eclipse wrong Java properties UTF-8 encoding

Tags: java, eclipse, utf-8, properties-file

Asked by BuZZ-dEE

I have a JavaEE project in which I use message properties files. The encoding of those files is set to UTF-8. In the files I use German umlauts like ä, ö, ü. The problem is that sometimes those characters are replaced with Unicode escapes like \uFFFD\uFFFD, but not every character is affected. Now I have a case where ä and ü are both replaced with \uFFFD\uFFFD, but not for every occurrence of ä and ü.

The Git diff shows me something like this:

 mail.adresses=E-Mail hinzufügen:
-mail.adresses.multiple=E-Mails durch Kommata getrennt hinzufügen.
+mail.adresses.multiple=E-Mails durch Kommata getrennt hinzuf\uFFFD\uFFFDgen.
 mail.title=Einladungs-E-Mail
 box.preview=Vorschau
 box.share.text=Sie können jetzt die ausgewählten Bilder mit Ihren Freunden teilen.
@@ -6880,7 +6880,7 @@ browser.cancel=Abbrechen
 browser.selectImage=übernehmen
 browser.starImage=merken
 browser.removeImage=Löschen
-browser.searchForSimilarImages=ähnliche
+browser.searchForSimilarImages=\uFFFD\uFFFDhnliche
 browser.clear_drop_box=löschen

Also, some lines that I have not touched are changed. I don't understand why I get this behavior. What could be causing the problem above?

My system:

  • Antergos / Arch Linux

    • System encoding UTF-8

      Python 3.5.0 (default, Sep 20 2015, 11:28:25) 
      [GCC 5.2.0] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import sys
      >>> sys.getdefaultencoding()
      'utf-8'
      
  • Eclipse Mars 1

    • Text file encoding: UTF-8
    • Properties file encoding: UTF-8
  • Tomcat 8
  • Java JDK 8

If I use another editor like Atom to edit those message properties files, I don't run into this problem.

I also noticed that in one case, if I copy the original value browser.searchForSimilarImages=ähnliche from the Git diff and use it to replace the wrong value browser.searchForSimilarImages=\uFFFD\uFFFDhnliche in Eclipse, then I have the correct umlauts in the message properties file.

Answered by tilois

Properties files are expected to be ISO-8859-1 (Latin-1) encoded. Most likely this is what Eclipse was set to by default as well.

If you keep the files in UTF-8, you have to make sure that every tool that runs in the build, or anywhere else that touches the files, disregards the spec and uses UTF-8 instead.
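To illustrate what "disregarding the spec" can look like on the Java side, here is a minimal sketch. It assumes the properties content is stored as UTF-8 bytes; the class and method names are made up for this example. Properties.load(InputStream) decodes as ISO-8859-1, while Properties.load(Reader) uses the charset of the Reader you hand it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, not part of the original project.
public class PropertiesEncodingDemo {

    // load(InputStream) always decodes the bytes as ISO-8859-1.
    static String loadWithSpecEncoding(byte[] content, String key) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(content));
        return props.getProperty(key);
    }

    // load(Reader) uses the charset the Reader was created with, here UTF-8.
    static String loadAsUtf8(byte[] content, String key) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(content), StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // One line of a properties file, stored as UTF-8 bytes (\u00fc is the u-umlaut).
        byte[] utf8File = "browser.selectImage=\u00fcbernehmen\n"
                .getBytes(StandardCharsets.UTF_8);
        // Each UTF-8 byte of the umlaut becomes its own Latin-1 character here.
        System.out.println(loadWithSpecEncoding(utf8File, "browser.selectImage"));
        // The umlaut survives intact here.
        System.out.println(loadAsUtf8(utf8File, "browser.selectImage"));
    }
}
```

Whether your framework or build tool lets you choose the Reader-based path depends on the tool; that part is outside this sketch.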

Answered by hagrawal

Root cause:

By default, ISO 8859-1 character encoding is used for properties files in Eclipse (read here), so if the file contains any character outside ISO 8859-1, it will not be processed as expected.

Solution 1

If you use Eclipse, you will notice that it implicitly converts special characters into their \uXXXX equivalents. Try copying

会意字 / 會意字

into a properties file opened in Eclipse.

EDIT: As per comment from OP

Update the encoding in Eclipse as described below. If you set the encoding to UTF-32, you can even see Chinese characters, which you generally cannot see otherwise.

How to change the encoding of a properties file in Eclipse: see this Eclipse Bugzilla bug for more details. It discusses several other possibilities and in the end suggests the setting highlighted in the screenshot. [Screenshot: encoding setting for properties files in Eclipse]

Chinese characters can be seen in Eclipse after the encoding is set properly. [Screenshot: Chinese characters displayed in Eclipse]

Solution 2

If the above doesn't work consistently for you (it does work for me and I never see encoding issues), then try an Eclipse plugin that handles the encoding of properties or other files, for example Eclipse ResourceBundle Editor or Extended Resource-Bundle editor.

I would recommend using Eclipse ResourceBundle Editor.

Solution 3

Another way to change the encoding of a file is the Edit --> Set Encoding option. It really matters because it changes the default character set and file encoding. Play around by changing the encoding with Edit --> Set Encoding and then running the following Java sysout statements: System.out.println("Default Charset=" + Charset.defaultCharset()); and System.out.println(System.getProperty("file.encoding"));
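The two sysout statements can be wrapped in a small runnable class like the sketch below. The class and method names are made up for this example; the output depends on how the JVM was launched, so no expected values are shown.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for checking the encoding the JVM is running with.
public class DefaultCharsetCheck {

    // Builds one line with both values mentioned in the answer.
    static String describe() {
        return "Default Charset=" + Charset.defaultCharset()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run it from inside Eclipse before and after changing the encoding setting to see whether the values actually change in your setup.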



As an aside: 1

Process the properties file so that its content is valid ISO 8859-1 by using native2ascii, the Native-to-ASCII Converter.

What native2ascii does: it converts every character outside ASCII into its \uXXXX equivalent. This is a good tool because you do not need to look up the \uXXXX equivalent of each special character yourself.

Usage for UTF-8: native2ascii -encoding utf8 e:\a.txt e:\b.txt
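For illustration only, the core of that conversion can be sketched in a few lines of Java. This is not the tool's actual source: the class and method names are made up, and the lowercase hex digits are a choice made for this sketch.

```java
// Hypothetical sketch of the escaping step, not the native2ascii implementation.
public class UnicodeEscaper {

    // Keeps ASCII characters and writes every other UTF-16 code unit
    // as a backslash, the letter u, and four hex digits.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The source string contains a real u-umlaut; the output contains only ASCII.
        System.out.println(escapeNonAscii("mail.adresses=E-Mail hinzuf\u00fcgen:"));
    }
}
```

The escaped output is plain ASCII, so it reads the same whether a tool treats the file as ISO 8859-1 or as UTF-8.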



As an aside: 2

Every computer program, whether an IDE, application server, web server, browser, etc., understands only bits, so it needs to know how to interpret those bits to make sense of them, because depending on the encoding used, the same bits can represent different characters. That is where "encoding" comes into the picture: it gives each character a unique identifier so that all computer programs, operating systems, etc. know exactly how to interpret it.

So, if you have written a file using some encoding scheme, say UTF-8, and then read it in any editor that is also running with UTF-8 as its encoding scheme, you can expect it to display correctly.

Please do read this answer of mine for more details, though from a browser-server perspective.

Answered by user1363516

Add the following arguments to your eclipse.ini file.

-Dclient.encoding.override=UTF-8
-Dfile.encoding=UTF-8

By default Eclipse uses the encoding format picked up by the Java Virtual Machine (JVM). Also, you can set the file encoding to utf-8.

Answered by Calon

This looks like a mixture of Eclipse and git encoding or rather not-encoding.

Git uses raw bytes and doesn't care about encoding. Using git diff, you might get characters like those shown here. An example from there is R<C3><BC>ckg<C3><A4>ngig, which should be "Rückgängig".

As you can see, two of those funny bracketed things show up per umlaut. And in your editor, there are always two \uFFFD for each umlaut in the lines starting with +.

So I assume that your UTF-8 editor tries to interpret the git notation and fails. This in turn leads to the representation \uFFFD, which basically means that this is a character whose value is unknown or unrepresentable (see here).
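The "two per umlaut" pattern can be reproduced in Java with the sketch below. It shows one way to arrive at two U+FFFD characters per umlaut, namely decoding the UTF-8 bytes with a charset that rejects every byte above 0x7F (US-ASCII is used here as a stand-in). It is not a confirmed diagnosis of what Eclipse did in the question, and the class and method names are made up.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; US-ASCII is an assumption chosen to show the effect.
public class ReplacementCharDemo {

    // Renders non-ASCII bytes in the bracketed hex style of the git diff example.
    static String showRawBytes(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            int value = b & 0xFF;
            if (value < 0x80) {
                out.append((char) value);
            } else {
                out.append(String.format("<%02X>", value));
            }
        }
        return out.toString();
    }

    // Each byte above 0x7F is malformed for US-ASCII and is replaced by U+FFFD.
    static String decodeAsAscii(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The word from the git example, with both umlauts written as escapes.
        byte[] utf8 = "R\u00fcckg\u00e4ngig".getBytes(StandardCharsets.UTF_8);
        System.out.println(showRawBytes(utf8));

        // Count the replacement characters: each umlaut is two bytes in UTF-8.
        long replaced = decodeAsAscii(utf8).chars().filter(c -> c == 0xFFFD).count();
        System.out.println("replacement characters: " + replaced);
    }
}
```

Because each umlaut occupies two bytes in UTF-8, any decoder that rejects those bytes one at a time produces two replacement characters per umlaut.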

As suggested in the first link, you can try setting LESSCHARSET=UTF-8 as an environment variable (Windows). Hmm, on Linux it should probably go in /etc/profile?

Answered by Bruce Zu

See "a marker such as FFFD (REPLACEMENT CHARACTER)" in http://unicode.org/faq/utf_bom.html

and see native2ascii --help

   -encoding encoding_name
          Specifies the name of the character encoding to be used by the conversion procedure. If this option is not present, then the
          default character encoding (as determined by the java.nio.charset.Charset.defaultCharset method) is used. The encoding_name
          string must be the name of a character encoding that is supported by the JRE. See Supported Encodings at
          http://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html

An example case:

$ file yourfile.properties
yourfile.properties : ISO-8859 text, with very long lines
$ native2ascii -encoding ISO-8859-1 yourfile.properties yourfile.properties
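If the file command is not available, a rough equivalent of the "is this really UTF-8?" check can be sketched in Java with a strict decoder. This is not how file works internally. The class and method names are made up, and the path is taken from the first command-line argument.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class for checking a properties file before converting it.
public class Utf8Check {

    // Returns true only if the bytes decode as UTF-8 without any malformed input.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Utf8Check yourfile.properties
        byte[] content = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + (isValidUtf8(content)
                ? ": valid UTF-8"
                : ": not valid UTF-8 (possibly ISO-8859-1)"));
    }
}
```

A pure-ASCII file passes this check too, since ASCII is a subset of UTF-8, so a "valid" result does not prove that the file contains any umlauts at all.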