Disclaimer: This page is a bilingual (Chinese-English) translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1726174/

Time: 2020-08-12 21:56:09  Source: igfitidea

How to compile a java source file which is encoded as "UTF-8"?

Tags: java, unicode, compiler-errors, javac

Question by asela38

I saved my Java source file specifying its encoding type as UTF-8 (using Notepad; by default Notepad's encoding type is ANSI) and then I tried to compile it using:

javac -encoding "UTF-8" One.java

but it gave an error message:

One.java:1: illegal character: \65279
?public class One {
^
1 error

Is there any other way I can compile this?

Here is the source:

public class One {
    public static void main( String[] args ){
        System.out.println("HI");
    }
} 

Accepted answer by Daniel Pryden

Your file is being read as UTF-8, otherwise a character with value "65279" could never appear. javac expects your source code to be in the platform default encoding, according to the javac documentation:

If -encoding is not specified, the platform default converter is used.
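
To see which encoding that fallback would be on a given machine, here is a minimal sketch that prints the JVM's default charset. The class name DefaultEncoding is only for illustration, and I'm assuming javac's "platform default converter" corresponds to what Charset.defaultCharset() reports. Since JDK 18 (JEP 400) the default is normally UTF-8 everywhere, so this mattered most on older JDKs.

```java
import java.nio.charset.Charset;

// Prints the JVM's default charset. Assumption: this is the "platform default
// converter" javac falls back to when -encoding is not given.
public class DefaultEncoding {
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
        // Related system property, printed for comparison.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```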

Decimal 65279 is hex FEFF, which is the Unicode Byte Order Mark (BOM). It's unnecessary in UTF-8, because UTF-8 is always encoded as an octet stream and doesn't have endianness issues.
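
A quick check of the arithmetic above, as a small sketch (class and method names are illustrative): 65279 is 0xFEFF, and when U+FEFF is written in UTF-8 it becomes the three bytes EF BB BF at the very start of the file.

```java
import java.nio.charset.StandardCharsets;

// Confirms that 65279 == 0xFEFF (the BOM) and shows its UTF-8 encoding.
public class BomBytes {
    // Formats bytes as space-separated uppercase hex, e.g. "EF BB BF".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char bom = '\uFEFF';
        System.out.println((int) bom);                              // 65279
        System.out.println(Integer.toHexString(bom).toUpperCase()); // FEFF
        byte[] utf8 = String.valueOf(bom).getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(utf8));                              // EF BB BF
    }
}
```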

Notepad likes to stick in BOMs even when they're not necessary, but some programs don't like finding them. As others have pointed out, Notepad is not a very good text editor. Switching to a different text editor will almost certainly solve your problem.

Answer by StevenWilkins

Try javac -encoding UTF8 One.java

Without the quotes and it's UTF8, no dash.

See this forum thread for more links
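
As a side check on the charset name, here is a sketch (class name illustrative) that prints how Java's own charset lookup resolves a few spellings. In Java's charset registry "UTF8" is an alias of "UTF-8" and names are case-insensitive, so all three should resolve to the same charset. I'm assuming javac resolves the -encoding value through this same lookup; also note that the quotes around "UTF-8" are removed by the shell before javac sees the argument.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Shows how Java's charset lookup resolves a few spellings of the same encoding.
public class CharsetNames {
    static String canonicalName(String name) {
        return Charset.forName(name).name();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-8", "UTF8", "utf-8"}) {
            boolean same = Charset.forName(name).equals(StandardCharsets.UTF_8);
            System.out.println(name + " -> " + canonicalName(name) + " (same charset: " + same + ")");
        }
    }
}
```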

Answer by Nick Veys

Works fine here, even edited in Notepad. Moral of the story is, don't use Notepad. There's likely an unprintable character in there that Notepad is either inserting or happily hiding from you.

Answer by vaelico

I know this is a very old thread, but I was experiencing a similar problem with PHP instead of Java, and Google took me here. I was writing PHP in Notepad++ (not plain Notepad) and noticed that an extra blank line appeared every time I included a file. Firebug showed that there was a character 65279 in those extra lines.

Actually, both the main PHP file and the included files were encoded in UTF-8. However, Notepad++ also has an option to encode as "UTF-8 without BOM". Switching to it solved my problem.

Bottom line: when saving as UTF-8, some editors insert this extra BOM character at the start of the file unless you instruct them to use UTF-8 without BOM.
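
The same effect can be reproduced in Java as a rough analogy to the PHP case: decoding "UTF-8 with BOM" bytes leaves the BOM in the text as an invisible U+FEFF (65279) character, because Java's UTF-8 decoder does not strip it. This is a minimal sketch; the class name and the sample text are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Decoding "UTF-8 with BOM" bytes keeps the BOM as a leading U+FEFF character;
// Java's UTF-8 decoder does not strip it.
public class BomSurvivesDecoding {
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Builds the bytes an editor would write for "UTF-8 with BOM": BOM + content.
    static byte[] withBom(byte[] content) {
        byte[] out = new byte[UTF8_BOM.length + content.length];
        System.arraycopy(UTF8_BOM, 0, out, 0, UTF8_BOM.length);
        System.arraycopy(content, 0, out, UTF8_BOM.length, content.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] plain = "<?php".getBytes(StandardCharsets.UTF_8);
        String noBom = decodeUtf8(plain);
        String bom = decodeUtf8(withBom(plain));
        System.out.println(noBom.length() + " vs " + bom.length()); // 5 vs 6
        System.out.println((int) bom.charAt(0));                    // 65279
    }
}
```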

Answer by Adrian Toman

Open the file in Notepad++ and select Encoding -> Convert to UTF-8 without BOM.

Answer by Prashanth

Below is an example program that uses Telugu words as identifiers.

Program (UnicodeEx.java)

class UnicodeEx {  
    public static void main(String[] args) {   
        double ????? = 10;  
        double ??????? = 25;   
        double ?????_???????_?????????;  
        System.out.println("The Value of Height = "+?????+" and Width = "+???????+"\n");  
        ?????_???????_????????? = ????? * ???????;  
        System.out.println("Area of Rectangle = "+?????_???????_?????????);  
    }  
}

Save this program as "UnicodeEx.java" with the encoding set to "Unicode".

**How to Compile**

javac -encoding "unicode" UnicodeEx.java

**How to Execute**

java UnicodeEx

The Value of Height = 10.0 and Width = 25.0

Area of Rectangle = 250.0
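
One plausible reason this works while -encoding UTF-8 fails: Java's UTF-16 decoder reads a leading byte-order mark to choose the byte order and does not keep it in the decoded text, whereas its UTF-8 decoder passes the BOM through. The sketch below assumes javac's "unicode" encoding name maps to UTF-16 (I believe it is a JDK alias for UTF-16, but treat that as an assumption) and uses StandardCharsets.UTF_16 directly. The sample bytes mimic Notepad's "Unicode" format, which is UTF-16 little-endian with an FF FE BOM; the class name is illustrative.

```java
import java.nio.charset.StandardCharsets;

// Java's UTF-16 decoder uses a leading BOM to pick the byte order and drops it;
// without a BOM it assumes big-endian.
public class Utf16BomDemo {
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // "Hi" as Notepad's "Unicode" would save it: BOM FF FE, then UTF-16LE code units.
        byte[] littleEndianWithBom = {(byte) 0xFF, (byte) 0xFE, 'H', 0, 'i', 0};
        // "Hi" in big-endian with no BOM at all.
        byte[] bigEndianNoBom = {0, 'H', 0, 'i'};
        String a = decodeUtf16(littleEndianWithBom);
        String b = decodeUtf16(bigEndianNoBom);
        System.out.println(a + " " + a.length()); // Hi 2 (the BOM is not kept)
        System.out.println(b + " " + b.length()); // Hi 2
    }
}
```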

Answer by Vic

I had the same problem. To solve it, I opened the file in a hex editor and found three "invisible" bytes at the beginning of the file. I removed them, and compilation worked.
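
If a hex editor isn't handy, a few lines of Java can show the same thing. This is a minimal sketch (the class name HeadBytes and the default file name One.java are illustrative) that prints the first bytes of a file in hex and checks them against the UTF-8 BOM, EF BB BF.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Prints the first bytes of a file in hex, like the start of a hex-editor view,
// and reports whether the file begins with the UTF-8 BOM (EF BB BF).
public class HeadBytes {
    static boolean startsWithUtf8Bom(byte[] b) {
        return b.length >= 3
                && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF;
    }

    static String hexHead(byte[] b, int max) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(b.length, max); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", b[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "One.java" is only an illustrative default; pass a path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "One.java");
        byte[] bytes = Files.readAllBytes(file);
        System.out.println(hexHead(bytes, 16));
        System.out.println(startsWithUtf8Bom(bytes) ? "starts with a UTF-8 BOM" : "no UTF-8 BOM");
    }
}
```

Run it as java HeadBytes One.java; if the file starts with EF BB BF, those are the three "invisible" bytes.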

Answer by Etienne Delavennat

This isn't a problem with your text editor, it's a problem with javac! The Unicode spec says the BOM is optional in UTF-8; it doesn't say it's forbidden! If a BOM can be there, then javac HAS to handle it, but it doesn't. Actually, using the BOM in UTF-8 files IS useful to distinguish an ANSI-coded file from a Unicode-coded file.

The proposed solution of removing the BOM is only a workaround and not the proper solution.

This bug report indicates that this "problem" will never be fixed : http://bugs.java.com/view_bug.do?bug_id=4508058

Since this thread is in the top 2 google results for the "javac BOM" search, I'm leaving this here for future readers.
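
To illustrate what "handling" an optional BOM could mean for a tool that reads UTF-8 text, here is a sketch of a hypothetical helper that decodes as UTF-8 and then drops a single leading U+FEFF. This is not how javac behaves (per the bug report above); the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reads UTF-8 text and tolerates an optional BOM.
public class BomTolerantReader {
    // Treats a U+FEFF only at the very start of the text as a BOM and drops it.
    static String stripLeadingBom(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    static String decodeUtf8IgnoringBom(byte[] bytes) {
        return stripLeadingBom(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Reads the file given as the first argument, or uses built-in sample bytes.
        byte[] bytes = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'c', 'l', 'a', 's', 's'};
        String text = decodeUtf8IgnoringBom(bytes);
        System.out.println(text.length() + ": " + text); // "5: class" with no arguments
    }
}
```

Dropping only a leading U+FEFF keeps any U+FEFF elsewhere in the text, where it would be a zero-width no-break space rather than a BOM.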

Answer by Satyam Gupta

  • Open your file with WordPad or any other editor except Notepad.

  • Select Save As type as Text Document - MS-DOS Format

  • Reopen the Project

Answer by Konrad Höffner

To extend the existing answers with a solution for Linux users:

To remove the BOM from all .java files at once, go to your source directory and execute

find -iregex '.*\.java' -type f -print0 | xargs -0 dos2unix

Requires find, xargs and dos2unix to be installed, which should be included in most distributions. The first command recursively finds all .java files in the current directory; the second converts each of them with the dos2unix tool, which is intended to convert line endings but also removes the BOM.

The line-ending conversion should have no effect, because on Linux your files should already use Linux \n line endings if you configure your version control correctly. Be warned, though, that dos2unix converts them as well, in case you have one of those rare cases where that is not intended.
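
For readers who would rather not touch line endings at all, here is a rough Java alternative to the find/dos2unix pipeline that only removes a leading UTF-8 BOM from each .java file under a directory. It is a sketch: the class name StripBoms is made up, it rewrites matching files in place, and it reads whole files into memory, which is fine for source files. Commit or back up your sources before running it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Recursively removes a leading UTF-8 BOM (EF BB BF) from every .java file under a
// directory, rewriting only files that actually start with one. Line endings are untouched.
public class StripBoms {
    static boolean hasUtf8Bom(byte[] b) {
        return b.length >= 3
                && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF;
    }

    // Returns the same array when there is no BOM, otherwise a copy without it.
    static byte[] stripBom(byte[] b) {
        return hasUtf8Bom(b) ? Arrays.copyOfRange(b, 3, b.length) : b;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> javaFiles;
        // Case-insensitive match on the extension, like find -iregex '.*\.java'.
        try (Stream<Path> paths = Files.walk(root)) {
            javaFiles = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().toLowerCase(Locale.ROOT).endsWith(".java"))
                    .collect(Collectors.toList());
        }
        for (Path p : javaFiles) {
            byte[] original = Files.readAllBytes(p);
            byte[] stripped = stripBom(original);
            if (stripped != original) {
                Files.write(p, stripped);
                System.out.println("Removed BOM: " + p);
            }
        }
    }
}
```

Run it from the source directory with java StripBoms, or pass a directory, e.g. java StripBoms src.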
