Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/1726174/
How to compile a java source file which is encoded as "UTF-8"?
Asked by asela38
I saved my Java source file with its encoding set to UTF-8 (using Notepad, whose default encoding is ANSI) and then tried to compile it using:
javac -encoding "UTF-8" One.java
but it gave an error message:
One.java:1: illegal character: \65279
?public class One {
^
1 error
Is there any other way I can compile this?
Here is the source:
public class One {
    public static void main( String[] args ){
        System.out.println("HI");
    }
}
Accepted answer by Daniel Pryden
Your file is being read as UTF-8, otherwise a character with value "65279" could never appear. javac expects your source code to be in the platform default encoding, according to the javac documentation:
If -encoding is not specified, the platform default converter is used.
Decimal 65279 is hex FEFF, which is the Unicode Byte Order Mark (BOM). It's unnecessary in UTF-8, because UTF-8 is always encoded as an octet stream and doesn't have endianness issues.
Notepad likes to stick in BOMs even when they're not necessary, but some programs don't like finding them. As others have pointed out, Notepad is not a very good text editor. Switching to a different text editor will almost certainly solve your problem.
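The relationship between 65279, FEFF and the bytes Notepad writes can be checked with a few lines of Java. This is only a minimal sketch: the class name BomDemo and the helper hex are made up for illustration, and the byte array stands in for the start of a file saved by Notepad as UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Formats bytes as space-separated uppercase hex, e.g. "EF BB BF".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+FEFF is the byte order mark; as a decimal number it is 65279.
        char bom = '\uFEFF';
        System.out.println((int) bom); // prints 65279

        // Encoded as UTF-8, the BOM becomes three bytes.
        byte[] encoded = "\uFEFF".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(encoded)); // prints EF BB BF

        // Hypothetical start of a file saved with a BOM: EF BB BF followed by "pub".
        byte[] fileStart = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
                            (byte) 'p', (byte) 'u', (byte) 'b'};

        // Java's UTF-8 decoder keeps the BOM as the first character of the text.
        String decoded = new String(fileStart, StandardCharsets.UTF_8);
        System.out.println((int) decoded.charAt(0)); // prints 65279
        System.out.println(decoded.substring(1));    // prints pub
    }
}
```

The last two lines show why the compiler complains: after decoding, the first character of the source is 65279 rather than the "p" of "public".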
Answer by StevenWilkins
Try javac -encoding UTF8 One.java
That is, without the quotes, and spelled UTF8 with no dash.
Answer by Nick Veys
Works fine here, even edited in Notepad. Moral of the story is, don't use Notepad. There's likely an unprintable character in there that Notepad is either inserting or happily hiding from you.
Answer by vaelico
I know this is a very old thread, but I was experiencing a similar problem with PHP instead of Java and Google took me here. I was writing PHP in Notepad++ (not plain Notepad) and noticed that an extra blank line appeared every time I called an include file. Firebug showed that there was a 65279 character in those extra lines.
Actually, both the main PHP file and the included files were encoded in UTF-8. However, Notepad++ also has an option to encode as "UTF-8 without BOM". This solved my problem.
Bottom line: an editor saving as UTF-8 may insert this extra BOM character at the start of each file unless you instruct it to use UTF-8 without BOM.
Answer by Adrian Toman
Open the file in Notepad++ and select Encoding -> Convert to UTF-8 without BOM.
Answer by Prashanth
See below. As an example, consider a program that uses Telugu words as identifiers:
Program (UnicodeEx.java)
class UnicodeEx {
    public static void main(String[] args) {
        double ????? = 10;
        double ??????? = 25;
        double ?????_???????_?????????;
        System.out.println("The Value of Height = "+?????+" and Width = "+???????+"\n");
        ?????_???????_????????? = ????? * ???????;
        System.out.println("Area of Rectangle = "+?????_???????_?????????);
    }
}
Save this program as "UnicodeEx.java" and change the encoding to "unicode".
How to Compile
javac -encoding "unicode" UnicodeEx.java
How to Execute
java UnicodeEx
The Value of Height = 10.0 and Width = 25.0
Area of Rectangle = 250.0
Answer by Vic
I had the same problem. To solve it, I opened the file in a hex editor and found three "invisible" bytes at the beginning of the file. I removed them, and compilation worked.
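The same fix can be done without a hex editor. The following is a rough sketch, not a polished tool: the class name StripBom and the method stripBom are invented for this example, and it assumes the three bytes in question are the UTF-8 BOM (EF BB BF). It rewrites the file in place, so try it on a copy first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class StripBom {
    // Returns the data without a leading UTF-8 BOM (EF BB BF).
    // Data that does not start with a BOM is returned unchanged.
    public static byte[] stripBom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StripBom <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        byte[] original = Files.readAllBytes(file);
        byte[] stripped = stripBom(original);
        if (stripped.length != original.length) {
            // Overwrite the file with the BOM-free bytes.
            Files.write(file, stripped);
            System.out.println("Removed BOM from " + file);
        } else {
            System.out.println("No BOM found in " + file);
        }
    }
}
```

For the file from the question, this would be run as java StripBom One.java, followed by the usual javac -encoding UTF-8 One.java.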
Answer by Etienne Delavennat
This isn't a problem with your text editor, it's a problem with javac! The Unicode spec says the BOM is optional in UTF-8; it doesn't say it's forbidden! If a BOM can be there, then javac HAS to handle it, but it doesn't. Actually, using the BOM in UTF-8 files IS useful to distinguish an ANSI-encoded file from a Unicode-encoded file.
The proposed solution of removing the BOM is only a workaround and not the proper solution.
This bug report indicates that this "problem" will never be fixed: http://bugs.java.com/view_bug.do?bug_id=4508058
Since this thread is in the top 2 Google results for the "javac BOM" search, I'm leaving this here for future readers.
Answer by Satyam Gupta
Open your file with WordPad or any other editor except Notepad.
Select Save As, with the type set to Text Document - MS-DOS Format.
Reopen the project.
Answer by Konrad Höffner
To extend the existing answers with a solution for Linux users:
To remove the BOM from all .java files at once, go to your source directory and execute
find -iregex '.*\.java' -type f -print0 | xargs -0 dos2unix
Requires find, xargs and dos2unix to be installed; they should be included in most distributions. The first command finds all .java files in the current directory recursively, and the second converts each of them with the dos2unix tool, which is intended to convert line endings but also removes the BOM.
The line-ending conversion should have no effect, because the files should already use Linux \n line endings on Linux if your version control is configured correctly. Be warned, though, that dos2unix does convert them, in case you have one of those rare cases where that is not intended.
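If the line-ending conversion is a concern, or dos2unix is not available, a similar recursive cleanup can be sketched in Java. This is an illustrative alternative, not part of the answer above: the class name StripBomTree and the method stripAll are made up here, and the sketch removes only a leading UTF-8 BOM (EF BB BF) while leaving everything else in each file, including line endings, untouched.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StripBomTree {
    // Removes a leading UTF-8 BOM from every .java file under root.
    // Returns the number of files that were changed.
    public static int stripAll(Path root) throws IOException {
        List<Path> sources;
        // Collect the file list first so the directory stream is closed before writing.
        try (Stream<Path> paths = Files.walk(root)) {
            sources = paths
                    .filter(Files::isRegularFile)
                    // Case-insensitive match on the extension, like find -iregex.
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".java"))
                    .collect(Collectors.toList());
        }
        int changed = 0;
        for (Path source : sources) {
            byte[] data = Files.readAllBytes(source);
            if (data.length >= 3
                    && (data[0] & 0xFF) == 0xEF
                    && (data[1] & 0xFF) == 0xBB
                    && (data[2] & 0xFF) == 0xBF) {
                // Drop only the three BOM bytes; line endings stay as they are.
                Files.write(source, Arrays.copyOfRange(data, 3, data.length));
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no argument is given.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Files changed: " + stripAll(root));
    }
}
```

Run from the source directory as java StripBomTree, or pass a directory as the first argument. Because files are rewritten in place, it is safest to run this on a tree that is under version control.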