Eclipse: Should source code be saved in UTF-8 format?
Original source: http://stackoverflow.com/questions/2178348/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Should source code be saved in UTF-8 format
Asked by JARC
How important is it to save your source code in UTF-8 format?
Eclipse on Windows uses the CP1252 character encoding by default. With CP1252, characters can be saved as bytes that are not valid UTF-8, and I have seen this happen when text for a comment is copied and pasted from a Word document.
The reason I ask is that, out of habit, I set the Maven encoding to UTF-8, and recently it has caught a few unmappable-character errors.
(update) Please give your reasons for doing so. Are there any common gotchas I should know about?
(update) "What is your goal?" To find the best practice, so that when I am asked why we should use UTF-8, I have a good answer. Right now I don't.
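The kind of error described in the question can be reproduced outside Maven. The sketch below is only an illustration, not what Maven or javac do internally: it assumes a comment containing Word-style curly quotes that was saved as CP1252, and checks whether the resulting bytes decode as strict UTF-8. The class and method names are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8Demo {

    // Returns true only if the bytes are well-formed UTF-8.
    // REPORT makes the decoder throw instead of silently substituting characters.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "hi" wrapped in curly quotes as CP1252 stores them (bytes 0x93 and 0x94).
        byte[] cp1252Comment = {(byte) 0x93, 'h', 'i', (byte) 0x94};
        // The same text encoded as UTF-8.
        byte[] utf8Comment = "\u201chi\u201d".getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(cp1252Comment)); // prints false
        System.out.println(isValidUtf8(utf8Comment));   // prints true
    }
}
```

A check like this could be run over a source tree before switching a build to UTF-8, to find files that were saved in another encoding.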
Accepted answer by McDowell
What is your goal? Balance your needs against the pros and cons of this choice.
UTF-8 Pros
- allows use of all character literals without \uHHHH escaping
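For readers who have not used the \uHHHH form, here is a minimal sketch of what the escaping looks like in Java. The class name and the sample word are made up for illustration. The accented letter is written as an escape, so the source file itself contains only ASCII bytes and compiles the same way whatever encoding the compiler assumes.

```java
import java.nio.charset.StandardCharsets;

class UnicodeEscapeDemo {

    // "cafe" with an e-acute as the last letter, written as a Unicode escape
    // so that this source file stays pure ASCII.
    static final String WORD = "caf\u00e9";

    public static void main(String[] args) {
        System.out.println(WORD.length());                                // prints 4
        System.out.println(WORD.codePointAt(3) == 0xE9);                  // prints true
        System.out.println(WORD.getBytes(StandardCharsets.UTF_8).length); // prints 5
    }
}
```

With a UTF-8 source file the literal could contain the accented letter directly, which is more readable but depends on every tool agreeing on the encoding.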
UTF-8 Cons
- using non-ASCII character literals without \uHHHH escaping increases the risk of character corruption
- font and keyboard issues can arise
- need to document and enforce use of UTF-8 in all tools (editors, compilers, build scripts, diff tools)
- beware the byte order mark
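On the byte order mark: some editors put the three bytes EF BB BF at the start of a UTF-8 file, and Java's UTF-8 decoder keeps them as a leading U+FEFF character rather than dropping them. A rough sketch of detecting and stripping the mark follows. The helper names are hypothetical and the input is a hand-built byte array, not a real file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BomDemo {

    // Returns true if the content starts with the UTF-8 encoding of U+FEFF.
    static boolean startsWithBom(byte[] content) {
        return content.length >= 3
                && content[0] == (byte) 0xEF
                && content[1] == (byte) 0xBB
                && content[2] == (byte) 0xBF;
    }

    // Returns the content without a leading BOM; other input is returned as-is.
    static byte[] stripBom(byte[] content) {
        return startsWithBom(content)
                ? Arrays.copyOfRange(content, 3, content.length)
                : content;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'i', 'n', 't'};

        // Decoding does not remove the mark: it becomes the first character.
        String decoded = new String(withBom, StandardCharsets.UTF_8);
        System.out.println(decoded.charAt(0) == '\uFEFF'); // prints true

        String cleaned = new String(stripBom(withBom), StandardCharsets.UTF_8);
        System.out.println(cleaned); // prints int
    }
}
```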
ASCII Pros
- character/byte mappings are shared by a wide range of encodings
- makes source files very portable
- often obviates the need for specifying encoding meta-data (since the files would be identical if they were re-encoded as UTF-8, Windows-1252, ISO 8859-1 and most things short of UTF-16 and/or EBCDIC)
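The portability point can be checked directly. The sketch below encodes one ASCII-only line of source with several charsets from java.nio.charset.StandardCharsets and compares the bytes. The class and method names are made up, and Windows-1252 is left out only because it is not one of the StandardCharsets constants.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiPortabilityDemo {

    // Returns true if the text encodes to identical bytes in both charsets.
    static boolean sameBytes(String text, Charset a, Charset b) {
        return Arrays.equals(text.getBytes(a), text.getBytes(b));
    }

    public static void main(String[] args) {
        String source = "int x = 42; // plain ASCII only";

        System.out.println(sameBytes(source, StandardCharsets.US_ASCII, StandardCharsets.UTF_8));      // prints true
        System.out.println(sameBytes(source, StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1)); // prints true
        // UTF-16 uses two bytes per character, so the bytes differ.
        System.out.println(sameBytes(source, StandardCharsets.US_ASCII, StandardCharsets.UTF_16BE));   // prints false
    }
}
```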
ASCII Cons
- limited character set
- this isn't the 1960s
Note: ASCII is 7-bit, not "extended" and not to be confused with Windows-1252, ISO 8859-1, or anything else.
Answer by BalusC
The important thing is at least to be consistent in the encoding you use, to avoid red herrings. So not X here, Y there and Z elsewhere. Save source code in encoding X. Set code input to encoding X. Set code output to encoding X. Set character-based FTP transfer to encoding X. Et cetera.
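To see what "X here, Y there" does in practice, here is a small sketch with a hypothetical roundTrip helper that writes text in one encoding and reads it back in another. When both sides agree the text survives. When they disagree, the single accented letter comes back as two wrong characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Encodes the text with one charset and decodes the bytes with another.
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        return new String(text.getBytes(writeAs), readAs);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" ending in e-acute

        // Same encoding on both sides: the text is unchanged.
        String consistent = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(consistent.equals(original)); // prints true

        // Written as UTF-8 but read as ISO-8859-1: the two UTF-8 bytes C3 A9
        // are decoded as two separate characters, U+00C3 and U+00A9.
        String garbled = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals("caf\u00c3\u00a9")); // prints true
        System.out.println(garbled.length());                   // prints 5
    }
}
```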
Nowadays UTF-8 is a good choice, as it covers every character the human world is aware of and is supported pretty much everywhere. So yes, I would set the workspace encoding to it as well. That is what I do myself.
Answer by finnw
Eclipse's default setting of using the platform default encoding is a poor decision IMHO. I found it necessary to change the default to UTF-8 shortly after installing it because some of my existing source files used it (probably from snippets copied/pasted from web pages.)
The Java Language and API specs require UTF-8 support so you're definitely okay as far as the standard tools go, and it's a long time since I've seen a decent editor that did not support UTF-8.
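As a quick illustration of the point about the standard tools, the sketch below asks the running JVM whether it supports UTF-8 and prints the platform default encoding that Eclipse would otherwise inherit. The class name is made up, and the last line of output depends on the machine and JDK version it runs on.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RequiredCharsetDemo {

    public static void main(String[] args) {
        // UTF-8 is one of the charsets every Java platform must provide.
        System.out.println(Charset.isSupported("UTF-8"));                            // prints true
        System.out.println(Charset.forName("UTF-8").equals(StandardCharsets.UTF_8)); // prints true

        // The platform default varies by machine and JDK version.
        System.out.println(Charset.defaultCharset());
    }
}
```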
Even in projects that use JNI, your C sources will normally be in US-ASCII which is a subset of UTF-8 so having both open in the same IDE will not be a problem.
Answer by Russell Newquist
I don't think there's really a straight yes or no answer to this question. I would say that the following guidelines should be used to pick an encoding format, in order of priority listed (highest to lowest):
1) Pick an encoding your tool chain supports. This is a lot easier than it used to be. Even in recent memory a lot of compilers and languages basically only supported ASCII, which more or less forced developers into coding in Western European languages. These days, many of the newer languages support other encodings, and almost all decent editors and IDEs support a tremendously long list of encodings. Still... there are just enough holdouts that you need to double-check before you settle on an encoding.
2) Pick an encoding that supports as many of the alphabets you wish to use as possible. I place this as a secondary priority because, frankly, if your tools don't support it, it doesn't really matter whether you like the encoding better or not.
UTF-8 is an excellent choice in many circumstances in today's world. It's an ugly, inelegant format, but it solves a whole host of problems (namely dealing with legacy code) that break other encodings, and it seems to be becoming more and more the de facto standard of character encodings. It supports every major alphabet, darn near every editor on the planet supports it now, and a whole host of languages/compilers support it, too. But as I mentioned above, there are just enough legacy holdouts that you need to double-check your tool chain from end to end before you settle on it definitively.
Answer by poke
Yes, unless your compiler/interpreter is not able to work with UTF-8 files, it is definitely the way to go.