Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/13807748/
When to use utf8 as a header in py files
Asked by
Some source files in code I downloaded start with the following header:
# -*- coding: utf-8 -*-
I have an idea of what UTF-8 encoding is, but why would it be needed as a header in a Python source file?
Accepted answer by mihaicc
Wherever you need to use characters in your code that aren't ASCII, like:
ă
Without the header, the interpreter will complain that it doesn't understand that character.
Usually this happens when you define constants.
Example: add the following line to x.py:
print 'ă'
Then start a Python console and import the module:
>>> import x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "x.py", line 1
SyntaxError: Non-ASCII character '\xc4' in file x.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
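The traceback above comes from Python 2. As a rough Python 3 analogue, the sketch below compiles source bytes in memory instead of importing a file. The names `LATIN1_BODY` and `try_compile` are hypothetical, and latin-1 is an assumed stand-in for "some non-UTF-8 encoding"; Python 3 assumes UTF-8 rather than ASCII when nothing is declared.

```python
# Python 3 sketch: compile source bytes directly, so no file on disk is needed.
# "\u00f8" is the letter o-with-stroke; in latin-1 it is the single byte 0xF8,
# which is not valid UTF-8 on its own.
LATIN1_BODY = "name = '\u00f8'\n".encode("latin-1")


def try_compile(source_bytes):
    """Return the value of `name` set by the source, or the SyntaxError raised."""
    try:
        code = compile(source_bytes, "x.py", "exec")
    except SyntaxError as exc:
        return exc
    namespace = {}
    exec(code, namespace)
    return namespace["name"]


# No declaration: the bytes are treated as UTF-8 and rejected.
undeclared = try_compile(LATIN1_BODY)

# Declaration matches the real encoding: the source compiles and runs.
declared = try_compile(b"# -*- coding: latin-1 -*-\n" + LATIN1_BODY)

print(type(undeclared).__name__)
print(ascii(declared))
```

The only difference between the two calls is the first comment line, which is what the header in the question provides.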
Answer by arynaq
When you use non-ASCII characters. For instance, when I comment my source in Norwegian and the characters æøå occur in the .py file, the interpreter will complain and not "compile" it.
Answer by Esailija
Whenever text is read or written, encodings come into play. Always. A Python interpreter has to read your file as text in order to understand it. The only situation where you can get away without dealing with encodings is when you use only characters in the ASCII range. In that case the interpreter can use virtually any encoding in the world and still get it right, because almost all encodings map these characters to the same bytes.
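A small Python 3 sketch of that point: the same ASCII-only source line is encoded with several encodings and the resulting bytes are compared. The list of encodings is an illustrative choice, not an exhaustive one, and UTF-16 is included as an example of the "almost".

```python
# Hypothetical one-line program that uses only ASCII characters.
ascii_source = "print('hello')\n"

# A few ASCII-compatible encodings; this list is an illustrative choice.
compatible = ["ascii", "utf-8", "latin-1", "windows-1252", "mac_roman", "cp437"]
encoded_forms = {ascii_source.encode(name) for name in compatible}

# UTF-16 is one of the exceptions: two bytes per character, plus a byte order mark.
utf16_form = ascii_source.encode("utf-16")

print(len(encoded_forms))           # number of distinct byte sequences
print(utf16_form in encoded_forms)  # whether UTF-16 produced the same bytes
```

If the set contains a single element, every listed encoding produced identical bytes for the ASCII-only text.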
You should not use coding: utf-8 just because you have characters beyond ASCII in your file; it can even be harmful. The declaration is a hint that tells the Python interpreter what encoding your file is in. Unless you have configured your text editor, it will most likely not save your files in UTF-8, and then the hint you gave to the interpreter is wrong.
So you should use it when your file is actually encoded in UTF-8. If it's encoded in windows-1252, you should use coding: windows-1252, and so on.
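To illustrate what a wrong hint does, here is a minimal Python 3 sketch that decodes bytes directly instead of going through the interpreter's source loader. The sample word and the variable names are hypothetical; the two failure modes are the general ones for a declared encoding that does not match the saved one.

```python
# "caf\u00e9" is the word "cafe" with an acute accent on the final e.
text = "caf\u00e9"

saved_as_cp1252 = text.encode("windows-1252")  # editor saved as windows-1252
saved_as_utf8 = text.encode("utf-8")           # editor saved as UTF-8

# Case 1: file saved as windows-1252 but declared as utf-8 -> decoding fails.
try:
    saved_as_cp1252.decode("utf-8")
    wrong_hint_failed = False
except UnicodeDecodeError:
    wrong_hint_failed = True

# Case 2: file saved as UTF-8 but declared as windows-1252 -> no error,
# but the accented letter silently turns into two wrong characters.
mojibake = saved_as_utf8.decode("windows-1252")

print(wrong_hint_failed)
print(ascii(mojibake))
```

The second case is arguably the more harmful one, because nothing complains and the corrupted text flows on into the program.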
Answer by voscausa
Always use UTF-8, and make sure your editor also uses UTF-8. If you use Python 2.7, start your Python script like this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
This is a good blog post by Nick Johnson about Python and UTF-8:
http://blog.notdot.net/2010/07/Getting-unicode-right-in-Python
By the way, this post was written before he could use:
from __future__ import unicode_literals
Answer by neves
A more direct answer:
In Python 3+: you don't need to declare it. UTF-8 is the default. Make sure the file is actually saved as UTF-8; some Windows editors don't do that by default. Declaring it won't hurt, and some editors may use the declaration.
In Python 2: always. Without a declaration the source is assumed to be ASCII (PEP 263), so any non-ASCII byte in the file is a SyntaxError.
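One way to check which encoding Python 3 would pick for a given source is the standard library's `tokenize.detect_encoding`, which looks for a byte order mark or a coding declaration in the first two lines. The helper name `source_encoding` and the sample sources are hypothetical; this is a sketch, not a description of how any editor reads the header.

```python
import io
import tokenize


def source_encoding(source_bytes):
    """Return the encoding name Python 3 would use to decode these source bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _consumed_lines = tokenize.detect_encoding(readline)
    return encoding


no_header = b"print('hi')\n"
utf8_header = b"# -*- coding: utf-8 -*-\nprint('hi')\n"
# The declaration may also sit on the second line, after a shebang.
latin1_header = b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\nprint('hi')\n"

print(source_encoding(no_header))      # default when nothing is declared
print(source_encoding(utf8_header))
print(source_encoding(latin1_header))  # reported under a normalized name
```

Note that `detect_encoding` normalizes some names, so a declared latin-1 is reported as iso-8859-1.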
And remember: this is only about your source code files. Now, in the 3rd millennium, the old "string" type does not exist anymore. You must think in terms of text, which is a sequence of bytes and an encoding. You still have to specify the encoding in every input and output operation. These operations still depend on your environment, so it is better to follow the rule: explicit is better than implicit.
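A minimal Python 3 sketch of being explicit in input and output: the encoding is passed to every `open` call instead of relying on the platform default. The file name `note.txt` and the sample text are hypothetical.

```python
import os
import tempfile

# Sample text with non-ASCII letters (e-acute, then ae, o-with-stroke, a-ring).
text = "caf\u00e9 \u00e6\u00f8\u00e5"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "note.txt")

    # Write with an explicit encoding rather than the environment's default.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)

    # Read back with the same explicit encoding.
    with open(path, "r", encoding="utf-8") as handle:
        round_tripped = handle.read()

    # The bytes on disk are exactly the UTF-8 encoding of the text.
    with open(path, "rb") as handle:
        raw = handle.read()

print(round_tripped == text)
print(raw == text.encode("utf-8"))
```

Omitting `encoding=` would make the result depend on the machine the script runs on, which is the situation the answer warns about.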

