Chinese and Japanese character support in Python

Question

Asked by user2030113

How do I correctly read Japanese and Chinese characters? I'm using Python 2.5. The output is displayed as "E:\Test\?????????".

```
path = r"E:\Test\は最高のプログラマ"
t = path.encode()
print t
u = path.decode()
print u
t = path.encode("utf-8")
print t
t = path.decode("utf-8")
print t
```

Answer 1

Answered by m.brindley

You should force the string to be a `unicode` object, like:

```
path = ur"E:\Test\は最高のプログラマ"
```

The docs on string literals relevant to 2.5 are located here.

Edit: I'm not positive whether the object is a `unicode` in 2.5, but the docs do state that `\uXXXX[XXXX]` escapes will be processed and the string will be "a Unicode string".
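
As a rough modern counterpart, here is a sketch assuming Python 3, where every `str` literal is already Unicode and the `ur` prefix no longer exists. It checks that `\uXXXX` escapes and the directly typed characters give the same string; the code points were looked up for this example and are not part of the answer.

```python
# Python 3: str literals are Unicode; a `ur"..."` literal would be a SyntaxError here.
escaped = "E:\\Test\\\u306f\u6700\u9ad8\u306e\u30d7\u30ed\u30b0\u30e9\u30de"
typed = "E:\\Test\\は最高のプログラマ"

print(escaped == typed)                  # True
print(len(typed) - len("E:\\Test\\"))    # 9 characters after the prefix
```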

Answer 2

Answered by Martijn Pieters

Please do read the Python Unicode HOWTO; it explains how to process and include non-ASCII text in your Python code.

If you want to include Japanese text literals in your code, you have several options:

Use unicode literals (create `unicode` objects instead of byte strings), but represent any non-ASCII codepoint with a Unicode escape sequence. These take the form `\uabcd`: a backslash, a `u` and 4 hexadecimal digits:
```
ru = u'\u30EB'
```
would be one character, the katakana 'ru' codepoint ('ル').
Use unicode literals, but include the characters in some encoding. Your text editor will save files in a given encoding (say, UTF-8); you need to declare that encoding at the top of the source file:
```
# encoding: utf-8

ru = u'ル'
```
where 'ル' is included without using an escape. The default encoding for Python 2 source files is ASCII, so declaring an encoding makes it possible to use Japanese directly.
Use byte string literals, already encoded. Encode the codepoints by some other means and include them in your byte string literals. If all you are going to do with them is use them in encoded form anyway, this should be fine:
```
ru = '\xeb\x30'  # ru encoded to UTF16 little-endian
```
I encoded 'ル' to UTF-16 little-endian because that's the default Windows NTFS filename encoding.
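
The three options can be compared side by side in a short sketch. It assumes Python 3, so `str` plays the role of Python 2's `unicode` and source files default to UTF-8; the variable names are illustrative only.

```python
import unicodedata

# Option 1: a Unicode escape sequence for code point U+30EB.
ru_escaped = "\u30EB"

# Option 2: the character typed directly; Python 3 reads source as UTF-8
# by default, so no `# encoding:` line is needed for this file.
ru_typed = "ル"

# Option 3: bytes that are already encoded, here UTF-16 little-endian.
ru_bytes = b"\xeb\x30"

print(unicodedata.name(ru_typed))                 # KATAKANA LETTER RU
print(ru_escaped == ru_typed)                     # True
print(ru_typed.encode("utf-16-le") == ru_bytes)   # True
print(ru_typed.encode("utf-8"))                   # b'\xe3\x83\xab'
```

The last line shows that the byte form depends entirely on the chosen encoding, which is why option 3 only suits code that needs exactly that encoding.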

The next problem will be your terminal: the Windows console is notorious for not supporting many character sets out of the box. You probably want to configure it to handle UTF-8 instead. See this question for some details, but in short you need to run the following command in the console:

```
chcp 65001
```

to switch to UTF-8, and you may need to switch to a console font that can handle your codepoints (Lucida perhaps?).
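
One way question marks like those in the question can appear is lossy encoding: when text is encoded for a code page that has no Japanese characters using `errors="replace"`, each unencodable character becomes `?`. The sketch below assumes Python 3 and uses cp1252 purely as an example of a Western code page; the asker's actual console code page is not stated.

```python
path = "E:\\Test\\は最高のプログラマ"

# cp1252 (an assumed example code page) has no Japanese characters, so
# errors="replace" swaps each of the 9 characters for a "?".
lossy = path.encode("cp1252", errors="replace")
print(lossy)                              # b'E:\\Test\\?????????'

# UTF-8 (code page 65001) can represent every character, so nothing is lost.
lossless = path.encode("utf-8")
print(lossless.decode("utf-8") == path)   # True
```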

Answer 3

Answered by jfs

There are two independent issues:

You should specify the Python source encoding if you use non-ASCII characters, and use Unicode literals for data that represents text, e.g.:
```
# -*- coding: utf-8 -*-
path = ur"E:\Test\は最高のプログラマ"
```
Printing Unicode to the Windows console is complicated, but if you set the correct font then just:
```
print path
```
might work.
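
To see why the source encoding declaration matters, the sketch below compiles the same UTF-8 bytes twice: once with an honest `coding: utf-8` line and once with the line changed to `latin-1`. It assumes Python 3, whose `compile()` honours PEP 263 declarations in `bytes` source; `load_path` is a hypothetical helper written for this example.

```python
# Simulated source file containing the declaration and the raw literal.
SOURCE = '# -*- coding: utf-8 -*-\npath = r"E:\\Test\\は最高のプログラマ"\n'

def load_path(source_bytes):
    """Compile source bytes as the interpreter would and return `path`."""
    namespace = {}
    exec(compile(source_bytes, "<sketch>", "exec"), namespace)
    return namespace["path"]

utf8_bytes = SOURCE.encode("utf-8")
correct = load_path(utf8_bytes)

# Same bytes, but the declaration now lies about the encoding.
mislabeled = utf8_bytes.replace(b"utf-8", b"latin-1", 1)
wrong = load_path(mislabeled)

print(correct == "E:\\Test\\は最高のプログラマ")   # True
print(wrong == correct)                            # False (mojibake)
```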

Regardless of whether your console can display the path, it should be fine to pass the Unicode path to filesystem functions, e.g.:

```
import os

entries = os.listdir(path)  # a unicode path returns unicode file names
```
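
To see both behaviours side by side, here is a sketch assuming Python 3: a `str` path gives `str` names and a `bytes` path gives raw `bytes` names. It creates a throwaway directory with `tempfile`; the file name is hypothetical, borrowed from the question, and `list_both_ways` is an illustrative helper.

```python
import os
import tempfile

# Hypothetical file name, borrowed from the question's path.
NAME = "は最高のプログラマ.txt"

def list_both_ways(directory):
    """Return the listing of `directory` as str names and as bytes names."""
    as_text = os.listdir(directory)                # str path   -> str names
    as_bytes = os.listdir(os.fsencode(directory))  # bytes path -> bytes names
    return as_text, as_bytes

with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file whose name contains Japanese characters.
    open(os.path.join(tmp, NAME), "w").close()
    names, raw_names = list_both_ways(tmp)

# names holds the decoded text; raw_names holds the name in the filesystem encoding.
print(len(names), type(names[0]).__name__, type(raw_names[0]).__name__)  # 1 str bytes
```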

Don't call `.encode(char_enc)` on bytestrings; call it on Unicode strings instead.
Don't call `.decode(char_enc)` on Unicode strings; call it on bytestrings instead.
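
A minimal sketch of those two rules, assuming Python 3, where text is `str` and encoded data is `bytes`. Python 3 drops the wrong-direction methods entirely (`str` has no `.decode()`, `bytes` has no `.encode()`); in Python 2 they exist, and calling `.encode()` on a bytestring first decodes it implicitly with the default ASCII codec, which fails for non-ASCII bytes.

```python
text = "E:\\Test\\は最高のプログラマ"   # str: a sequence of code points

data = text.encode("utf-8")   # encode: text -> bytes
back = data.decode("utf-8")   # decode: bytes -> text

print(type(data).__name__, type(back).__name__)          # bytes str
print(back == text)                                      # True

# The wrong-direction methods simply do not exist in Python 3.
print(hasattr(text, "decode"), hasattr(data, "encode"))  # False False
```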