Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/23917729/

Time: 2020-08-19 03:40:35  Source: igfitidea

Switching to Python 3 causing UnicodeDecodeError

Tags: python, python-3.x, encoding

Asked by 3yakuya

I've just added a Python 3 interpreter to Sublime, and the following code stopped working:

for directory in directoryList:
    fileList = os.listdir(directory)
    for filename in fileList:
        filename = os.path.join(directory, filename)
        currentFile = open(filename, 'rt')
        for line in currentFile:               ##Here comes the exception.
            currentLine = line.split(' ')
            for word in currentLine:
                if word.lower() not in bigBagOfWords:
                    bigBagOfWords.append(word.lower())
        currentFile.close()

I get the following exception:

  File "/Users/Kuba/Desktop/DictionaryCreator.py", line 11, in <module>
    for line in currentFile:
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 305: ordinal not in range(128)

I found this rather strange, because as far as I know Python 3 is supposed to support UTF-8 everywhere. What's more, the exact same code works without problems on Python 2.7. I've read about adding the environment variable PYTHONIOENCODING, and I tried it, to no avail. (However, it appears it is not that easy to add an environment variable in OS X Mavericks, so maybe I did something wrong when adding it? I modified /etc/launchd.conf.)

Accepted answer by Martijn Pieters

Python 3 decodes text files when reading and encodes when writing. The default encoding is taken from locale.getpreferredencoding(False), which evidently returns 'ASCII' for your setup. See the open() function documentation:

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
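
To see which encoding a given machine would fall back to, one can call the same function the documentation names. This is a minimal sketch; the printed values depend on the platform and locale, so no particular output is assumed:

```python
import locale
import sys

# The value the open() documentation quoted above refers to: the locale's
# preferred encoding, used in text mode when no encoding= argument is given.
default_encoding = locale.getpreferredencoding(False)
print("locale preferred encoding:", default_encoding)

# A related but separate setting: the encoding used for file *names*,
# not for file contents.
print("filesystem encoding:", sys.getfilesystemencoding())
```

If the first line prints an ASCII-only codec name, any file containing bytes above 0x7F will fail to decode unless an encoding is passed explicitly.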

Instead of relying on a system setting, you should open your text files using an explicit codec:

currentFile = open(filename, 'rt', encoding='latin1')

where you set the encoding parameter to match the file you are reading.
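
The difference can be reproduced without touching the system locale. The sketch below is illustrative only: the sample bytes and the file name sample.txt are made up, and 'latin1' is used because it is the codec from the line above, not because the asker's files are known to be Latin-1.

```python
import os
import tempfile

# Hypothetical sample: one line containing byte 0xCC, which is not valid ASCII.
raw = b"caf\xcc bar\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Forcing ASCII reproduces the kind of failure shown in the question.
    try:
        with open(path, "rt", encoding="ascii") as f:
            f.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # Latin-1 maps every byte to a code point, so decoding cannot fail,
    # but the text is only correct if the file really is Latin-1.
    with open(path, "rt", encoding="latin1") as f:
        words = f.read().split()

print(ascii_failed, words)
```

Because Latin-1 never raises on decode, a wrong guess shows up as garbled characters rather than an exception, which is why the codec should match the actual file.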

As for UTF-8 "everywhere": Python 3 uses UTF-8 as the default encoding for source code, which is separate from the encoding used to read file contents.

The same applies to writing to a writable text file: the data written will be encoded, and if you rely on the system encoding you are liable to get UnicodeEncodeError exceptions unless you explicitly set a suitable codec. Which codec to use when writing depends on what text you are writing and what you plan to do with the file afterward.
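
A small sketch of the writing side, under the assumption that UTF-8 is the desired output format; the sample text and the file name out.txt are hypothetical:

```python
import os
import tempfile

# Hypothetical sample text with two non-ASCII characters.
text = "na\u00efve caf\u00e9"

# Encoding to ASCII fails; this is what happens implicitly on write
# when the default encoding is ASCII.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")
    # An explicit codec makes the result independent of the system locale.
    with open(path, "wt", encoding="utf-8") as f:
        f.write(text)
    # Read the raw bytes back to see what was actually stored.
    with open(path, "rb") as f:
        written = f.read()

print(ascii_ok, written)
```

The bytes on disk are the UTF-8 encoding of the string regardless of what the locale reports.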

You may want to read up on Python 3 and Unicode in the Unicode HOWTO, which covers both source code encoding and reading and writing Unicode data.

Answer by farid khafizov

"as far as I know Python3 is supposed to support utf-8 everywhere ..." Not true. I have Python 3.6 and my default encoding is NOT UTF-8. To change it to UTF-8 in my code I use:

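import locale

def getpreferredencoding(do_setlocale=True):
    # Ignore the locale and always report UTF-8.
    return "utf-8"

# Monkeypatch: later callers of locale.getpreferredencoding() get "utf-8".
locale.getpreferredencoding = getpreferredencoding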

as explained in Changing the “locale preferred encoding” in Python 3 in Windows
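
A less invasive alternative, which is not part of this answer, is to leave the locale module alone and pre-select the encoding on open() itself. The helper name open_utf8 and the file name demo.txt below are hypothetical; this is a sketch, not a recommendation from the original authors:

```python
import functools
import os
import tempfile

# Hypothetical helper (not from the answer): open() with UTF-8 pre-selected.
# Only suitable for text mode; binary mode rejects an encoding argument.
open_utf8 = functools.partial(open, encoding="utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open_utf8(path, "w") as f:
        f.write("caf\u00e9")
    with open_utf8(path) as f:
        content = f.read()

print(content)
```

This keeps the choice of codec visible at the call site and does not affect other code that consults the locale.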

Answer by vicky_kqr

In general, I found three ways to fix Unicode-related errors in Python 3:

总的来说,我找到了 3 种方法来修复 Python3 中与 Unicode 相关的错误:

  1. Specify the encoding explicitly, e.g. currentFile = open(filename, 'rt', encoding='utf-8')

  2. Since bytes have no encoding, convert the string data to bytes before writing it to a file (opened in binary mode), e.g. data = 'string'.encode('utf-8')

  3. Especially in a Linux environment, check $LANG. This issue usually arises when LANG=C, which makes the default encoding 'ascii' instead of 'utf-8'. You can change it to another appropriate value, such as LANG='en_IN'
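
Options 2 and 3 can be sketched as follows. The sample string and the file name data.bin are made up, and the environment variables may be unset on any given machine, so no particular output is assumed for that part:

```python
import os
import tempfile

# Hypothetical sample string starting with U+00CC.
text = "\u00cc is not ASCII"

# Option 2: encode explicitly and write in binary mode, so no implicit
# codec is involved in the file I/O at all.
data = text.encode("utf-8")
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(data)
    with open(path, "rb") as f:
        round_tripped = f.read().decode("utf-8")

# Option 3: inspect the locale variables that influence the default encoding.
# Any of these may be unset; LC_ALL overrides LC_CTYPE, which overrides LANG.
for name in ("LC_ALL", "LC_CTYPE", "LANG"):
    print(name, "=", os.environ.get(name))
```

With binary mode the program, not the locale, decides how text becomes bytes, at the cost of calling encode() and decode() by hand.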
