Original question: http://stackoverflow.com/questions/2562674/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Python UTF-8 can't decode byte on 32-bit machine
Asked by JiminyCricket
It works fine on 64-bit machines, but for some reason it will not work on Python 2.4.3 on a 32-bit instance.
I get the error
'utf8' codec can't decode bytes in position 76-79: invalid data
for the code
try:
    str(sourceresult.sourcename).encode('utf8', 'replace')
except:
    raise Exception(repr(sourceresult.sourcename))
It returns 'kazamidori blog\xf9'
I have modified my site.py file to make UTF-8 the default encoding, but it still doesn't seem to work.
Answered by tzot
We need the following, and we need the exact output:
type(sourceresult.sourcename) # I suspect it's already a UTF-8 encoded string
repr(sourceresult.sourcename)
Like I said, I'm almost certain that your sourceresult.sourcename is already a UTF-8 encoded string.
Perhaps this might help a little.
EDIT: it seems your sourceresult.sourcename is encoded as cp1252. I don't know what mystring (that you reference in a comment) is.
So, to get a UTF-8 encoded string, you need to do:
source_as_UTF8 = sourceresult.sourcename.decode("cp1252").encode("utf-8")
However, the string being cp1252-encoded is not consistent with the error message you supplied.
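The conversion above can be checked against the byte string reported in the question. This is only a minimal sketch in Python 3 syntax (the question uses Python 2.4, where a plain string is a byte string), and it assumes the data really is cp1252; the names raw, text and source_as_utf8 are illustrative.

```python
# The byte string reported in the question, written as Python 3 bytes.
raw = b"kazamidori blog\xf9"

# Decoding it as UTF-8 fails: 0xF9 can never start a valid UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Assuming the data is cp1252: decode to text first, then encode as UTF-8.
text = raw.decode("cp1252")            # byte 0xF9 maps to U+00F9 in cp1252
source_as_utf8 = text.encode("utf-8")  # U+00F9 becomes the bytes 0xC3 0xB9

print(utf8_ok)
print(repr(source_as_utf8))
```

The decode step has to come first: encoding only makes sense once the bytes have been turned into text with the encoding they were originally written in.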
Answered by Pekka
"Invalid Data" usually means that the incoming data contained characters outside its character set.
This is often caused by, at some point, some data being encoded in a character set different than UTF-8.
For example, this happens if the file a string is stored in was not converted into UTF-8 when you made UTF-8 the standard character set. (In Windows, you can usually specify a file's encoding in the "Save as..." dialog of your text editor.)
Or, when data comes from a database that uses a different character set in either the tables, the connection, or both.
Check out where the data comes from, and what encodings are set along the way.
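One rough way to check which encoding some bytes might be in is to try a few candidates and see which ones decode without error. The sketch below is in Python 3 syntax; try_encodings is a hypothetical helper and the candidate list is an assumption, not something taken from the question.

```python
def try_encodings(raw, candidates=("ascii", "utf-8", "cp1252")):
    """Return a dict mapping each candidate encoding to the decoded text,
    or to None when the bytes are not valid in that encoding."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

# The byte string reported in the question.
results = try_encodings(b"kazamidori blog\xf9")
for name, decoded in results.items():
    print(name, repr(decoded))
```

A successful decode does not prove the guess is right, since many single-byte encodings accept almost any byte. It only narrows down where along the way the data stopped being UTF-8.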
Answered by DNS
I think the problem is with your use of the str() function. Keep in mind that str() returns narrow, i.e. 1-byte-per-character strings. If the input, sourceresult.sourcename, is unicode, then Python automatically encodes it in order to return a narrow string. By default it uses the system encoding, which is likely something like ISO-8859-1, to do this.
So you're getting the error because it doesn't make sense to call encode on a string that is already encoded. If you get rid of the str(), it should work.
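This also explains why an encode call reports a decode error: in Python 2, calling encode on a byte string first decodes it implicitly with the default encoding, and that hidden step is the one that fails. The sketch below imitates the two steps in Python 3, where the implicit step no longer exists. py2_style_encode is a hypothetical helper, and using "utf-8" as the default is an assumption that mirrors the site.py change described in the question.

```python
def py2_style_encode(raw, default_encoding="utf-8"):
    """Imitate Python 2's byte-string .encode('utf8'):
    an implicit decode with the default encoding, then the real encode."""
    text = raw.decode(default_encoding)  # hidden step; this is where it fails
    return text.encode("utf-8")

raw = b"kazamidori blog\xf9"

try:
    py2_style_encode(raw)
    error_name = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__

print(error_name)
```

Bytes that are valid in the default encoding pass through unchanged, which would be consistent with the code working on some inputs or machines and failing on others.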
Answered by Johnny O
Make sure you don't have an odd number of bytes in your varchar field; I had a varchar(255) that blew up when someone entered a long string in Arabic. I then got the "unexpected end of data" error (as one might expect...!)
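The truncation problem can be reproduced without a database. This is a minimal sketch in Python 3 syntax, assuming the 255 limit is applied to bytes and not to characters (that depends on the database and column definition); the sample text is made up for illustration.

```python
# 200 copies of an Arabic letter; each one takes two bytes in UTF-8.
text = "\u0628" * 200
raw = text.encode("utf-8")   # 400 bytes in total
truncated = raw[:255]        # an odd cut splits the last character in half

try:
    truncated.decode("utf-8")
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason      # "unexpected end of data"

# Ignoring decode errors drops the dangling half character at the end.
recovered = truncated.decode("utf-8", errors="ignore")

print(reason)
print(len(recovered))
```

Note that errors="ignore" silently drops any invalid byte, not just a trailing one, so it is only a reasonable repair when the data is known to be valid UTF-8 that was cut short.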