Python UTF-8 can't decode bytes on a 32-bit machine

Question

Asked by JiminyCricket

It works fine on 64-bit machines, but for some reason it will not work on Python 2.4.3 on a 32-bit instance.

I get the error:

'utf8' codec can't decode bytes in position 76-79: invalid data

for the code:

try:
    str(sourceresult.sourcename).encode('utf8', 'replace')
except:
    raise Exception(repr(sourceresult.sourcename))

It returns 'kazamidori blog\xf9'

I have modified my site.py file to make UTF-8 the default encoding, but it still doesn't seem to be working.

Answer 1

Answered by tzot

We need the following, and we need the exact output:

type(sourceresult.sourcename) # I suspect it's already a UTF-8 encoded string

repr(sourceresult.sourcename)

Like I said, I'm almost certain that your sourceresult.sourcename is already a UTF-8 encoded string.

Perhaps this might help a little.

EDIT: it seems your sourceresult.sourcename is encoded as cp1252. I don't know what mystring (which you reference in a comment) is. So, to get a UTF-8 encoded string, you need to do:

source_as_UTF8 = sourceresult.sourcename.decode("cp1252").encode("utf-8")

However, the string being cp1252-encoded is not consistent with the error message you supplied.
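
To illustrate the conversion with the value shown in the question, here is a minimal sketch. It is written for Python 3, where the byte string is an explicit bytes literal (the question uses Python 2.4, where a plain str holds bytes), and the variable names are only illustrative.

```python
# The repr from the question, as an explicit byte string (Python 3 syntax).
raw_name = b"kazamidori blog\xf9"

# Interpret the bytes as cp1252: 0xF9 maps to U+00F9 (u with grave accent).
text = raw_name.decode("cp1252")

# Re-encode the text as UTF-8; U+00F9 becomes the two bytes 0xC3 0xB9.
source_as_utf8 = text.encode("utf-8")

print(repr(text))
print(repr(source_as_utf8))
```

A lone 0xF9 byte is not a valid UTF-8 sequence, which is why treating these bytes as UTF-8 fails.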

Answer 2

Answered by Pekka

"Invalid Data" usually means that the incoming data contained characters outside its character set.

This is often caused by some data being encoded, at some point, in a character set other than UTF-8.

For example, this happens if the file a string is stored in was not converted to UTF-8 when you made UTF-8 the standard character set. (On Windows, you can usually specify a file's encoding in the "Save as..." dialog of your text editor.)

It also happens when data comes from a database that uses a different character set in the tables, the connection, or both.

Check where the data comes from and which encodings are set along the way.
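
One way to start that check is to test the raw bytes against a few candidate encodings. The sketch below uses Python 3; first_decodable is a hypothetical helper, and the candidate list is an assumption chosen to match the encodings mentioned in this thread.

```python
def first_decodable(raw, candidates=("utf-8", "cp1252")):
    """Return (encoding, text) for the first candidate that decodes raw strictly.

    Hypothetical helper for diagnosis only: a successful decode shows the
    bytes are valid in that encoding, not that the data was written in it.
    """
    for encoding in candidates:
        try:
            return encoding, raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return None, None


# One valid UTF-8 sample and the byte string from the question.
for sample in (b"caf\xc3\xa9", b"kazamidori blog\xf9"):
    encoding, text = first_decodable(sample)
    print(repr(sample), "->", encoding, repr(text))
```

UTF-8 is tried first because it is strict: bytes from another single-byte encoding rarely happen to form valid UTF-8, whereas cp1252 accepts almost any byte.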

Answer 3

Answered by DNS

I think the problem is with your use of the str() function. Keep in mind that str() returns narrow, i.e. 1-byte-per-character strings. If the input, sourceresult.sourcename, is unicode, then Python automatically encodes it in order to return a narrow string. By default it uses the system encoding, which is likely something like ISO-8859-1, to do this.

So you're getting the error because it doesn't make sense to call encode on a string that is already encoded. If you get rid of the str(), it should work.
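
As a rough illustration of that point: in Python 2, calling encode on a byte string first decodes it implicitly with the default encoding, and that hidden step is the one that raises. The sketch below emulates this in Python 3, where the step has to be written out. py2_style_encode is a hypothetical helper, and its default of "utf-8" is an assumption based on the site.py change described in the question.

```python
def py2_style_encode(raw, default_encoding="utf-8"):
    """Emulate Python 2's str.encode('utf8', 'replace') on a byte string."""
    # Hidden step in Python 2: a strict decode with the default encoding.
    # The 'replace' handler is not applied here.
    text = raw.decode(default_encoding)
    # Only this explicit step uses the 'replace' error handler.
    return text.encode("utf8", "replace")


raw_name = b"kazamidori blog\xf9"

try:
    py2_style_encode(raw_name)
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(decode_error)

# Decoding explicitly with the encoding the bytes really use avoids the error.
fixed = raw_name.decode("cp1252").encode("utf8")
print(repr(fixed))
```

This would also explain why the 'replace' argument in the question's code does not suppress the exception: the failure is a decode error, and the handler only covers the encode step.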

Answer 4

Answered by Johnny O

Make sure you don't have an odd number of bytes in your varchar field; I had a varchar(255) that blew up when someone entered a long string in Arabic. I then got the "unexpected end of data" error (as one might expect...!)
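
The truncation case can be reproduced with a short sketch, assuming the column limit is applied in bytes rather than characters (that depends on the database and its settings). It uses Python 3, and the sample text and the cut-off length are made up for illustration.

```python
# Five Arabic letters; each one takes two bytes in UTF-8.
text = "\u0645\u0631\u062d\u0628\u0627"
encoded = text.encode("utf-8")

# Simulate a column that cuts the value off after an odd number of bytes.
truncated = encoded[:5]

try:
    truncated.decode("utf-8")
    truncation_error = None
except UnicodeDecodeError as exc:
    truncation_error = exc

print(truncation_error)

# Dropping the incomplete trailing character recovers the readable prefix.
recovered = truncated.decode("utf-8", "ignore")
print(repr(recovered))
```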
