Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/47405628/

Time: 2020-09-14 04:48:18  Source: igfitidea

Bokeh 'utf8' codec can't decode byte 0xe9 : unexpected end of data

Tags: python, pandas, encoding, bokeh

Asked by Rags Gupta

I'm using Bokeh to plot a pandas DataFrame. The code is as follows:

map_options = GMapOptions(lat=19.075984, lng=72.877656, map_type="roadmap", zoom=11)
plot = GMapPlot(x_range=DataRange1d(), y_range=DataRange1d(), map_options=map_options)


plot.api_key = "xxxxx"
source = ColumnDataSource(
    data=dict(
        lat=[float(i) for i in data.lat],
        lon=[float(i) for i in data.lon],
        size=[int(i)/1000 for i in data['count']],
        ID = [i for i in data.merchant_id],
        Merchant = [str(i) for i in data.merchant_name],
        count = [float(i) for i in data['count']]
    )
)
hover = HoverTool(tooltips=[
    ("(x,y)", "($lat, $lon)"),
    ("ID", "$ID"),
    ("Name", "@Merchant"),
    ("count","$count")
])


# hover.renderers.append(circle_glyph)
plot.tools.append(hover)
circle = Circle(x="lon", y="lat", size='size', fill_color="blue", fill_alpha=0.8, line_color=None)
plot.add_glyph(source, circle)

# plot.add_layout(labels)
plot.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool())
output_file("gmap_plot.html")
show(plot)

Using the "Name" field in the HoverTool throws the following error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 6: unexpected end of data

Commenting out the "Name" field still gives me the error, although an output plot is produced.

Following is the dataframe I'm using:

   lat        lon        merchant_id  count  merchant_name
0  18.539971  73.893963  757          777    Portobello
1  18.565766  73.910980  745          10193  The Wok Box
2  18.815427  76.775143  1058         2354   Burrito Factory
3  18.914633  72.817916  87           1985   Flamboyante
4  18.915794  72.824370  94           1116   Butterfly Pond
5  18.916473  72.826868  145          1010   Leo's Boulangerie
6  18.918923  72.828325  115          517    Brijwasi Sweets
7  18.928063  72.832888  973          613    Pandora's Box
8  18.928562  72.832353  101          64     La Folie Patisserie
9  18.929516  72.831860  961          6673   Burma Burma

As far as I can tell, the merchant name contains characters that are causing the error. I've tried encoding the column with 'utf-8', 'ascii', etc., but I get the following error:

data['merchant_name'] = data['merchant_name'].str.encode('utf-8')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 6: ordinal not in range(128)

Any idea on how to proceed?

Answered by xhancar

The byte 0xe9 is not pure ASCII: it is 233 in decimal, and ASCII defines only 128 characters (codes 0 to 127). In UTF-8 it is a special lead byte that starts a three-byte character, so it must be followed by two continuation bytes. The string is therefore probably in another encoding. For example, in both latin1 and latin2 the byte 0xe9 represents the letter é.
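
To make the point about 0xe9 concrete, here is a minimal Python 3 sketch (the thread itself appears to use Python 2). The byte string is a hypothetical value, not taken from the question's data; it places 0xe9 at index 6 as the last byte, which is one way to get the position and the "unexpected end of data" message reported in the question.

```python
# Hypothetical sample: Latin-1 bytes for "Brioché"; 0xe9 is the last byte (index 6).
raw = b"Brioch\xe9"

# Latin-1 maps every byte to exactly one character, so decoding always succeeds.
latin1_text = raw.decode("latin1")

# UTF-8 treats 0xe9 as the lead byte of a three-byte sequence; with nothing
# after it, decoding fails.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# ASCII only covers codes 0..127, so 0xe9 (233) is rejected as well.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

print(latin1_text)
print(utf8_error)
print(ascii_error)
```

If the same bytes decode cleanly as latin1 but not as UTF-8, that is a hint (not proof) that the data was written in a single-byte encoding.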

And remember, you must decode the string first. You tried to encode a value of type str (an ordinary byte string in Python 2), which does not make sense. Python therefore first applied its default decode('ascii'), which is why the encode method raised a UnicodeDecodeError.
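
The decode-then-encode order can be sketched in Python 3, where bytes and text are separate types. The sample bytes and the choice of latin1 as the source encoding are assumptions for illustration only.

```python
# Hypothetical Latin-1 bytes, standing in for one merchant name.
raw = b"Brioch\xe9"

# Python 3 bytes objects have no encode() method, so trying to encode
# undecoded data fails immediately instead of via a hidden ASCII decode.
can_encode_bytes = hasattr(raw, "encode")

# Step 1: decode with the encoding the data is assumed to be in.
text = raw.decode("latin1")

# Step 2: encode the resulting text as UTF-8.
utf8_bytes = text.encode("utf-8")

print(can_encode_bytes, text, utf8_bytes)
```

After the round trip, é is stored as the two bytes 0xc3 0xa9, which any UTF-8 consumer can read.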

I couldn't reproduce the error, and I don't see any special characters in the data you provided (in particular, I don't see the 0xe9 byte), so I can only guess. I would try something like this:

data['merchant_name'] = data['merchant_name'].str.decode('latin1').str.encode('utf-8')
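
Because the source encoding is only a guess here, a more defensive variant may be worth trying. The sketch below is Python 3 and uses a hypothetical helper, to_text, that is not part of pandas or Bokeh: it tries UTF-8 first and falls back to an assumed latin1. The sample values are made up for illustration.

```python
def to_text(value, fallback="latin1"):
    """Return value as text; bytes are tried as UTF-8 first, then as `fallback`."""
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: anything that is not valid UTF-8 is in `fallback`.
            return value.decode(fallback)
    return str(value)


# Hypothetical sample values, not the question's actual data.
names = [b"Portobello", b"Brioch\xe9", b"Caf\xc3\xa9", "Burma Burma"]
cleaned = [to_text(name) for name in names]
print(cleaned)
```

With pandas, the same function could be applied to a column through Series.map, for example data['merchant_name'].map(to_text), assuming the column holds raw byte strings.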

Last but not least: when you post code, please post the complete code, with all imports and everything else needed to run it. I have never used Bokeh, and reconstructing the imports while trying to reproduce your error was time-consuming. (In the end I managed to import everything, but I still didn't get your error.)