Original question: http://stackoverflow.com/questions/47405628/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Bokeh 'utf8' codec can't decode byte 0xe9 : unexpected end of data
Asked by Rags Gupta
I'm using Bokeh to plot a pandas DataFrame. The code is as follows:
map_options = GMapOptions(lat=19.075984, lng=72.877656, map_type="roadmap", zoom=11)
plot = GMapPlot(x_range=DataRange1d(), y_range=DataRange1d(), map_options=map_options)
plot.api_key = "xxxxx"
source = ColumnDataSource(
    data=dict(
        lat=[float(i) for i in data.lat],
        lon=[float(i) for i in data.lon],
        size=[int(i)/1000 for i in data['count']],
        ID = [i for i in data.merchant_id],
        Merchant = [str(i) for i in data.merchant_name],
        count = [float(i) for i in data['count']]
    )
)
hover = HoverTool(tooltips=[
    ("(x,y)", "($lat, $lon)"),
    ("ID", "$ID"),
    ("Name", "@Merchant"),
    ("count","$count")
])
# hover.renderers.append(circle_glyph)
plot.tools.append(hover)
circle = Circle(x="lon", y="lat", size='size', fill_color="blue", fill_alpha=0.8, line_color=None)
plot.add_glyph(source, circle)
# plot.add_layout(labels)
plot.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool())
output_file("gmap_plot.html")
show(plot)
Using the "Name" field in the HoverTool throws the following error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 6: unexpected end of data
Commenting out the "Name" field still gives me the error, but an output plot is produced.
This is the DataFrame I'm using:
         lat        lon  merchant_id  count        merchant_name
0  18.539971  73.893963          757    777           Portobello
1  18.565766  73.910980          745  10193          The Wok Box
2  18.815427  76.775143         1058   2354      Burrito Factory
3  18.914633  72.817916           87   1985          Flamboyante
4  18.915794  72.824370           94   1116       Butterfly Pond
5  18.916473  72.826868          145   1010    Leo's Boulangerie
6  18.918923  72.828325          115    517      Brijwasi Sweets
7  18.928063  72.832888          973    613        Pandora's Box
8  18.928562  72.832353          101     64  La Folie Patisserie
9  18.929516  72.831860          961   6673          Burma Burma
As far as I can tell, the merchant name contains characters that are causing the error. I've tried encoding the column with 'utf-8', 'ascii', etc., but I get the following error:
data['merchant_name'] = data['merchant_name'].str.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 6: ordinal not in range(128)
Any idea how to proceed?
Answered by xhancar
The byte 0xe9 is not valid ASCII: it is 233 in decimal, and ASCII only covers the values 0 to 127. In UTF-8 it is a lead byte that starts a three-byte character, so it must be followed by two continuation bytes. Your string is therefore probably in another encoding. For example, in latin1 and latin2 the byte 0xe9 represents the letter é.
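To make the point about 0xe9 concrete, here is a minimal Python 3 sketch. The sample bytes are hypothetical (they are not taken from the question's data), and the example only assumes the text was originally written as Latin-1:

```python
# Hypothetical sample: "Café" stored as Latin-1, so é is the single byte 0xe9.
raw = b"Caf\xe9"

# 0xe9 is 233 in decimal, which is outside the ASCII range 0-127.
last_byte = raw[-1]

# Latin-1 maps every byte to a character, so this decode always succeeds.
text = raw.decode("latin1")

# UTF-8 reads 0xe9 as the start of a three-byte sequence, but the data ends
# right after it, so decoding fails.
try:
    raw.decode("utf-8")
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason

# Once decoded correctly, é can be re-encoded as two UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

print(last_byte, text, reason, utf8_bytes)
```

The same bytes are readable or broken depending only on which codec is used to decode them, which is why guessing the source encoding matters here.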
And remember, you must decode the string first. You tried to encode a value of type str (a plain byte string in Python 2), which does not make sense. Python therefore first applied its default decode('ascii'), and that is why you got a UnicodeDecodeError from the encode method.
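The implicit step described above can be imitated in Python 3, where bytes and text are separate types. This is only a sketch: `py2_style_encode` is a hypothetical helper written for illustration, and the sample bytes are made up.

```python
# Hypothetical Latin-1 bytes containing 0xe9 (é).
raw = b"Flamb\xe9e"


def py2_style_encode(byte_string, encoding):
    """Imitate Python 2's str.encode: an implicit ASCII decode, then encode."""
    return byte_string.decode("ascii").encode(encoding)


# The implicit ASCII decode is the step that fails, before any encoding happens.
try:
    py2_style_encode(raw, "utf-8")
    failed_codec = None
except UnicodeDecodeError as exc:
    failed_codec = exc.encoding

# Decoding explicitly with the real source encoding avoids the problem.
fixed = raw.decode("latin1").encode("utf-8")

print(failed_codec, fixed)
```

This shows why the error names the 'ascii' codec even though 'utf-8' was the encoding requested.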
I didn't manage to reproduce the error, and I don't see any special characters in the data you provided (in particular, I don't see the 0xe9 byte), so I can only guess. I would try something like this:
data['merchant_name'] = data['merchant_name'].str.decode('latin1').str.encode('utf-8')
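Since the real encoding of the column is only a guess, a more defensive variant is to try UTF-8 first and fall back to Latin-1 per value. The sketch below uses only the standard library; `to_text` is a hypothetical helper and the sample names are invented for illustration, not taken from the question's data.

```python
def to_text(value, fallback="latin1"):
    """Return value as text: decode bytes as UTF-8, else with the fallback codec."""
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            # Latin-1 accepts every byte, so this never raises.
            return value.decode(fallback)
    return str(value)


# Hypothetical mix of plain ASCII, Latin-1, UTF-8 and already-decoded values.
names = [
    b"Portobello",
    b"La Folie P\xe2tisserie",  # Latin-1 bytes
    b"Caf\xc3\xa9 Mondegar",    # UTF-8 bytes
    "Burma Burma",
]
cleaned = [to_text(n) for n in names]
print(cleaned)
```

With pandas, the same function could be applied to the column through `Series.map`, for example `data['merchant_name'].map(to_text)`, assuming the column holds byte strings.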
Last but not least: when you post your code, please post the complete code, including all imports. I have never used Bokeh, and reconstructing the imports to reproduce your error was time-consuming. (In the end I managed to import everything, but I didn't get your error.)