Python Beautiful Soup Unicode encode error
Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/2627071/
Beautiful Soup Unicode encode error
Asked by Rohit Banga
I am trying the following code with a particular HTML file:
from BeautifulSoup import BeautifulSoup
import re
import codecs
import sys
f = open('test1.html')
html = f.read()
soup = BeautifulSoup(html)
body = soup.body.contents
para = soup.findAll('p')
print str(para).encode('utf-8')
I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 9: ordinal not in range(128)
How do I debug this?
I do not get any error when I remove the print statement.
Accepted answer by gimel
The str(para) builtin is trying to use the default (ascii) encoding for the unicode in para. This is done before the encode() call:
>>> s=u'123\u2019'
>>> str(s)
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 3: ordinal not in range(128)
>>> s.encode("utf-8")
'123\xe2\x80\x99'
>>>
Try encoding para directly, maybe by applying encode("utf-8") to each list element.
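A minimal sketch of the per-element approach might look like the following. It is an illustration under assumptions, not the answerer's own code: para is a hypothetical stand-in list of unicode strings rather than real BeautifulSoup Tag objects, and encode_each is a helper name invented here. It is written so that it should run under both Python 2 and Python 3.

```python
# Stand-in for the list returned by soup.findAll('p'). These are plain
# unicode strings, not BeautifulSoup Tag objects (an assumption made so
# the sketch has no third-party dependency).
para = [u'<p>It\u2019s here</p>', u'<p>plain text</p>']


def encode_each(items, encoding='utf-8'):
    """Encode every element separately instead of calling str() on the list."""
    return [item.encode(encoding) for item in items]


# The ascii codec cannot represent U+2019, which is the failure reported
# in the question.
try:
    para[0].encode('ascii')
except UnicodeEncodeError as exc:
    print('ascii failed: %s' % exc)

# Encoding each element as UTF-8 succeeds and yields byte strings.
encoded = encode_each(para)
for chunk in encoded:
    print(repr(chunk))
```

With real Tag objects, each element would presumably need to be converted to unicode text first (for example with unicode(p) under Python 2) before calling encode on it.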