Python: XML Unicode strings with encoding declaration are not supported
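Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow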
Original URL: http://stackoverflow.com/questions/15830421/
Question by Prometheus
I'm trying to do the following:
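from django.http import HttpResponse
from lxml import etree
from lxml.etree import fromstring

def update_delivery_status(request):
    if request.POST:
        parser = etree.XMLParser(ns_clean=True, recover=True)
        h = fromstring(request.POST['xml'], parser=parser)
        # cssselect() returns a list of matches, so take the first one
        return HttpResponse(h.cssselect('itagg_delivery_receipt status')[0].text_content())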
but it gives this error:
[Fri Apr 05 10:27:54 2013] [error] Internal Server Error: /sms/status_postback/
[Fri Apr 05 10:27:54 2013] [error] Traceback (most recent call last):
[Fri Apr 05 10:27:54 2013] [error] File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 115, in get_response
[Fri Apr 05 10:27:54 2013] [error] response = callback(request, *callback_args, **callback_kwargs)
[Fri Apr 05 10:27:54 2013] [error] File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/csrf.py", line 77, in wrapped_view
[Fri Apr 05 10:27:54 2013] [error] return view_func(*args, **kwargs)
[Fri Apr 05 10:27:54 2013] [error] File "/srv/project/livewireSMS/sms/views.py", line 42, in update_delivery_status
[Fri Apr 05 10:27:54 2013] [error] h = fromstring(request.POST['xml'], parser=parser)
[Fri Apr 05 10:27:54 2013] [error] File "lxml.etree.pyx", line 2754, in lxml.etree.fromstring (src/lxml/lxml.etree.c:54631)
[Fri Apr 05 10:27:54 2013] [error] File "parser.pxi", line 1569, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:82659)
[Fri Apr 05 10:27:54 2013] [error] ValueError: Unicode strings with encoding declaration are not supported.
This is the XML:
<?xml version="1.1" encoding="ISO-8859-1"?>
<itagg_delivery_receipt>
<version>1.0</version>
<msisdn>447889000000</msisdn>
<submission_ref>845tgrgsehg394g3hdfhhh56445y7ts6</submission_ref>
<status>Delivered</status>
<reason>4</reason>
<timestamp>20050709120945</timestamp>
<retry>0</retry>
</itagg_delivery_receipt>
I don't have control over the XML document; it comes from the SMS company.
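The `ValueError` appears because `request.POST['xml']` is already a decoded unicode string, yet it still begins with `encoding="ISO-8859-1"`. lxml refuses that combination, since an encoding declaration only makes sense for undecoded bytes. A minimal reproduction, assuming Python 3 with lxml installed (the one-element documents are made up for illustration):

```python
from lxml import etree

# Decoded text (a str) that still carries an encoding declaration,
# similar to what request.POST['xml'] held in the question
declared = '<?xml version="1.0" encoding="ISO-8859-1"?><status>Delivered</status>'

try:
    etree.fromstring(declared)
    raised = False
except ValueError as exc:
    # lxml rejects unicode input that declares an encoding
    raised = True
    print(exc)

# Without the declaration, the same markup parses fine as a str
root = etree.fromstring('<status>Delivered</status>')
print(root.text)  # Delivered
```

So the fix is either to hand lxml bytes, or to hand it text without a declaration; the answers below take the bytes route.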
Accepted answer by Pavel Anossov
You'll have to encode it and then force the same encoding in the parser:
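from django.http import HttpResponse
from lxml import etree
from lxml.etree import fromstring

def update_delivery_status(request):
    if request.POST:
        # Encode the already-decoded unicode string back to UTF-8 bytes...
        xml = request.POST['xml'].encode('utf-8')
        # ...and tell the parser those bytes are UTF-8, overriding the declaration
        parser = etree.XMLParser(ns_clean=True, recover=True, encoding='utf-8')
        h = fromstring(xml, parser=parser)
        return HttpResponse(h.cssselect('itagg_delivery_receipt status')[0].text_content())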
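The same approach can be tried outside Django. This is a sketch assuming Python 3 and lxml; `payload` is a hypothetical stand-in for `request.POST['xml']`, trimmed from the question's XML. lxml documents the parser's `encoding` argument as overriding the document's own encoding, which is what keeps the re-encoded UTF-8 bytes from being read as ISO-8859-1. With this all-ASCII payload both encodings agree; the override only changes the result once non-ASCII characters appear.

```python
from lxml import etree

# Hypothetical stand-in for request.POST['xml']: a decoded str whose
# declaration still says ISO-8859-1 (and XML 1.1, as in the question)
payload = (
    '<?xml version="1.1" encoding="ISO-8859-1"?>'
    '<itagg_delivery_receipt>'
    '<version>1.0</version>'
    '<msisdn>447889000000</msisdn>'
    '<status>Delivered</status>'
    '<reason>4</reason>'
    '</itagg_delivery_receipt>'
)

# Encode to bytes, then tell the parser which encoding those bytes use
xml = payload.encode('utf-8')
parser = etree.XMLParser(ns_clean=True, recover=True, encoding='utf-8')
root = etree.fromstring(xml, parser=parser)

# findtext() avoids needing the separate cssselect package
print(root.findtext('status'))  # Delivered
```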
Answer by Aykut Kllic
The following solution from kernc worked for me:
>>> from lxml import etree
>>> xml = u'<?xml version="1.0" encoding="utf-8" ?><foo><bar/></foo>'
>>> xml = bytes(bytearray(xml, encoding='utf-8'))  # THIS LINE WAS ADDED (when unicode means utf-8, e.g. on Linux)
>>> etree.XML(xml)
<Element foo at 0x5b44c90>
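In Python 3, `bytes(bytearray(xml, encoding='utf-8'))` produces the same bytes as the more direct `xml.encode('utf-8')`. Because this document's declaration already says utf-8, the bytes and the declaration agree and no parser override is needed. A quick check, assuming Python 3 and lxml:

```python
from lxml import etree

xml = u'<?xml version="1.0" encoding="utf-8" ?><foo><bar/></foo>'

# Both spellings yield identical UTF-8 bytes
via_bytearray = bytes(bytearray(xml, encoding='utf-8'))
via_encode = xml.encode('utf-8')
print(via_bytearray == via_encode)  # True

# Bytes input is accepted even though it carries a declaration
root = etree.XML(via_encode)
print(root.tag)     # foo
print(root[0].tag)  # bar
```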
Answer by Ryan
Simpler than the answers above:
from lxml import etree

# r is the response of an earlier HTTP request (e.g. made with requests); r.text is a str
data = etree.fromstring(bytes(r.text, encoding='utf-8'))
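Re-encoding decoded text as UTF-8 works cleanly when the document declares UTF-8 or contains only ASCII. When the undecoded response body is available (with requests, that is `r.content`), it can instead be passed to lxml unchanged, so that the document's own declaration decides how the bytes are read. A sketch with the body simulated locally (no network); the ISO-8859-1 payload containing a non-ASCII character is a hypothetical example:

```python
from lxml import etree

# Simulated raw response body (hypothetical): ISO-8859-1 bytes that
# match their own encoding declaration, as they would arrive on the wire
raw = '<?xml version="1.0" encoding="ISO-8859-1"?><reason>café</reason>'.encode('iso-8859-1')

# Bytes input: lxml reads the declaration and decodes as ISO-8859-1
root = etree.fromstring(raw)
print(root.text)  # café
```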

