windows: Error in the parse function of lxml

Question

Asked by silentNinJa

I have installed lxml 2.2.2 on Windows (I am using Python 2.6.5). I tried this simple command:

from lxml.html import parse
p = parse('http://www.google.com').getroot()

But I am getting the following error:

Traceback (most recent call last):
  File "", line 1, in
    p=parse('http://www.google.com').getroot()
  File "C:\Python26\lib\site-packages\lxml-2.2.2-py2.6-win32.egg\lxml\html\__init__.py", line 661, in parse
    return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
  File "lxml.etree.pyx", line 2698, in lxml.etree.parse (src/lxml/lxml.etree.c:49590)
  File "parser.pxi", line 1491, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71205)
  File "parser.pxi", line 1520, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:71488)
  File "parser.pxi", line 1420, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:70583)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:67736)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63820)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64741)
  File "parser.pxi", line 563, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64056)
IOError: Error reading file 'http://www.google.com': failed to load external entity "http://www.google.com"

I am clueless as to what to do next, as I am a newbie to Python. Please guide me to solve this error. Thanks in advance!! :)

Answer 1

Answered by MattH

~~lxml.html.parse does not fetch URLs.~~

Here's how to do it with urllib2:

>>> from urllib2 import urlopen
>>> from lxml.html import parse
>>> page = urlopen('http://www.google.com')
>>> p = parse(page)
>>> p.getroot()
<Element html at 1304050>
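
For readers on Python 3, where urllib2 no longer exists, the same fetch-then-parse idea might look roughly like the sketch below. It assumes urllib.request.urlopen as the replacement for urllib2.urlopen; fetch_root and fake_opener are hypothetical names introduced only for illustration, and the stand-in opener is there so the example can run without network access.

```python
from io import BytesIO
from urllib.request import urlopen  # Python 3 home of urllib2.urlopen

from lxml.html import parse


def fetch_root(url, opener=urlopen):
    """Download `url` with `opener` and return the parsed root element.

    `fetch_root` is a hypothetical helper, not part of lxml.
    """
    response = opener(url)
    try:
        # lxml.html.parse accepts any file-like object, so lxml never
        # has to perform the HTTP request itself.
        return parse(response).getroot()
    finally:
        response.close()


def fake_opener(url):
    """Hypothetical stand-in for urlopen that serves canned HTML."""
    return BytesIO(b"<html><body><p>hello</p></body></html>")


root = fetch_root("http://www.google.com", opener=fake_opener)
print(root.tag)
print(root.findtext(".//p"))
```

Calling fetch_root('http://www.google.com') without the opener argument would perform a real download with urlopen.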

Update
Steven is right. lxml.etree.parse should accept and load URLs. I missed that. I've tried deleting this answer, but I'm not allowed.

I retract my statement about it not fetching URLs.

Answer 2

Answered by Steven

According to the API docs, it should work: http://lxml.de/api/lxml.html-module.html#parse

This seems to be a bug in lxml 2.2.2. I just tested on Windows with Python 2.6 and 2.7, and it does work with lxml 2.3.0.

So: upgrade your lxml and you'll be fine.

I don't know exactly which versions of lxml are affected, but I believe the problem was not so much with lxml itself as with the version of libxml2 used to build the Windows binary (certain versions of libxml2 had a problem with HTTP on Windows).
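
To see which lxml and libxml2 versions an installation is actually using, something like the following sketch may help. It relies on the version tuples that lxml.etree exposes (LXML_VERSION, LIBXML_VERSION, LIBXML_COMPILED_VERSION); describe_versions is a hypothetical helper name, and the sketch makes no claim about which version numbers are affected by the problem.

```python
from lxml import etree


def describe_versions():
    """Collect the version tuples lxml reports (hypothetical helper)."""
    return {
        "lxml": etree.LXML_VERSION,
        # libxml2 version loaded at runtime
        "libxml2 (runtime)": etree.LIBXML_VERSION,
        # libxml2 version lxml was compiled against
        "libxml2 (compiled)": etree.LIBXML_COMPILED_VERSION,
    }


for name, version in describe_versions().items():
    # Each version is a tuple of integers; join it into dotted form.
    print(name, ".".join(str(part) for part in version))
```

Comparing the output before and after upgrading shows whether the binary really picked up a different libxml2 build.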

Answer 3

Answered by bmaupin

Since line breaks are not allowed in comments, here's my implementation of MattH's answer:

from urllib2 import urlopen
from lxml.html import parse

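site_url = 'http://www.google.com'

try:
    # Let lxml fetch the URL itself (works on unaffected builds)
    page = parse(site_url).getroot()
except IOError:
    # Fall back to fetching with urllib2 and parsing the response
    page = parse(urlopen(site_url)).getroot()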
