Python bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/24398302/
Asked by user3773048
...
soup = BeautifulSoup(html, "lxml")
File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__
% ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
The above is the output in my Terminal. I am on Mac OS 10.7.x with Python 2.7.1, and I followed this tutorial to get Beautiful Soup and lxml, both of which installed successfully and work with a separate test file located here. In the Python script that causes this error, I have included this line:
from pageCrawler import comparePages
And in the pageCrawler file I have included the following two lines:
from bs4 import BeautifulSoup
from urllib2 import urlopen
Any help in figuring out what the problem is and how it can be solved would be much appreciated.
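A quick way to narrow this kind of error down is to check which interpreter is actually running the script and whether that interpreter can see the parser packages. The following is only a minimal sketch, assuming Python 3.4 or later (the question itself uses Python 2.7, where `importlib.util.find_spec` is not available); the function name `describe_environment` is hypothetical and not part of bs4.

```python
import importlib.util
import sys


def describe_environment(modules=("bs4", "lxml", "html5lib")):
    """Report the running interpreter and whether each module is importable by it.

    Hypothetical helper for illustration only; not part of Beautiful Soup.
    """
    report = {"executable": sys.executable}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, value in describe_environment().items():
    print(key, "->", value)
```

If `lxml` shows as unavailable here even though `pip install lxml` reported success, pip may have installed it for a different Python installation than the one running the script.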
Answered by James Errico
I suspect that this is related to the parser that BS uses to read the HTML. Their documentation is here, but if you're like me (on OS X) you might be stuck with something that requires a bit of work:
You'll notice that in the BS4 documentation page above, they point out that by default BS4 will use the Python built-in HTML parser. Assuming you are on OS X, the Apple-bundled version of Python is 2.7.2, whose built-in parser is not lenient about character formatting. I hit this same problem, so I upgraded my version of Python to work around it. Doing this in a virtualenv will minimize disruption to other projects.
If doing that sounds like a pain, you can switch over to the LXML parser:
pip install lxml
And then try:
soup = BeautifulSoup(html, "lxml")
Depending on your scenario, that might be good enough. I found this annoying enough to warrant upgrading my version of Python. Using virtualenv, you can migrate your packages fairly easily.
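If the script has to run on machines where lxml may or may not be installed, one option is to pick the parser at run time and fall back to the built-in one. This is a rough sketch, assuming Python 3.4 or later; `pick_parser` and `make_soup` are hypothetical helper names, not part of bs4.

```python
import importlib.util


def pick_parser(preferred=("lxml", "html5lib"), fallback="html.parser"):
    """Return the first preferred parser whose package is importable.

    Hypothetical helper: falls back to Python's built-in "html.parser",
    which needs no extra installation.
    """
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    return fallback


def make_soup(markup):
    """Build a BeautifulSoup object with whichever parser is available."""
    # Imported lazily so pick_parser can be used even without bs4 installed
    from bs4 import BeautifulSoup
    return BeautifulSoup(markup, pick_parser())
```

Keep in mind that different parsers can build slightly different trees from malformed HTML, so a silent fallback trades consistency for robustness.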
Answered by Tim Seed
With a basic Python install plus bs4, you can process your markup with html5lib (note that html5lib is itself a separate package, installed with pip install html5lib):
soup = BeautifulSoup(html, "html5lib")
If, however, you want to use features="xml", then you need to install lxml:
pip3 install lxml
soup = BeautifulSoup(html, features="xml")
Answered by Qiao Yang
I encountered the same issue. I found the reason was that I had a slightly outdated version of the Python six package.
>>> import html5lib
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/html5lib/__init__.py", line 16, in <module>
from .html5parser import HTMLParser, parse, parseFragment
File "/usr/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 2, in <module>
from six import with_metaclass, viewkeys, PY3
ImportError: cannot import name viewkeys
Upgrading your six package will solve the issue:
sudo pip install six==1.10.0
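To confirm whether an installed six is the culprit, you can check directly for the name that the traceback above failed to import. A minimal sketch; `six_status` is a hypothetical helper name, and it only reports what it finds rather than fixing anything.

```python
import importlib


def six_status():
    """Return (version, has_viewkeys) for the installed six, or None if six is absent.

    Hypothetical diagnostic helper for illustration only.
    """
    try:
        six = importlib.import_module("six")
    except ImportError:
        return None
    # html5lib's import failed on "viewkeys" in the traceback above
    return getattr(six, "__version__", "unknown"), hasattr(six, "viewkeys")


print(six_status())
```

If the second value is False, the installed six predates `viewkeys` and upgrading it is the likely fix.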
Answered by Ernst
I prefer the built-in Python HTML parser: no install, no dependencies.
soup = BeautifulSoup(s, "html.parser")
Answered by Bashar
I am using Python 3.6 and I had the same error as in the original post. After I ran the command:
python3 -m pip install lxml
it resolved my problem
Answered by Yogesh
Instead of lxml, use html.parser. You can use this piece of code:
soup = BeautifulSoup(html, 'html.parser')
Answered by Projesh Bhoumik
BeautifulSoup supports Python's built-in HTML parser by default, but if you want to use a third-party parser (such as lxml), you need to install that parser separately.
soup_object = BeautifulSoup(markup, "html.parser")  # Python's built-in HTML parser
But if you don't specify a parser as a parameter, you will get a warning that no parser was specified.
soup_object = BeautifulSoup(markup)  # Warning: no parser specified
To use any other external parser, you need to install it and then specify it, like this:
pip install lxml
soup_object = BeautifulSoup(markup, 'lxml')  # C-dependent parser
External parsers have C and Python dependencies, which bring both advantages and disadvantages.
Answered by abhishekPakrashi
Some references show the first form below; use the second one instead:
soup_object= BeautifulSoup(markup,'html-parser')
soup_object= BeautifulSoup(markup,'html.parser')
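Because a misspelled parser name produces the same FeatureNotFound error as a missing package, it can help to validate the name before handing it to BeautifulSoup. The sketch below is purely illustrative: `normalize_parser_name` and the alias table are hypothetical, and the set of known names covers only the commonly documented feature strings.

```python
# Parser feature names commonly documented for Beautiful Soup
KNOWN_PARSERS = {"html.parser", "lxml", "lxml-xml", "xml", "html5lib"}

# Hypothetical table of frequent misspellings of the built-in parser's name
ALIASES = {
    "html-parser": "html.parser",
    "html_parser": "html.parser",
    "htmlparser": "html.parser",
}


def normalize_parser_name(name):
    """Return a corrected parser name, or raise ValueError if it is unrecognized."""
    cleaned = name.strip().lower()
    cleaned = ALIASES.get(cleaned, cleaned)
    if cleaned not in KNOWN_PARSERS:
        raise ValueError("Unknown parser name: %r" % name)
    return cleaned


print(normalize_parser_name("html-parser"))
```

The returned string can then be passed as the second argument to BeautifulSoup.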
Answered by Pranav Bhendawade
The error comes from the parser you are using. In general, if you have an HTML file or HTML code, use html5lib (documentation can be found here); if you have an XML file or XML data, use lxml (documentation can be found here). You can also use lxml for HTML, but sometimes it gives the error shown above, so it is better to choose the package based on the type of data or file. You can also use html.parser, which is a built-in module, but this also sometimes does not work.
For more details on when to use which package, see here.
Answered by Pikamander2
Run these three commands to make sure that you have all the relevant packages installed:
pip install bs4
pip install html5lib
pip install lxml
Then restart your Python IDE if needed.
That should take care of anything related to this issue.