Python bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/24398302/

Time: 2020-08-19 04:32:25  Source: igfitidea


Tags: python, python-2.7, beautifulsoup, lxml

Asked by user3773048

...
soup = BeautifulSoup(html, "lxml")
File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__
% ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

The above is the output in my Terminal. I am on Mac OS 10.7.x. I have Python 2.7.1, and followed this tutorial to get Beautiful Soup and lxml, which both installed successfully and work with a separate test file located here. In the Python script that causes this error, I have included this line:

from pageCrawler import comparePages

And in the pageCrawler file I have included the following two lines:

from bs4 import BeautifulSoup
from urllib2 import urlopen

Any help in figuring out what the problem is and how it can be solved would be much appreciated.


Answered by James Errico

I have a suspicion that this is related to the parser that BS will use to read the HTML. Their documentation is here, but if you're like me (on OSX) you might be stuck with something that requires a bit of work:

You'll notice that in the BS4 documentation page above, they point out that by default BS4 will use the Python built-in HTML parser. Assuming you are on OSX, the Apple-bundled version of Python is 2.7.2, which is not lenient about character formatting. I hit this same problem, so I upgraded my version of Python to work around it. Doing this in a virtualenv will minimize disruption to other projects.

If doing that sounds like a pain, you can switch over to the LXML parser:


pip install lxml

And then try:


soup = BeautifulSoup(html, "lxml")

Depending on your scenario, that might be good enough. I found this annoying enough to warrant upgrading my version of Python. Using virtualenv, you can migrate your packages fairly easily.

Answered by Tim Seed

For a basic Python setup with bs4 installed, you can process your XML with html5lib (a separate package that must also be installed):

soup = BeautifulSoup(html, "html5lib")

If, however, you want to use features="xml", then you need to run:

pip3 install lxml

soup = BeautifulSoup(html, features="xml")

Answered by Qiao Yang

I encountered the same issue. The cause was a slightly outdated Python six package.

>>> import html5lib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/html5lib/__init__.py", line 16, in <module>
    from .html5parser import HTMLParser, parse, parseFragment
  File "/usr/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 2, in <module>
    from six import with_metaclass, viewkeys, PY3
ImportError: cannot import name viewkeys

Upgrading your six package will solve the issue:


sudo pip install six==1.10.0
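
Before upgrading, it can help to confirm which version of six the interpreter actually sees. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata), which is newer than the Python 2.7 shown in the traceback above. The helper names installed_version and version_tuple are hypothetical, and the 1.10.0 threshold is simply the version pinned in this answer.

```python
import re
from importlib import metadata  # available from Python 3.8 (assumption)


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '1.10.0' into (1, 10, 0); any suffix after the numeric part is ignored."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


REQUIRED = (1, 10, 0)  # the version pinned in the answer above

found = installed_version("six")
if found is None:
    print("six is not installed for this interpreter")
elif version_tuple(found) < REQUIRED:
    print("six", found, "is older than 1.10.0; consider upgrading")
else:
    print("six", found, "looks recent enough")
```

Comparing tuples of integers avoids the string-comparison trap where "1.9.0" sorts after "1.10.0".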

Answered by Ernst

I prefer the built-in Python HTML parser: no install, no dependencies.

soup = BeautifulSoup(s, "html.parser")


Answered by Bashar

I am using Python 3.6 and I had the same error as in the original post. After I ran the command:

python3 -m pip install lxml

it resolved my problem
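
One possible reason this form helps is that python3 -m pip runs pip for that specific interpreter, whereas a bare pip command may belong to a different Python installation. The sketch below, assuming Python 3.8 or later (for shlex.join), prints the interpreter in use and the matching install command. The helper name pip_install_command is hypothetical.

```python
import shlex
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


print("Interpreter:", sys.executable)
print("Install command:", shlex.join(pip_install_command("lxml")))
```

Running the printed command installs lxml into the same environment that will later import it.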


Answered by Yogesh

Instead of using lxml, use html.parser. You can use this piece of code:

soup = BeautifulSoup(html, 'html.parser')

Answered by Projesh Bhoumik

BeautifulSoup supports Python's built-in HTML parser by default. If you want to use any other third-party Python parser, you need to install that external parser (for example, lxml).

soup_object = BeautifulSoup(markup, "html.parser")  # Python's built-in HTML parser

But if you don't specify a parser as a parameter, you will get a warning that no parser was specified.

soup_object = BeautifulSoup(markup)  # Warning: no parser specified

To use any other external parser, you need to install it and then specify it, like this:

pip install lxml

soup_object = BeautifulSoup(markup, 'lxml')  # C-dependent parser

External parsers have C and Python dependencies, which may bring some advantages and disadvantages.
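
When it is not known in advance which parsers are installed, one option is to try them in order of preference and fall back to the built-in one. This is only a sketch: the helper name make_soup and the preference order are assumptions, not something this answer prescribes. It relies on bs4 raising FeatureNotFound when the requested parser library is missing, as in the traceback in the question.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, parsers=("lxml", "html5lib", "html.parser")):
    """Return (soup, parser_name) using the first parser that is available."""
    for name in parsers:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            continue  # this parser library is not installed; try the next one
    raise RuntimeError("none of the requested parsers is available")


soup, used = make_soup("<p>Hello</p>")
print("Parsed with", used, "->", soup.p.get_text())
```

Different parsers can build different trees from invalid markup, so a silent fallback is best kept for cases where that difference does not matter.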


Answered by abhishekPakrashi

Some references use the first form below; use the second instead:

soup_object = BeautifulSoup(markup, 'html-parser')  # first form: not a recognized parser name
soup_object = BeautifulSoup(markup, 'html.parser')  # second form: the built-in parser's actual name

Answered by Pranav Bhendawade

The error comes from the parser you are using. In general, if you have an HTML file or HTML code, use html5lib (documentation can be found here); if you have an XML file or XML data, use lxml (documentation can be found here). You can also use lxml for HTML, but it sometimes gives the error above, so it is better to choose the package based on the type of data or file. You can also use html.parser, which is a built-in module, but this also sometimes does not work.

For more details regarding when to use which package you can see the details here


Answered by Pikamander2

Run these three commands to make sure that you have all the relevant packages installed:


pip install bs4
pip install html5lib
pip install lxml

Then restart your Python IDE, if needed.


That should take care of anything related to this issue.
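
To confirm that the three packages are visible to the interpreter that runs your script (and not only to some other Python on the machine), a quick check along these lines may help. It is a minimal sketch assuming Python 3, and the helper name missing_packages is hypothetical.

```python
import importlib.util


def missing_packages(names=("bs4", "html5lib", "lxml")):
    """Return the names that the current interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("bs4, html5lib and lxml are all importable")
```

If a package shows up as not importable even though pip reported success, pip and the script are probably using different Python installations.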
