Library to query HTML with XPath in Java?

Question

Asked by Leonardo Marques

Can anyone recommend a Java library that lets me run XPath queries over URLs, i.e. over the HTML pages they return? I've tried JAXP without success.

Thank you.
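
The question does not say how JAXP failed, but a common cause (an assumption here) is that the JDK's `DocumentBuilder` is a strict XML parser: it rejects real-world HTML with unclosed tags such as `<br>` before any XPath is evaluated. The JDK-only sketch below shows the same XPath working on well-formed markup and the parse failing on tag soup; the class name `JaxpHtmlCheck` and the sample strings are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class JaxpHtmlCheck {

    /** Returns the text of /html/head/title, or null if the markup is not well-formed XML. */
    static String titleOrNull(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing "[Fatal Error]" to stderr.
        builder.setErrorHandler(new DefaultHandler());
        Document doc;
        try {
            doc = builder.parse(new InputSource(new StringReader(markup)));
        } catch (SAXParseException e) {
            // Typical HTML ends up here: XML parsing fails before XPath ever runs.
            return null;
        }
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/html/head/title", doc);
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><head><title>Hi</title></head><body><p>ok</p></body></html>";
        String tagSoup = "<html><head><title>Hi</title></head><body><p>ok<br></body></html>";
        System.out.println(titleOrNull(xhtml));   // Hi
        System.out.println(titleOrNull(tagSoup)); // null
    }
}
```

This is why the answers below start by running the page through an HTML cleaner or a lenient parser that produces a well-formed tree.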

Answer 1

Answer by Mark Butler

There are several different approaches to this documented on the Web:

Using HtmlCleaner

HtmlCleaner / Java DOM parser - Using XPath Contains against HTML in Java (this is the way I recommend)
HtmlCleaner itself has a built-in utility supporting XPath - see the javadocs http://htmlcleaner.sourceforge.net/doc/org/htmlcleaner/XPather.html or this example http://thinkandroid.wordpress.com/2010/01/05/using-xpath-and-html-cleaner-to-parse-html-xml/
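
A minimal sketch of the HtmlCleaner plus Java DOM route, assuming an HtmlCleaner 2.x jar on the classpath: `HtmlCleaner.clean` repairs the markup into a `TagNode` tree, `DomSerializer.createDOM` converts that tree into an `org.w3c.dom.Document`, and the JDK's `javax.xml.xpath` API runs the query. The class name `HtmlCleanerXPath`, the method `hrefs`, and the link query are illustrative. The XPath uses `local-name()` so it matches whether or not the generated elements end up in the XHTML namespace.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.DomSerializer;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class HtmlCleanerXPath {

    /** Cleans arbitrary HTML into a W3C DOM and returns every link's href value. */
    static List<String> hrefs(String html) throws Exception {
        HtmlCleaner cleaner = new HtmlCleaner();
        TagNode root = cleaner.clean(html);                        // tag soup -> well-formed tree
        CleanerProperties props = cleaner.getProperties();
        Document doc = new DomSerializer(props).createDOM(root);  // -> org.w3c.dom.Document

        XPath xpath = XPathFactory.newInstance().newXPath();
        // local-name() sidesteps any default XHTML namespace on the generated elements.
        NodeList nodes = (NodeList) xpath.evaluate(
                "//*[local-name()='a']/@href", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // For a live page, read the URL's content into a String first.
        String html = "<p>See <a href='docs.html'>the docs</a><br>and more";
        System.out.println(hrefs(html));
    }
}
```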

Using Jericho

Jericho and Jaxen http://sujitpal.blogspot.com/2009/04/xpath-over-html-using-jericho-and-jaxen.html

I have tried a few different variations of these approaches, i.e. HtmlParser plus the Java DOM parser, and JSoup plus Jaxen, but the combination that worked best is HtmlCleaner plus the Java DOM parser. The next best combination was Jericho plus Jaxen.

Answer 2

Answer by Artem Barger

jsoup, Java HTML Parser. Its selector syntax is very similar to jQuery's.
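
A short sketch of the jQuery-style approach, assuming the jsoup jar is on the classpath. `Jsoup.parse` builds a tree from arbitrary HTML, and `select` takes a CSS selector much as jQuery's `$()` does; for a live page, `Jsoup.connect(url).get()` fetches and parses in one step. The class name `JsoupSelect` and the sample HTML are illustrative. Note that these are CSS selectors, not XPath; recent jsoup releases (1.14.3 and later) also provide `Element.selectXpath` if you specifically need XPath.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupSelect {

    /** Returns the absolute URL of every <a> that has an href attribute. */
    static List<String> linkTargets(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);   // lenient HTML parsing
        List<String> out = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {    // jQuery-like CSS selector
            out.add(a.absUrl("href"));               // resolved against baseUri
        }
        return out;
    }

    public static void main(String[] args) {
        // For a live page: Document doc = Jsoup.connect("https://example.com/").get();
        String html = "<p><a href='/docs'>Docs</a> <a>no href</a>";
        System.out.println(linkTargets(html, "https://example.com/")); // [https://example.com/docs]
    }
}
```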

Answer 3

Answer by Martin Honnen

You could use TagSoup together with Saxon. That way you simply replace whatever XML SAX parser would normally be used with TagSoup, and the XPath 2.0, XSLT 2.0, or XQuery 1.0 implementation works as usual.
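
A sketch of the TagSoup idea, assuming the TagSoup jar is on the classpath. TagSoup's `org.ccil.cowan.tagsoup.Parser` is an ordinary SAX `XMLReader`, so it can be handed to any JAXP API that accepts a `SAXSource`. To keep the example short it uses the JDK's built-in identity transformer and XPath 1.0 engine in place of Saxon; with Saxon's processors instead you would get the XPath 2.0 / XSLT 2.0 / XQuery 1.0 support the answer describes. TagSoup reports elements in the XHTML namespace by default, hence `local-name()` in the query. The class name `TagSoupXPath` and method `firstHeading` are illustrative.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.XPathFactory;
import org.ccil.cowan.tagsoup.Parser;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class TagSoupXPath {

    /** Parses tag-soup HTML via TagSoup and returns the text of the first <h1>. */
    static String firstHeading(String html) throws Exception {
        XMLReader reader = new Parser();  // TagSoup plays the role of the SAX parser
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(html)));

        // Identity transform: TagSoup's SAX events -> DOM tree.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();
        identity.transform(source, result);
        Node doc = result.getNode();

        // Elements carry the XHTML namespace, so match on local name only.
        return XPathFactory.newInstance().newXPath()
                .evaluate("string(//*[local-name()='h1'])", doc);
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><h1>Title</h1><p>a<br>b</body>";
        System.out.println(firstHeading(html));
    }
}
```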

Answer 4

Answer by Tassos Bassoukos

I've used JTidy to turn HTML into a proper DOM, then used plain XPath to query that DOM.
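
Once a cleaner such as JTidy has produced a well-formed DOM, querying it is plain JDK code. The sketch below assumes that cleaning step is already done and uses a hand-written XHTML string as a stand-in for the cleaner's output; the table `id`, the class name `DomXPathQueries`, and its methods are hypothetical. It compiles an expression once with `XPath.compile` and shows both a node-set result and a numeric `count()` result.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomXPathQueries {

    /** Parses already well-formed XHTML (e.g. a cleaner's output) into a DOM. */
    static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    /** Node-set query: trimmed text of every cell in the table with id="prices". */
    static List<String> cellTexts(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression cells = xpath.compile("//table[@id='prices']//td"); // reusable
        NodeList nodes = (NodeList) cells.evaluate(doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }

    /** Numeric query: number of rows in the same table. */
    static int rowCount(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double n = (Double) xpath.evaluate(
                "count(//table[@id='prices']//tr)", doc, XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the well-formed XHTML a cleaner like JTidy would produce.
        String xhtml = "<html><body><table id='prices'>"
                + "<tr><td>apple</td><td>1.20</td></tr>"
                + "<tr><td>pear</td><td>0.90</td></tr>"
                + "</table></body></html>";
        Document doc = parse(xhtml);
        System.out.println(cellTexts(doc)); // [apple, 1.20, pear, 0.90]
        System.out.println(rowCount(doc));  // 2
    }
}
```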

If you want to run cross-document/cross-URL queries, you are better off combining JTidy with XQuery.
