Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/4392505/

Date: 2020-10-30 06:09:29  Source: igfitidea

How to validate HTML from Java?

Tags: java, html, html-parsing, html-validation

Asked by Tony the Pony

What is a fast and simple way to validate HTML from Java? I'm looking for an open-source/PD class (or set of classes) that describes the various properties of the 100-odd HTML tags, such as:

  1. Is the tag optional? Empty? Is it legal to omit its closing tag?
  2. Which other tags can this tag contain (if any)?
  3. Which attributes are legal for this tag, and what are their types? (not required, but nice to have)

Thanks!

EDIT

I'm looking to do a tag-by-tag analysis of an HTML document, so I'm less interested in whether the document as a whole is valid than in what the specific requirements are for each type of tag. I could encode the rules based on the W3C spec, but wanted to see which ready-made solutions are available first.
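
For illustration only, hand-encoding a few of these per-tag rules might look roughly like the sketch below. The class, enum, and field names (`HtmlTagRules`, `TagInfo`, `EndTag`) are hypothetical, and the table covers just a handful of elements whose HTML 4.01 content models are simple; it leaves out start-tag optionality and attributes, so it is nowhere near a complete rule set.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class HtmlTagRules {
    /** Whether the closing tag must appear, may be omitted, or must not appear. */
    enum EndTag { REQUIRED, OPTIONAL, FORBIDDEN }

    /** Hypothetical description of one element type. */
    static final class TagInfo {
        final boolean empty;          // element has no content at all
        final EndTag endTag;          // closing-tag rule
        final Set<String> children;   // child elements allowed directly inside

        TagInfo(boolean empty, EndTag endTag, String... children) {
            this.empty = empty;
            this.endTag = endTag;
            this.children = Set.of(children);
        }
    }

    // A few entries transcribed by hand from the HTML 4.01 DTD; deliberately incomplete.
    static final Map<String, TagInfo> RULES = Map.of(
        "br",  new TagInfo(true,  EndTag.FORBIDDEN),
        "img", new TagInfo(true,  EndTag.FORBIDDEN),
        "ul",  new TagInfo(false, EndTag.REQUIRED, "li"),
        "ol",  new TagInfo(false, EndTag.REQUIRED, "li"),
        "dl",  new TagInfo(false, EndTag.REQUIRED, "dt", "dd"),
        "tr",  new TagInfo(false, EndTag.OPTIONAL, "td", "th"));

    /** True if the table allows child directly inside parent. */
    static boolean mayContain(String parent, String child) {
        TagInfo info = RULES.get(parent.toLowerCase(Locale.ROOT));
        if (info == null) {
            throw new IllegalArgumentException("no rule for <" + parent + ">");
        }
        return info.children.contains(child.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println("ul may contain li: " + mayContain("ul", "li"));
        System.out.println("ul may contain p:  " + mayContain("ul", "p"));
        System.out.println("br end tag:        " + RULES.get("br").endTag);
    }
}
```

A table like this answers "which tags can this tag contain" with a lookup, but filling it in for all of the roughly 100 elements is the tedious part a ready-made library would save.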

Accepted answer by Edwin Buck

If you want to verify certain tags follow certain specifications, there seems to be no end of Java based HTML parsers:

Open Source HTML Parsers in Java

In other words, you could parse your HTML, then inspect the resulting document for the tags you are looking for and determine whether they meet the specifications you require. If they don't, you could then just throw an error.
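
A minimal sketch of that parse-then-inspect approach follows. It is not based on any parser from the list above; to stay dependency-free it uses the HTML parser bundled with the JDK (`javax.swing.text.html.parser.ParserDelegator`), which is an older, lenient parser. The rule being checked ("every `img` needs an `alt` attribute") and the names `TagRuleCheck` and `findImagesWithoutAlt` are assumptions made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class TagRuleCheck {
    /** Returns one message per img tag that carries no alt attribute. */
    static List<String> findImagesWithoutAlt(String html) {
        List<String> problems = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                check(tag, attrs, pos);
            }

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                check(tag, attrs, pos);
            }

            // The example rule: an img tag must have an alt attribute.
            private void check(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (HTML.Tag.IMG.equals(tag) && attrs.getAttribute(HTML.Attribute.ALT) == null) {
                    problems.add("img without alt at offset " + pos);
                }
            }
        };

        try {
            // The parser calls back once per tag it encounters.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        String html = "<html><body><img src='a.png'><img src='b.png' alt='b'></body></html>";
        List<String> problems = findImagesWithoutAlt(html);
        if (!problems.isEmpty()) {
            // "Just throw an error", as the answer suggests.
            throw new IllegalStateException(String.join("; ", problems));
        }
    }
}
```

Swapping in another parser would mostly change how the tags are visited; the per-tag check itself stays the same.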

I don't think you'll find an HTML analysis tool that was written with exactly your requirements in mind, mostly because those requirements haven't been spelled out and are probably a bit nebulous.

If the parser doesn't do what you want out of the box, at least the parsers on this list are open source, so you can hack on one as long as you comply with its license (which may require publishing your changes).

Answer by Favonius

Check JTidy (http://jtidy.sourceforge.net/) and VietSpider HTMLParser (http://sourceforge.net/projects/binhgiang/); both are Java HTML parsers with some syntax-checking capabilities. Some Eclipse-based HTML editor plugins use JTidy (or a port of Tidy) for syntax checking. Or, as David said, submit the page to w3c.org.
