How to parse an HTML string in Java?

Question

Asked by IttayD

Given the string "<table><tr><td>Hello World!</td></tr></table>", what is the (easiest) way to get a DOMElement representing it?

Answer 1

Accepted answer by IttayD

I found this somewhere (don't remember where):

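import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public static DocumentFragment parseXml(Document doc, String fragment)
{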
    // Wrap the fragment in an arbitrary element.
    fragment = "<fragment>"+fragment+"</fragment>";
    try
    {
        // Create a DOM builder and parse the fragment.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document d = factory.newDocumentBuilder().parse(
                new InputSource(new StringReader(fragment)));

        // Import the nodes of the new document into doc so that they
        // will be compatible with doc.
        Node node = doc.importNode(d.getDocumentElement(), true);

        // Create the document fragment node to hold the new nodes.
        DocumentFragment docfrag = doc.createDocumentFragment();

        // Move the nodes into the fragment.
        while (node.hasChildNodes())
        {
            docfrag.appendChild(node.removeChild(node.getFirstChild()));
        }
        // Return the fragment.
        return docfrag;
    }
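    catch (SAXException e)
    {
        // A parsing error occurred; the XML input is not valid.
    }
    catch (ParserConfigurationException e)
    {
        // The XML parser could not be configured.
    }
    catch (IOException e)
    {
        // Reading the fragment failed.
    }
    // Any of the errors above is swallowed and reported as null.
    return null;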
}
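
As a rough illustration of the same idea applied to the string from the question, the sketch below parses the markup directly and returns its root element. It assumes the input is well-formed XML, which is true for this table but not for HTML in general; the class name HtmlStringToDom and the helper parseRoot are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class HtmlStringToDom {

    // Hypothetical helper: parses well-formed markup and returns its root element.
    // Assumption: the input is valid XML; sloppy real-world HTML will be rejected.
    static Element parseRoot(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            // Covers malformed input, I/O failures, and parser configuration errors.
            throw new IllegalArgumentException("Could not parse markup as XML", e);
        }
    }

    public static void main(String[] args) {
        Element table = parseRoot("<table><tr><td>Hello World!</td></tr></table>");
        // Print the root tag name and the text of the first cell.
        System.out.println(table.getTagName());
        System.out.println(table.getElementsByTagName("td").item(0).getTextContent());
    }
}
```

Unlike parseXml above, this variant throws on bad input instead of returning null, so the caller can tell a parse failure apart from an empty result.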

Answer 2

Answer by Andrew Hare

You could use Swing:

How do you make use of the HTML-processing capabilities that are built into Java? You may not know that Swing contains all the classes necessary to parse HTML. Jeff Heaton shows you how.
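
A minimal sketch of what using Swing's built-in parser might look like, assuming the goal is only to pull the text out of the markup. It relies on ParserDelegator and HTMLEditorKit.ParserCallback from the JDK; the class name SwingTextExtractor and the helper extractText are hypothetical names used for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class SwingTextExtractor {

    // Hypothetical helper: collects every text run the Swing parser reports.
    static List<String> extractText(String html) {
        final List<String> texts = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                texts.add(new String(data));
            }
        };
        try (StringReader reader = new StringReader(html)) {
            // The third argument tells the parser to ignore charset directives.
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(extractText("<table><tr><td>Hello World!</td></tr></table>"));
    }
}
```

This parser is callback-based, so it reports tags and text as it encounters them rather than handing back a DOM element.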

Answer 3

Answer by nkr1pt

You could use HTML Parser, which is a Java library used to parse HTML in either a linear or nested fashion. It is an open source tool and can be found on SourceForge.

Answer 4

Answer by non sequitor

I've used Jericho HTML Parser. It's OSS, detects (and forgives) badly formatted tags, and is lightweight.

Answer 5

Answer by Bart Kiers

Here's a way:

import java.io.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

public class HtmlParseDemo {
   public static void main(String [] args) throws Exception {
       Reader reader = new StringReader("<table><tr><td>Hello</td><td>World!</td></tr></table>");
       HTMLEditorKit.Parser parser = new ParserDelegator();
       parser.parse(reader, new HTMLTableParser(), true);
       reader.close();
   }
}

class HTMLTableParser extends HTMLEditorKit.ParserCallback {

    private boolean encounteredATableRow = false;

    public void handleText(char[] data, int pos) {
        if(encounteredATableRow) System.out.println(new String(data));
    }

    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if(t == HTML.Tag.TR) encounteredATableRow = true;
    }

    public void handleEndTag(HTML.Tag t, int pos) {
        if(t == HTML.Tag.TR) encounteredATableRow = false;
    }
}

Answer 6

Answer by zygimantus

If you have a string which contains HTML, you can use the Jsoup library like this to get HTML elements:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

String htmlTable = "<table><tr><td>Hello World!</td></tr></table>";
Document doc = Jsoup.parse(htmlTable);

// then use something like this to get your element:
Elements tds = doc.getElementsByTag("td");

// tds will contain this one element: <td>Hello World!</td>

Good luck!
