How to change HTML tag content in Java?

Disclaimer: this page is a Chinese-English bilingual translation of a popular StackOverflow question, published under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/1934248/

Date: 2020-10-29 18:35:20  Source: igfitidea

Tags: java, html, tags, jtidy

Asked by bugisoft

How can I change the HTML content of a tag in Java? For example:

before:

<html>
    <head>
    </head>
    <body>
        <div>text<div>**text**</div>text</div>
    </body>
</html>

after:

<html>
    <head>
    </head>
    <body>
        <div>text<div>**new text**</div>text</div>
    </body>
</html>

I tried JTidy, but it doesn't support getTextContent. Is there any other solution?

Thanks. I want to parse HTML that is not well-formed. I tried TagSoup, but when I have this code:

<body>
sometext <div>text</div>
</body>

and I want to change "sometext" to "someAnotherText", {bodyNode}.getTextContent() gives me "sometext text". When I use setTextContent("someAnotherText"+{bodyNode}.getTextContent()) and serialize this structure, the result is <body>someAnotherText sometext text</body>, without the <div> tags. This is a problem for me.
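
The lost <div> follows from how the W3C DOM defines setTextContent: on an element it removes all child nodes (including the <div>) and replaces them with a single text node. A minimal sketch of an alternative, assuming the markup has already been made well-formed (for example by JTidy or TagSoup) and using only the JDK's built-in DOM and javax.xml.transform APIs, is to locate the direct text child of <body> and change only that node's value. The class and method names (TextNodeEdit, replaceBodyText) are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TextNodeEdit {

    // Replaces oldText inside the first direct text child of <body> that contains it.
    // Element children such as <div> are left alone, unlike body.setTextContent(...).
    public static String replaceBodyText(String xml, String oldText, String newText) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node body = doc.getElementsByTagName("body").item(0);

        for (Node child = body.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().contains(oldText)) {
                child.setNodeValue(child.getNodeValue().replace(oldText, newText));
                break;
            }
        }

        // Serialize the modified DOM back to a string, without an XML declaration.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<body>\nsometext <div>text</div>\n</body>";
        System.out.println(replaceBodyText(html, "sometext", "someAnotherText"));
    }
}
```

Because only the Text node's value changes, the <div> element stays in the tree and is serialized unchanged.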

Answer by Pascal Thivent

Unless you are absolutely sure that the HTML will be valid and well formed, I'd strongly recommend using an HTML parser such as TagSoup, Jericho, NekoHTML, or HTML Parser, the first two being especially good at parsing any kind of crap :)

For example, with HTML Parser (chosen because the implementation is very easy), you can use the visitor pattern by providing your own NodeVisitor:

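import org.htmlparser.Text;
import org.htmlparser.visitors.NodeVisitor;

// Rewrites every text node whose content is exactly "**text**".
public class MyNodeVisitor extends NodeVisitor {
    public MyNodeVisitor() {
    }

    @Override
    public void visitStringNode(Text string) {
        if (string.getText().equals("**text**")) {
            string.setText("**new text**");
        }
    }
}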

Then, create a Parser, parse the HTML string and visit the returned node list:

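import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;

Parser parser = new Parser(htmlString);       // htmlString holds the page markup
NodeList nl = parser.parse(null);             // null filter: return the whole page as a node list
nl.visitAllNodesWith(new MyNodeVisitor());    // rewrite matching text nodes in place
System.out.println(nl.toHtml());              // serialize the modified tree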

This is just one way to implement this, and it is pretty straightforward.
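
If the input is already well-formed XHTML, the same visitor idea can be sketched with only the JDK: a recursive depth-first walk over the DOM that rewrites each Text node whose value matches. This is a hypothetical analogue (DomTextVisitor, visit and rewrite are illustrative names), not HTML Parser's API, and unlike HTML Parser the JDK's XML parser rejects malformed markup with an exception.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTextVisitor {

    // Depth-first walk that rewrites every text node whose value equals 'from'.
    static void visit(Node node, String from, String to) {
        if (node.getNodeType() == Node.TEXT_NODE && node.getNodeValue().equals(from)) {
            node.setNodeValue(to);
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            visit(children.item(i), from, to);
        }
    }

    public static String rewrite(String xhtml, String from, String to) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        visit(doc.getDocumentElement(), from, to);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");          // force XML output, even for an <html> root
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head></head><body><div>text<div>**text**</div>text</div></body></html>";
        System.out.println(rewrite(html, "**text**", "**new text**"));
    }
}
```

Setting the output method to "xml" explicitly keeps the serializer from choosing the HTML output method, which XSLT 1.0's default rule selects when the root element is <html>.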

Answer by Dmitry

Provided that your HTML is well-formed XML (if it is not, you can use JTidy to tidy it), you can parse it with a DOM or SAX parser. DOM is probably easier if your document is not huge.

Something like this will do the trick if your text is the only child of a node with id="id":

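Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
// getElementById only matches attributes declared as type ID (e.g. via a DTD);
// a plain, undeclared id="..." attribute is not matched and null is returned.
Element e = d.getElementById("id");
Node text = e.getFirstChild();
// process(...) stands for whatever transformation you want to apply to the text.
text.setNodeValue(process(text.getNodeValue()));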

Afterwards you can save d back to a file.
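
Two details of the snippet above can be sketched with the JDK alone. First, Document.getElementById only matches attributes declared as type ID (through a DTD or schema, or Element.setIdAttribute), so on a plain parsed file it can return null; an XPath lookup on @id avoids that. Second, saving d can be done with the identity Transformer. The helper names (SaveDom, findById, save, demo) and the inline sample document are assumptions for illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class SaveDom {

    // XPath matches a plain id="..." attribute even without a DTD declaring it as type ID.
    // Assumes the id value contains no single quote.
    static Element findById(Document d, String id) throws Exception {
        return (Element) XPathFactory.newInstance().newXPath()
                .evaluate("//*[@id='" + id + "']", d, XPathConstants.NODE);
    }

    // Writes the DOM to a file with the identity Transformer; the stream is closed afterwards.
    static void save(Document d, File file) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        try (OutputStream os = new FileOutputStream(file)) {
            t.transform(new DOMSource(d), new StreamResult(os));
        }
    }

    // Parses a small sample, edits the text of the element with id="id", saves, and reads it back.
    public static String demo(File file) throws Exception {
        String xml = "<html><body><p id=\"id\">old</p></body></html>";
        Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element e = findById(d, "id");
        e.getFirstChild().setNodeValue("new");
        save(d, file);
        return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("dom", ".xml");
        f.deleteOnExit();
        System.out.println(demo(f));
    }
}
```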

Answer by Chad Okere

There are a bunch of open-source Java HTML parsers listed here.

I'm not sure which one is most commonly used, but this one (just called HTML Parser) will probably do what you want. It has functions to modify your tree and write it back out.

Answer by Menai Ala Eddine

In general, you have an HTML document that you want to extract data from, and you generally know its structure.

There are several parser libraries, but the best one is Jsoup: you can use its DOM methods to navigate your document and update values. In your case, you need to read your file and use the element setter methods, such as text(String) in the code below.

Sample XHTML file:

<?xml version="1.0" encoding="UTF-8"?>
<!--
To change this license header, choose License Headers in Project Properties.
To change this template file, choose Tools | Templates
and open the template in the editor.
-->
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Example</title>
    </head>
    <body>
        <p id="content">Hello World</p>

    </body>
</html>

Java code:

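import java.io.File;
import org.jsoup.Jsoup;

// Backslashes must be escaped in Java string literals.
File input = new File("D:\\Projects\\Odata Project\\Odata\\src\\web\\html\\inscription_template.xhtml");
org.jsoup.nodes.Document doc = Jsoup.parse(input, null); // null charset: use the meta tag if present, else UTF-8
org.jsoup.nodes.Element content = doc.getElementById("content");
System.out.println(content.text("Hi How are you ?"));   // text(String) sets the text and returns the element
System.out.println(content.text());
System.out.println(doc);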

Output after execution:

<p id="content">Hi How are you ?</p>
Hi How are you ?
<!--?xml version="1.0" encoding="UTF-8"?-->
<!--
To change this license header, choose License Headers in Project Properties.
To change this template file, choose Tools | Templates
and open the template in the editor.
--><!doctype html>
<html xmlns="http://www.w3.org/1999/xhtml">
 <head> 
  <title>Example</title> 
 </head> 
 <body> 
  <p id="content">Hi How are you ?</p>   
 </body>
</html>