Java: How to search for comments ("<!-- -->") using Jsoup?

Question

Question by 87element

I would like to remove those comment tags, together with their content, from the source HTML.

Answer 1

Answer by dlamblin

When searching, you basically use Elements.select(selector), where selector is defined by jsoup's selector syntax. However, comments are technically not elements, which can be confusing here: they are still nodes, identified by the node name #comment, but select() only ever returns elements, so it will not find them.

Let's see how that might work:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;

public class RemoveComments {
    public static void main(String... args) {
        String h = "<html><head></head><body>" +
          "<div><!-- foo --><p>bar<!-- baz --></div><!--qux--></body></html>";
        Document doc = Jsoup.parse(h);
        removeComments(doc);
        doc.html(System.out);
    }

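    // Removes every comment node below the given node, depth first.
    private static void removeComments(Node node) {
        for (int i = 0; i < node.childNodeSize();) {
            Node child = node.childNode(i);
            if (child.nodeName().equals("#comment")) {
                // Removing shifts the next sibling into index i, so i is not advanced.
                child.remove();
            } else {
                removeComments(child);
                i++;
            }
        }
    }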
}
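The question title asks how to search for comments rather than remove them. The same walk over child nodes can simply collect them. A minimal sketch, assuming jsoup is on the classpath; the class FindComments and the methods commentTexts and collect are hypothetical names, and instanceof Comment is used in place of the "#comment" name check (both identify the same nodes).

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;

public class FindComments {
    // Returns the (trimmed) text of every comment below the given node, in document order.
    static List<String> commentTexts(Node root) {
        List<String> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    // Walks all child nodes, not just elements, because comments are not elements.
    private static void collect(Node node, List<String> found) {
        for (Node child : node.childNodes()) {
            if (child instanceof Comment) {
                // getData() is the text between <!-- and -->
                found.add(((Comment) child).getData().trim());
            } else {
                collect(child, found);
            }
        }
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse("<div><!-- foo --><p>bar<!-- baz --></p></div><!--qux-->");
        System.out.println(commentTexts(doc));
    }
}
```

If you need the Comment nodes themselves (for example to remove or replace them later), collect the nodes instead of their text.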

Answer 2

Answer by Michael Conrad

With jsoup 1.11.1+ (where Node.filter() and NodeFilter were introduced) you can apply a filter:

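import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeFilter;

// Removes every comment node below the given element.
private void removeComments(Element article) {
    article.filter(new NodeFilter() {
        @Override
        public FilterResult tail(Node node, int depth) {
            if (node instanceof Comment) {
                return FilterResult.REMOVE; // ask the traversal to detach this node
            }
            return FilterResult.CONTINUE;
        }

        @Override
        public FilterResult head(Node node, int depth) {
            if (node instanceof Comment) {
                return FilterResult.REMOVE;
            }
            return FilterResult.CONTINUE;
        }
    });
}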
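To try the filter approach on a whole Document, here is a self-contained sketch, assuming jsoup 1.11.1 or later on the classpath. The class FilterCommentsDemo, the constant DROP_COMMENTS, the helper stripComments and the sample HTML are hypothetical. Unlike the answer above, it checks for comments only in head(): as far as I can tell from jsoup's traversal, a node for which head() returns REMOVE is removed without being descended into, so the extra check in tail() should not be needed.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeFilter;

public class FilterCommentsDemo {
    // Filter that asks the traversal to remove every Comment node it reaches.
    static final NodeFilter DROP_COMMENTS = new NodeFilter() {
        @Override
        public FilterResult head(Node node, int depth) {
            return node instanceof Comment ? FilterResult.REMOVE : FilterResult.CONTINUE;
        }

        @Override
        public FilterResult tail(Node node, int depth) {
            return FilterResult.CONTINUE; // nothing to do on the way back up
        }
    };

    // Hypothetical helper: parse, strip all comments, return the cleaned <body> content.
    static String stripComments(String html) {
        Document doc = Jsoup.parse(html);
        doc.filter(DROP_COMMENTS);
        return doc.body().html();
    }

    public static void main(String[] args) {
        System.out.println(stripComments("<div><!-- foo --><p>bar<!-- baz --></p></div><!--qux-->"));
    }
}
```

Because Document is an Element, calling filter() on it also reaches comments that sit outside <html>.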

Answer 3

Answer by byte mamba

Based on @dlamblin's answer (https://stackoverflow.com/a/7541875/4712855), this code gets the HTML inside comments: it replaces each comment with its content, parsed as HTML.

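import java.util.ArrayList;

import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Node;

// Replaces every comment with its own content, parsed as HTML ("uncommenting" it).
public static void getHtmlComments(Node node) {
    // Iterate over a copy: after() and remove() change the child list, and with an
    // index-based loop plus i++ the node following each replaced comment was skipped.
    for (Node child : new ArrayList<>(node.childNodes())) {
        if (child instanceof Comment) {
            Comment comment = (Comment) child;
            child.after(comment.getData()); // insert the comment's content as HTML
            child.remove();                 // then drop the comment itself
        } else {
            getHtmlComments(child);
        }
    }
}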
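A runnable check of the uncommenting idea, as a sketch assuming jsoup on the classpath. The class UncommentDemo, the method uncomment and the sample HTML are hypothetical. It iterates over a snapshot of the child list so that inserting the comment's content and removing the comment cannot shift the loop past a sibling.

```java
import java.util.ArrayList;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;

public class UncommentDemo {
    // Replaces each comment with its content, parsed as HTML in the parent's context.
    static void uncomment(Node node) {
        // Snapshot: after() and remove() modify node's live child list.
        for (Node child : new ArrayList<>(node.childNodes())) {
            if (child instanceof Comment) {
                child.after(((Comment) child).getData());
                child.remove();
            } else {
                uncomment(child);
            }
        }
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse("<div><!-- <p>hidden</p> --><p>visible</p></div>");
        uncomment(doc);
        System.out.println(doc.body().html());
    }
}
```

Nodes inserted from a comment are not visited again, since they are not in the snapshot.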

Answer 4

Answer by Feuerrabe

This is a variation of the first example using a functional programming approach. The easiest way to find all comments that are immediate children of the current node is to use .filter() on a stream of .childNodes(); the method then recurses into the child elements:

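public void removeComments(Element e) {
    // Collect the comment children first, then remove them; removing while
    // streaming over childNodes() would modify the list being read.
    e.childNodes().stream()
        .filter(n -> n.nodeName().equals("#comment")).collect(Collectors.toList())
        .forEach(n -> n.remove());
    // Recurse into the child elements
    e.children().forEach(elem -> removeComments(elem));
}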
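The recursion can also be avoided: getAllElements() returns the element itself plus all its descendants, so streaming over the child nodes of each of them reaches every comment. A sketch assuming jsoup on the classpath; StreamRemoveComments and the sample HTML are hypothetical, and this flatMap variant is an alternative to the answer's recursive method, not part of it.

```java
import java.util.List;
import java.util.stream.Collectors;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

public class StreamRemoveComments {
    // Non-recursive: gather the comment children of every element, then remove them.
    static void removeComments(Element root) {
        List<Node> comments = root.getAllElements().stream()
                .flatMap(e -> e.childNodes().stream())
                .filter(n -> n instanceof Comment)
                .collect(Collectors.toList()); // materialize before modifying the tree
        comments.forEach(Node::remove);
    }

    public static void main(String[] args) {
        // The first comment sits directly under the Document, outside <html>.
        Document doc = Jsoup.parse("<!-- top --><div><!-- foo --><p>bar<!-- baz --></p></div>");
        removeComments(doc);
        System.out.println(doc.outerHtml());
    }
}
```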

Full example:

package demo;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.stream.Collectors;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Demo {

    // Uses lambdas and streams, so this needs JDK 8 or later.
    public static void removeComments(Element e) {
        e.childNodes().stream()
            .filter(n -> n.nodeName().equals("#comment")).collect(Collectors.toList())
            .forEach(n -> n.remove());
        e.children().forEach(elem -> removeComments(elem));
    }

    public static void main(String[] args) throws MalformedURLException, IOException {
        // Fetch and parse the page with a 500 ms timeout
        Document doc = Jsoup.parse(new URL("https://en.wikipedia.org/"), 500);

        String userHome = System.getProperty("user.home");
        // Save the page before removing the comments...
        try (PrintStream out = new PrintStream(new FileOutputStream(userHome + File.separator + "before.html"))) {
            out.print(doc.outerHtml());
        }

        removeComments(doc);

        // ...and after
        try (PrintStream out = new PrintStream(new FileOutputStream(userHome + File.separator + "after.html"))) {
            out.print(doc.outerHtml());
        }
    }
}
