Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/2188049/

Time: 2020-08-20 04:53:00  Source: igfitidea

Parse HTML in Android

android html parsing

Asked by Daniel Benedykt

I am trying to parse HTML from a webpage in Android, and since the webpage is not well formed, I get a SAXException.

Is there a way to parse HTML in Android?
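
For context, the SAXException in the question is what a strict XML parser raises when markup is not well formed. A minimal sketch using only the JDK's javax.xml.parsers API shows the difference between well-formed markup and typical "tag soup". The names SaxStrictnessDemo and parsesAsXml are made up for illustration, and the two sample strings are assumptions, not the asker's page.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class SaxStrictnessDemo {

    /** Returns true if a strict SAX (XML) parser accepts the markup, false if it throws SAXException. */
    static boolean parsesAsXml(String markup) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(markup)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Not well-formed XML: this is the failure described in the question.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            // Unrelated to the markup itself; surface it as a programming error.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String wellFormed = "<html><body><p>Hello<br/></p></body></html>";
        String tagSoup = "<html><body><p>Hello<br></body></html>"; // <br> and <p> are never closed

        System.out.println(parsesAsXml(wellFormed)); // true
        System.out.println(parsesAsXml(tagSoup));    // false
    }
}
```

The second input is ordinary HTML as browsers accept it, but it is not valid XML, which is why an HTML-aware parser is the usual way out.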

Answered by ibaralf

I just ran into this problem. I tried a few things but settled on JSoup. The jar is about 132k, which is a bit big, but if you download the source and strip out the methods you will not be using, it gets smaller. The good thing about it is that it handles badly formed HTML.

Here's a good example from their site.

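import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Parse a local file; the third argument is the base URI used to resolve relative links.
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

// Alternatively, fetch and parse straight from a URL:
// http://jsoup.org/cookbook/input/load-document-from-url
// Document doc = Jsoup.connect("http://example.com/").get();

// Collect every link inside the element with id="content".
// getElementById returns null if the page has no such element.
Element content = doc.getElementById("content");
Elements links = content.getElementsByTag("a");
for (Element link : links) {
  String linkHref = link.attr("href"); // href value as written in the page
  String linkText = link.text();       // visible link text
}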
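
One detail of the example above: link.attr("href") returns the attribute as written in the page, which may be a relative path. The third argument to Jsoup.parse is the base URI such links are resolved against, and jsoup exposes the resolved form through link.absUrl("href"). The resolution rule itself can be sketched with the JDK alone. BaseUriDemo and resolve are hypothetical names for illustration, not jsoup API, and java.net.URI is stricter than jsoup about malformed hrefs.

```java
import java.net.URI;

// Hypothetical helper class, for illustration only.
class BaseUriDemo {

    /** Resolves an href as found in a page against the page's base URI. */
    static String resolve(String baseUri, String href) {
        // URI.create throws IllegalArgumentException for hrefs that are not valid URIs
        // (for example, ones containing spaces).
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Root-relative path: replaces the base path.
        System.out.println(resolve("http://example.com/", "/about"));
        // Document-relative path: appended to the base directory.
        System.out.println(resolve("http://example.com/docs/", "page.html"));
        // Already absolute: returned unchanged.
        System.out.println(resolve("http://example.com/", "http://other.example/x"));
    }
}
```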

Answered by Matthias

Have you tried using Html.fromHtml(source)?

I think that class is pretty liberal with respect to source quality (it uses TagSoup internally, which was designed with real-life, bad HTML in mind). It doesn't support all HTML tags, but it does come with a handler you can implement to react to tags it doesn't understand.

Answered by EddieB

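import android.text.Html;
// Html.fromHtml returns a Spanned; toString() drops the markup and keeps the text.
String tmpHtml = "<html>a whole bunch of html stuff</html>";
String htmlTextStr = Html.fromHtml(tmpHtml).toString();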

Answered by Nitin Khanna

We all know that programming has endless possibilities. There are a number of solutions to any single problem, so I think all of the solutions above are fine and may help someone, but for me this one saved my day.

So the code goes like this:

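  // Imports needed: java.io.IOException, org.jsoup.Jsoup, org.jsoup.nodes.Document,
  // org.jsoup.nodes.Element, org.jsoup.select.Elements.
  // `result` is assumed to be a TextView field of the Activity.
  private void getWebsite() {
    // Network access is not allowed on the main thread, so fetch in the background.
    new Thread(new Runnable() {
      @Override
      public void run() {
        final StringBuilder builder = new StringBuilder();

        try {
          // Download and parse the page, then collect every <a> that has an href.
          Document doc = Jsoup.connect("http://www.ssaurel.com/blog").get();
          String title = doc.title();
          Elements links = doc.select("a[href]");

          builder.append(title).append("\n");

          for (Element link : links) {
            builder.append("\n").append("Link : ").append(link.attr("href"))
                .append("\n").append("Text : ").append(link.text());
          }
        } catch (IOException e) {
          builder.append("Error : ").append(e.getMessage()).append("\n");
        }

        // Views may only be touched from the UI thread, so hand the text back to it.
        runOnUiThread(new Runnable() {
          @Override
          public void run() {
            result.setText(builder.toString());
          }
        });
      }
    }).start();
  }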

You just have to call the above function in the onCreate method of your MainActivity.
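
The shape of getWebsite() is "do the slow work on a background thread, then deliver the text on the thread that owns the UI". As a rough sketch of that same hand-off using only java.util.concurrent, the class below runs a task on a worker thread and passes the result (or an error message) to a callback through an Executor. BackgroundLoader, load, and shutdownAndWait are hypothetical names; on Android the result executor would be something that posts to the main thread (an assumption here is that a method reference such as activity::runOnUiThread is used), while the demo simply runs the callback in place.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper class, for illustration only.
class BackgroundLoader {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Executor resultExecutor; // where callbacks run (the UI thread on Android)

    BackgroundLoader(Executor resultExecutor) {
        this.resultExecutor = resultExecutor;
    }

    /** Runs the task on the worker thread and delivers its result, or an error message, to onResult. */
    void load(Callable<String> task, Consumer<String> onResult) {
        worker.execute(() -> {
            String text;
            try {
                text = task.call();
            } catch (Exception e) {
                text = "Error : " + e.getMessage();
            }
            final String delivered = text;
            resultExecutor.execute(() -> onResult.accept(delivered));
        });
    }

    /** Stops accepting work and waits up to the given time for queued tasks to finish. */
    boolean shutdownAndWait(long seconds) {
        worker.shutdown();
        try {
            return worker.awaitTermination(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Runnable::run executes the callback directly on the worker thread (demo only).
        BackgroundLoader loader = new BackgroundLoader(Runnable::run);
        loader.load(() -> "Page title", System.out::println);
        loader.shutdownAndWait(5);
    }
}
```

Compared with a raw Thread, an executor makes it easier to reuse one worker and to shut it down when the screen goes away, at the cost of a little more setup.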

I hope this one is also helpful for you guys.

Also read the original blog post on Medium.

Answered by oropher

Maybe you can use WebView, but as you can see in the docs, WebView doesn't support JavaScript and other things like widgets by default.

http://developer.android.com/reference/android/webkit/WebView.html

I think you can enable JavaScript if you need it.