Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/4144529/

Time: 2020-10-30 05:00:04  Source: igfitidea

How to extract absolute URL from relative HTML links using Jsoup?

java url jsoup

Asked by sundhar

I am using Jsoup to extract the URLs from a webpage. The href attributes of those links are relative, like:

<a href="/text">example</a>

Here is my attempt:

Document document = Jsoup.connect(url).get();
Elements results = document.select("div.results");
Elements dls = results.select("dl");
for (Element dl : dls) {
    String url = dl.select("a").attr("href");
}

This works fine, but if I use

String url = dl.select("a").attr("abs:href");

to get an absolute URL such as http://example.com/text, it does not work. How can I get the absolute URL?

Accepted answer by BalusC

You need Element#absUrl().

String url = dl.select("a").absUrl("href");


By the way, you can shorten the select:

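Document document = Jsoup.connect(url).get();
// A single combined selector replaces the chained select() calls.
Elements links = document.select("div.results dl a");
for (Element link : links) {
    // Renamed so it does not clash with the url passed to connect() above.
    String absoluteUrl = link.absUrl("href");
}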
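
For intuition about what absUrl("href") gives back: the relative href is resolved against the base URI of the document it came from. The following is a minimal sketch of that resolution step using only java.net.URI from the standard library. It is not Jsoup's own implementation, and the class name ResolveSketch, the helper resolve, and the page URL are assumptions made up for illustration.

```java
import java.net.URI;

class ResolveSketch {
    // Resolves a possibly relative href against the URL of the page it was found on.
    static String resolve(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page URL, standing in for the url passed to Jsoup.connect().
        String base = "http://example.com/search/results.html";

        System.out.println(resolve(base, "/text"));                  // http://example.com/text
        System.out.println(resolve(base, "text"));                   // http://example.com/search/text
        System.out.println(resolve(base, "http://other.example/x")); // http://other.example/x
    }
}
```

A root-relative href such as /text replaces the whole path of the base URL, a plain relative href replaces only the last path segment, and an href that is already absolute comes back unchanged.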

Answer by tindase

String url = dl.select("a").absUrl("href");

is not correct, because dl.select("a") does not return a single element but a collection (Elements). You need to get an element from it by index, e.g.:

Elements elems = dl.select("a");
Element a1 = elems.get(0); // 0 is the index of the first element; valid indices run up to elems.size() - 1
// now you can do
String url = a1.absUrl("href");

If you are sure only one item will result from the select above, or that the item you want will be the first, you can:

String url = dl.select("a").get(0).absUrl("href"); 

which is equivalent, as long as the selection contains at least one element, to

String url = dl.select("a").first().absUrl("href");

It doesn't have to be the first element, though. You can always replace the 0 in dl.select("a").get(0).absUrl("href") with the index of the element you want, or use a more specific selector that matches only one element.