Java: How to extract an absolute URL from relative HTML links using Jsoup
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/4144529/
How to extract absolute URL from relative HTML links using Jsoup?
Asked by sundhar
I am using Jsoup to extract the URLs from a webpage. The href attributes of those links are relative, like:
<a href="/text">example</a>
Here is my attempt:
Document document = Jsoup.connect(url).get();
Elements results = document.select("div.results");
Elements dls = results.select("dl");
for (Element dl : dls) {
    String url = dl.select("a").attr("href");
}
This works fine, but if I use
String url = dl.select("a").attr("abs:href");
to get an absolute URL such as http://example.com/text, it does not work. How can I get the absolute URL?
Accepted answer by BalusC
You need Element#absUrl().
String url = dl.select("a").absUrl("href");
By the way, you can shorten the select:
Document document = Jsoup.connect(url).get();
Elements links = document.select("div.results dl a");
for (Element link : links) {
    String url = link.absUrl("href");
}
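To illustrate what an absolute URL means here: absUrl() resolves the href against the document's base URI, which for Jsoup.connect(url).get() is the URL that was fetched. The sketch below shows the same kind of resolution using only java.net.URI from the standard library. It is an illustration of the concept, not Jsoup's implementation, and the class name AbsUrlSketch and the sample URLs are made up for the example.

```java
import java.net.URI;

class AbsUrlSketch {
    // Resolve an href (relative or absolute) against the URL of the page it came from.
    // Hypothetical helper for illustration; Jsoup does this for you in absUrl().
    static String resolve(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "http://example.com/search/results.html";
        // A root-relative link replaces the whole path of the base URL.
        System.out.println(resolve(page, "/text"));
        // A path-relative link is resolved against the directory of the base URL.
        System.out.println(resolve(page, "more.html"));
        // An href that is already absolute is returned unchanged.
        System.out.println(resolve(page, "http://other.example.org/x"));
    }
}
```

If the document was parsed from a string or file without a base URI, there is nothing to resolve against, and absUrl() returns an empty string for relative links.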
Answer by tindase
String url = dl.select("a").absUrl("href");
is not correct, because dl.select("a") does not return a single element but a collection. You need to get the element by index, e.g.:
Elements elems = dl.select("a");
Element a1 = elems.get(0); // index 0 is the first element; valid indexes run up to elems.size() - 1
Now you can call:
a1.absUrl("href");
If you are sure that the select above yields only one element, or that the element you want is the first one, you can write:
String url = dl.select("a").get(0).absUrl("href");
which, for a non-empty result, is the same as
String url = dl.select("a").first().absUrl("href");
It doesn't have to be the first element, though: you can always replace the 0 in
String url = dl.select("a").get(0).absUrl("href");
with the index of your element, or use a more specific select that matches only one element.
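One caveat with picking an element by index: get(0) throws if the select matched nothing, and first() returns null in that case. A minimal sketch of a defensive variant follows, assuming a Jsoup version that provides Element#selectFirst and that Jsoup is on the classpath. The class name FirstLinkSketch, the sample HTML, and the base URI are made up for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class FirstLinkSketch {
    // Collect the absolute href of the first link in each <dl>, skipping <dl>s without one.
    static List<String> firstLinks(String html, String baseUri) {
        // Passing a base URI lets absUrl() resolve relative hrefs in parsed markup.
        Document document = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element dl : document.select("div.results dl")) {
            Element link = dl.selectFirst("a[href]"); // null when nothing matches
            if (link != null) {
                urls.add(link.absUrl("href"));
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<div class=\"results\">"
                + "<dl><dt><a href=\"/text\">example</a></dt></dl>"
                + "<dl><dt>no link here</dt></dl>"
                + "</div>";
        System.out.println(firstLinks(html, "http://example.com/"));
    }
}
```

The null check replaces the assumption that every dl contains a link, which is the case where get(0) or first().absUrl(...) would fail.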