Java HtmlUnit: return a completely loaded page
Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/16956952/
htmlunit: return a completely loaded page
Asked by justasd
I am using the HtmlUnit library for Java to manipulate websites programmatically. I can't find a working solution to my problem: how do I determine that all AJAX calls have finished, and return a completely loaded web page? Here's what I have tried:
First I create a WebClient instance and call my method processWebPage(String url, WebClient webClient):
WebClient webClient = null;
try {
    webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
    webClient.setThrowExceptionOnScriptError(false);
    webClient.setThrowExceptionOnFailingStatusCode(false);
    webClient.setJavaScriptEnabled(true);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
} catch (Exception e) {
    System.out.println("Error");
}
HtmlPage currentPage = processWebPage("http://www.example.com", webClient);
And here is my method which should return a completely loaded web page:
private static HtmlPage processWebPage(String url, WebClient webClient) {
    HtmlPage page = null;
    try {
        page = webClient.getPage(url);
    } catch (Exception e) {
        System.out.println("Get page error");
    }
    int z = webClient.waitForBackgroundJavaScript(1000);
    int counter = 1000;
    while (z > 0) {
        counter += 1000;
        z = webClient.waitForBackgroundJavaScript(counter);
        if (z == 0) {
            break;
        }
        synchronized (page) {
            System.out.println("wait");
            try {
                page.wait(500);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
    System.out.println(page.asXml());
    return page;
}
The variable z should be 0 once there is no JavaScript left to load.
Any thoughts? Thanks in advance.
EDIT: I found a partially working solution to my problem, but it requires knowing what the response page looks like. For example, if a completely loaded page contains the text "complete", my solution would be:
HtmlPage page = null;
int PAGE_RETRY = 10;
try {
    page = webClient.getPage("http://www.example.com");
} catch (Exception e) {
    e.printStackTrace();
}
for (int i = 0; !page.asXml().contains("complete") && i < PAGE_RETRY; i++) {
    try {
        Thread.sleep(1000 * (i + 1));
        page = webClient.getPage("http://www.example.com");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
But what would the solution be if I don't know what a completely loaded page looks like?
Accepted answer by brnfd
Try this:
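HtmlPage page = null;
try {
    page = webClient.getPage(url);
} catch (Exception e) {
    System.out.println("Get page error");
}
// Ask the page's window for its background JavaScript job manager.
JavaScriptJobManager manager = page.getEnclosingWindow().getJobManager();
// Poll once per second until no background JavaScript jobs remain.
// Thread.sleep throws InterruptedException, so the enclosing method must declare or handle it.
while (manager.getJobCount() > 0) {
    Thread.sleep(1000);
}
System.out.println(page.asXml());
return page;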
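
One caveat with the loop above: it has no upper bound, so it may never exit if the page keeps scheduling jobs (for instance with a recurring timer). Below is a minimal sketch of the same polling idea with a deadline added, using only the JDK. The class name BoundedWait and the helper waitForJobs are hypothetical, not part of HtmlUnit. The job counter is passed in as an IntSupplier, on the assumption that with HtmlUnit one would supply manager::getJobCount.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class BoundedWait {

    /**
     * Polls jobCount until it reports 0 or timeoutMillis has elapsed.
     * Returns true if the count reached 0, false on timeout or interruption.
     */
    public static boolean waitForJobs(IntSupplier jobCount, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (jobCount.getAsInt() > 0) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // deadline passed while jobs were still pending
            }
            // Sleep for the poll interval, but never past the deadline.
            long sleepMillis = Math.min(pollMillis, Math.max(1, remainingNanos / 1_000_000L));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for manager::getJobCount: reports 3, 2, 1, then 0.
        AtomicInteger pending = new AtomicInteger(3);
        IntSupplier fakeJobs = () -> Math.max(0, pending.getAndDecrement());
        System.out.println(waitForJobs(fakeJobs, 2_000, 10)); // count drains before the deadline

        // A counter that never reaches 0 makes the helper give up at the deadline.
        System.out.println(waitForJobs(() -> 1, 50, 10));
    }
}
```

With HtmlUnit, the call would presumably look like waitForJobs(manager::getJobCount, 30_000, 500), where the timeout and poll interval are illustrative values. A false return means the page was still busy at the deadline, and the caller can decide whether to use the page as it is or to retry.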