Notice: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/17843521/

Date: 2020-10-22 05:32:24  Source: igfitidea

Get the changed HTML content after it's updated by Javascript? (htmlunit)

java scala htmlunit

Asked by Zack Yoshyaro

I'm having some trouble figuring out how to get the content of some HTML after JavaScript has updated it.

Specifically, I'm trying to get the current time from the US Naval Observatory Master Clock. The page has an h1 element with the ID USNOclk, in which it displays the current time.

When the page first loads, this element is set to display "Loading...", and then JavaScript kicks in and updates it to the current time via:

function showTime()
    {
        document.getElementById('USNOclk').innerHTML="Loading...<br />";
        xmlHttp=GetXmlHttpObject();
        if (xmlHttp==null){
            document.getElementById('USNOclk').innerHTML="Sorry, browser incapatible. <BR />";
            return;
        } 
        refresher = 0;
        startResponse = new Date().getTime();
        var url="http://tycho.usno.navy.mil/cgi-bin/time.pl?n="+ startResponse;
        xmlHttp.onreadystatechange=stateChanged;
        xmlHttp.open("GET",url,true);
        xmlHttp.send(null);
    }  

So, the problem is that I'm not sure how to get the updated time. When I check the element, I see "Loading..." as the content of the h1 element.

I've double-checked that JavaScript is enabled, and I've also tried calling the waitForBackgroundJavaScript function on the WebClient, hoping that it would give the JavaScript time to start updating things. However, no success as of yet.

My Current Code:

import com.gargoylesoftware.htmlunit._
import com.gargoylesoftware.htmlunit.html.HtmlPage

object AtomicTime {

  def main(args: Array[String]): Unit = {
    val url = "http://tycho.usno.navy.mil/what.html"
    val client = new WebClient(BrowserVersion.CHROME)

    println(client.isJavaScriptEnabled()) // returns true
    client.waitForBackgroundJavaScript(10000)
//    client.waitForBackgroundJavaScriptStartingBefore(10000) //tried this one too without success
    var response: HtmlPage = client.getPage(url)
    println(response.asText())
  }
}

How do I trigger the javascript to update the HTML?

Accepted answer by Zack Yoshyaro

I figured it out!

HtmlPage objects have an executeJavaScript(String) method, which can be used to kick off the showTime script. Once the script has actually started, waitForBackgroundJavaScript becomes relevant.

The code I ended up with:

import com.gargoylesoftware.htmlunit._
import com.gargoylesoftware.htmlunit.html.HtmlPage

object AtomicTime {

  def main(args: Array[String]): Unit = {
    val url = "http://tycho.usno.navy.mil/what.html"
    val client = new WebClient(BrowserVersion.CHROME)

    val response: HtmlPage = client.getPage(url)
    // Invoke the page's showTime function; the parentheses are needed to
    // call it rather than merely evaluate the function reference.
    response.executeJavaScript("showTime()")

    printf("Current AtomicTime: %s%n", getUpdatedResponse(response, client))
  }

  // Polls until the element no longer shows the placeholder text.
  // Note: this loops forever if the time request never completes.
  def getUpdatedResponse(page: HtmlPage, client: WebClient): String = {
    while (page.getElementById("USNOclk").asText() == "Loading...") {
      client.waitForBackgroundJavaScript(200)
    }
    page.getElementById("USNOclk").asText()
  }
}

Answer by Mosty Mostacho

Although the waitForBackgroundJavaScript method seems to be a good alternative, it's worth mentioning that it is experimental. The JavaDocs state:

Experimental API: May be changed in next release and may not yet work perfectly!

So I recommend a slightly more complex approach:

int amountOfTries = 10;
while (amountOfTries > 0 && CONDITION) {
    amountOfTries--;
    synchronized (page) {
        page.wait(1000);
    }
}

Note that the amountOfTries check bounds the loop so that you can take appropriate action if something goes wrong with the request. Without it, you could end up in an infinite loop, so be careful with that.

Then you should replace CONDITION with your actual condition. In this case it is:

page.getElementById("USNOclk").asText().equals("Loading...")

In short, the code above checks the condition once per second, for a maximum of 10 seconds, and stops as soon as the condition becomes false (that is, as soon as the element no longer shows "Loading...").

Of course, a better approach would be to extract this error checking behavior into a separate method so that you can reuse the logic on different conditions.
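
One way to extract that logic is sketched below. This is only an illustration, not part of the original answer: the Polling class, the waitUntil method, and its maxTries and intervalMillis parameters are hypothetical names, and it uses Thread.sleep instead of the synchronized page.wait call shown above, so it does not depend on HtmlUnit at all. Note that it takes the condition to wait for, which is the opposite of the loop's CONDITION.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of HtmlUnit): bounded polling on any condition.
final class Polling {

    private Polling() {
    }

    /**
     * Checks the condition up to maxTries times, sleeping intervalMillis
     * between checks, plus one final check after the last sleep.
     * Returns true as soon as the condition holds; returns false if the
     * tries run out or the thread is interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, int maxTries, long intervalMillis) {
        for (int attempt = 0; attempt < maxTries; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                // Assumption: a plain sleep stands in for page.wait(...).
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return condition.getAsBoolean();
    }
}
```

With the page from the question, a call might look like Polling.waitUntil(() -> !page.getElementById("USNOclk").asText().equals("Loading..."), 10, 1000); a false result means the time never arrived and the caller should handle that case.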
