How to load ajax with HtmlUnit?
Original URL: http://stackoverflow.com/questions/6796805/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by Иван Бишевац
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlButton;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
public class YoutubeBot {
    private static final String YOUTUBE = "http://www.youtube.com";
    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient();
        webClient.setThrowExceptionOnScriptError(false);
        // Open a results page directly (equivalent to typing this URL into the browser's address bar)
        HtmlPage currentPage = webClient.getPage("http://www.youtube.com/results?search_type=videos&search_query=official+music+video&search_sort=video_date_uploaded&suggested_categories=10%2C24&uni=3");
        // Get the form where the submit button is located
        HtmlForm searchForm = (HtmlForm) currentPage.getElementById("masthead-search");
        // Get the input field
        HtmlTextInput searchInput = (HtmlTextInput) currentPage.getElementById("masthead-search-term");
        // Insert the search term
        searchInput.setText("java");
        // Workaround: create a 'fake' button and add it to the form
        HtmlButton submitButton = (HtmlButton) currentPage.createElement("button");
        submitButton.setAttribute("type", "submit");
        searchForm.appendChild(submitButton);
        // Workaround: use the reference to the button to submit the form
        HtmlPage newPage = submitButton.click();
        // Find all links with the given class on the page returned by the search
        final List<HtmlAnchor> listLinks = (List<HtmlAnchor>) newPage.getByXPath("//a[@class='ux-thumb-wrap result-item-thumb']");
        // Print all links to the console
        for (HtmlAnchor link : listLinks) {
            System.out.println(YOUTUBE + link.getAttribute("href"));
        }
    }
}
This code works, but I want to sort the YouTube clips, for example by upload date. How can I do this with HtmlUnit? I have to click on the filter, which should load content via an AJAX request, and then click the "Upload date" link. I just don't know how to do the first step, loading the AJAX content. Is this possible with HtmlUnit?
Accepted answer by Jasper
Here's one way to do it:
- Search the page as you did in your previous question.
- Select the search-lego-refinements block by id.
- Use XPath to navigate to the link (.//ul/li/a, relative to the block selected by that id).
- Click the selected link.
The following code sample shows how it could be done:
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlButton;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
public class YoutubeBot {
    private static final String YOUTUBE = "http://www.youtube.com";
    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient();
        webClient.setThrowExceptionOnScriptError(false);
        // This is equivalent to typing youtube.com into the address bar of the browser
        HtmlPage currentPage = webClient.getPage(YOUTUBE);
        // Get the form where the submit button is located
        HtmlForm searchForm = (HtmlForm) currentPage.getElementById("masthead-search");
        // Get the input field
        HtmlTextInput searchInput = (HtmlTextInput) currentPage.getElementById("masthead-search-term");
        // Insert the search term
        searchInput.setText("java");
        // Workaround: create a 'fake' button and add it to the form
        HtmlButton submitButton = (HtmlButton) currentPage.createElement("button");
        submitButton.setAttribute("type", "submit");
        searchForm.appendChild(submitButton);
        // Workaround: use the reference to the button to submit the form
        currentPage = submitButton.click();
        // Get the div containing the filters
        HtmlElement filterDiv = currentPage.getElementById("search-lego-refinements");
        // Select the first link inside the filter block (Upload date);
        // the leading "." keeps the XPath relative to filterDiv instead of the whole document
        HtmlAnchor sortByDateLink = ((List<HtmlAnchor>) filterDiv.getByXPath(".//ul/li/a")).get(0);
        // Click the 'Upload date' link
        currentPage = sortByDateLink.click();
        System.out.println(currentPage.asText());
    }
}
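The XPath in the steps and the sample above is meant to be evaluated relative to the search-lego-refinements block. In XPath 1.0, a path that begins with // is absolute: it selects from the document root even when it is evaluated against an element, whereas .// selects only that element's descendants. The sketch below illustrates the difference with the JDK's own javax.xml.xpath API rather than HtmlUnit, on a small hypothetical XML snippet (the markup and the class name RelativeXPathDemo are made up for illustration). HtmlUnit's getByXPath is expected to follow the same XPath 1.0 rules.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RelativeXPathDemo {

    // Hypothetical markup: one list outside the filter block, then one inside it.
    static final String XML =
          "<body>"
        + "<ul><li><a href='/other'>Other</a></li></ul>"
        + "<div id='search-lego-refinements'>"
        + "<ul><li><a href='/upload-date'>Upload date</a></li></ul>"
        + "</div>"
        + "</body>";

    /** Evaluates expr with the filter block as context node and returns the first match's text. */
    static String firstLinkText(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Context node: the filter block (what getElementById returns in the sample).
            Node filterDiv = (Node) xpath.evaluate(
                "//div[@id='search-lego-refinements']", doc, XPathConstants.NODE);
            NodeList links = (NodeList) xpath.evaluate(expr, filterDiv, XPathConstants.NODESET);
            return links.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "//" starts at the document root, so the link outside the block comes first.
        System.out.println(firstLinkText("//ul/li/a"));   // prints: Other
        // ".//" only searches inside the filter block.
        System.out.println(firstLinkText(".//ul/li/a"));  // prints: Upload date
    }
}
```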
You could also just load the correct query URL directly (http://www.youtube.com/results?search_type=videos&search_query=nyan+cat&search_sort=video_date_uploaded).
But then you would have to encode your search parameter(s) yourself (for example, replacing spaces with +).
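The encoding step can be handled by the JDK's java.net.URLEncoder, which applies form encoding (spaces become +, reserved characters such as & become %XX). A minimal sketch, assuming the query parameter names from the 2011-era URL above; YouTube may no longer accept them, and the class and method names are hypothetical. The resulting string could then be passed to webClient.getPage(String).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SearchUrl {

    private static final String YOUTUBE = "http://www.youtube.com";

    /** Builds a results URL sorted by upload date (parameter names assumed from the answer's URL). */
    static String sortedByUploadDate(String query) {
        try {
            // Form-encode the free-text query: "nyan cat" -> "nyan+cat".
            String q = URLEncoder.encode(query, "UTF-8");
            return YOUTUBE + "/results?search_type=videos&search_query=" + q
                + "&search_sort=video_date_uploaded";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sortedByUploadDate("nyan cat"));
    }
}
```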
Answer by Varun Tulsian
This worked for me. Set this:
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
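This makes AJAX calls started from the main thread (for example by a click() on an element) run synchronously, so the page has already been updated when the call returns.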
This is how I set up my WebClient object:
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getCookieManager().setCookiesEnabled(true);
// Run AJAX calls triggered from the main thread synchronously
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.getOptions().setThrowExceptionOnScriptError(false);
Answer by gis_wild
Answer by NagyI
I've used HtmlUnit before for similar purposes.
Actually, you can find all the information you need here. HtmlUnit has AJAX support enabled by default, so once you have the newPage object in your code, you can issue click events on the page (find the specific element and call its click() method). The trickiest part is that AJAX is asynchronous, so you have to call wait() or sleep() after performing the virtual click so that the site's JavaScript code can process the actions. This is not the best approach, since network latency varies and makes a fixed sleep() unreliable. Instead, you may find something on the page that changes when you fire an event that makes AJAX calls (e.g. a header title changes), and check regularly whether this change has already happened. (I should mention that there's an event resynchronizer built into HtmlUnit; however, I couldn't make it work as I expected.)
I use Firebug or Chrome's developer tools to examine the site. You can check the DOM tree before and after the AJAX calls; this way you'll know how to reference specific controls (like links and dropdown menus) on the page.
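The "check regularly for a visible change" approach can be sketched as a small polling helper. This is plain Java, not an HtmlUnit API: the class Poll and method waitUntil are hypothetical names, and the condition is an assumption about what changes on your page. With HtmlUnit, the condition might look like () -> page.getElementById("some-id") != null, where "some-id" is a placeholder for an element that appears after the AJAX call. HtmlUnit also provides WebClient.waitForBackgroundJavaScript(long), which waits for its background JavaScript jobs and may be enough on its own.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public final class Poll {

    private Poll() {}

    /**
     * Checks condition every intervalMillis until it returns true or timeoutMillis has elapsed.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "has the AJAX update arrived?": becomes true after about 300 ms.
        boolean ok = waitUntil(() -> System.currentTimeMillis() - start > 300, 5_000, 100);
        System.out.println(ok);
    }
}
```

Unlike a single sleep(), this returns as soon as the change is observed and gives up after a bounded time.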
Then I would use XPath to get specific elements. For example, you can do this (from HtmlUnit's examples):
//get div which has a 'name' attribute of 'John'
final HtmlDivision div = (HtmlDivision) page.getByXPath("//div[@name='John']").get(0);
YouTube actually doesn't use AJAX to re-sort its results. When you click the sort dropdown on the results page (a decorated <button>), an absolutely positioned <ul> shows up (it emulates the drop-down part of a combo box), containing an <li> element for each menu item. Each <li> element contains a special <span> element with an href attribute attached. When you click the <span> element, JavaScript navigates the browser to this href value.
For example, in my case the sort-by-relevance <span> element looks like this:
<span href="/results?search_type=videos&search_query=test&suggested_categories=2%2C24%2C10%2C1%2C28" class=" yt-uix-button-menu-item" onclick=";window.location.href=this.getAttribute('href');return false;">Relevancia</span>
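Since the onclick handler only assigns the span's href to window.location.href, an alternative to clicking is to read the attribute (getAttribute("href") on the span element) and load the resolved URL yourself with webClient.getPage(...). The sketch below shows only the resolution step, using the JDK's java.net.URI; the class and method names are hypothetical, and the page URL is an assumed example.

```java
import java.net.URI;

public class SpanHref {

    /** Resolves a (possibly relative) href against the URL of the page it was found on. */
    static String resolve(String pageUrl, String href) {
        // Same base-URL resolution a browser applies when assigning window.location.href.
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.youtube.com/results?search_type=videos&search_query=test";
        // href value copied from the "Relevancia" <span> shown above.
        String href = "/results?search_type=videos&search_query=test&suggested_categories=2%2C24%2C10%2C1%2C28";
        System.out.println(resolve(page, href));
    }
}
```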
You can get the list of these spans relatively easily, since the hosting <ul> is the only such child of <body>. However, you have to click the dropdown button first, because that is what creates the <ul> element, with all the children described above, using JavaScript. You can get the sort-by button with this XPath:
//div[@class='sort-by floatR']/button
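Note that @class='sort-by floatR' compares the whole attribute string, so it stops matching if the classes are reordered or another class is added. A common XPath 1.0 idiom matches a single class token instead. The sketch below compares the two with the JDK's javax.xml.xpath on a hypothetical XML snippet (HtmlUnit parses HTML rather than XML, but the XPath expressions behave the same way); the class name ClassTokenXPath is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class ClassTokenXPath {

    // Hypothetical markup: same classes as in the answer, reordered, extra spacing, one extra class.
    static final String XML =
        "<body><div class='floatR  sort-by extra'><button>Sort by</button></div></body>";

    /** Returns how many nodes expr selects in the sample document. */
    static int countMatches(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Exact string comparison: no match once order or spacing differs.
        System.out.println(countMatches("//div[@class='sort-by floatR']/button"));  // prints: 0
        // Token match: true whenever "sort-by" is one of the div's classes.
        System.out.println(countMatches(
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' sort-by ')]/button"));  // prints: 1
    }
}
```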
You can test your XPath queries right in Chrome, for example, by opening the developer tools and the JavaScript console from its toolbar. Then you can test like this:
> $x("//div[@class='sort-by floatR']/button")
[
<button type="button" class=" yt-uix-button yt-uix-button-text yt-uix-button-active" onclick=";return false;" role="button" aria-pressed="true" aria-expanded="true" aria-haspopup="true" aria-activedescendant data-button-listener="26">…</button>
]
Hope this'll get you to the right direction.