How to programmatically access a web page in Java
Original URL: http://stackoverflow.com/questions/3549890/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by duduamar
There is a web page from which I want to retrieve a certain string. To do so, I need to log in, click some buttons, fill in a text box, and click another button; then the string appears.
How can I write a Java program to do that automatically? Are there any useful libraries for that purpose?
Thanks
Accepted answer by YoK
Try HtmlUnit
HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc., just as you would in your "normal" browser.
Example code for submitting a form:
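// HtmlUnit 2.x package names (HtmlUnit 3.x moved them to org.htmlunit)
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
import org.junit.Test;

@Test
public void submittingForm() throws Exception {
    // WebClient is AutoCloseable in later HtmlUnit 2.x releases; close() replaces closeAllWindows()
    try (final WebClient webClient = new WebClient()) {
        // Get the first page
        final HtmlPage page1 = webClient.getPage("http://some_url");
        // Get the form that we are dealing with and, within that form,
        // find the submit button and the field that we want to change.
        final HtmlForm form = page1.getFormByName("myform");
        final HtmlSubmitInput button = form.getInputByName("submitbutton");
        final HtmlTextInput textField = form.getInputByName("userid");
        // Change the value of the text field
        textField.setValueAttribute("root");
        // Now submit the form by clicking the button and get back the second page.
        final HtmlPage page2 = button.click();
    }
}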
For more details, see: http://htmlunit.sourceforge.net/gettingStarted.html
Answer by dierre
Well, when you press a button you usually make a request via the HTTP POST method, so you should use HttpClient to handle the request and HtmlParser to handle the response page containing the string you need.
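dierre's approach can be sketched with nothing but the JDK. This is a minimal sketch, assuming Java 11+ and that the button submits an ordinary application/x-www-form-urlencoded form. It swaps in the JDK's java.net.http.HttpClient for Apache HttpClient and a regular expression for HtmlParser; a regex only holds up for small, predictable markup. The class name FormPost, the URL http://example.com/login, the field names and the <span id="result"> markup are hypothetical placeholders.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormPost {

    /** Encodes fields the way a browser encodes a form body (application/x-www-form-urlencoded). */
    static String formEncode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /** Sends the form as an HTTP POST, as a submit button usually does, and returns the response body. */
    static String postForm(HttpClient client, URI action, Map<String, String> fields)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(action)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Returns the first capture group of the pattern found in the page, if any (regex stands in for HtmlParser). */
    static Optional<String> extract(String html, Pattern pattern) {
        Matcher m = pattern.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical action URL and field names: take the real ones from the page's <form>.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("userid", "root");
        fields.put("password", "secret");
        String html = postForm(HttpClient.newHttpClient(), URI.create("http://example.com/login"), fields);
        // Hypothetical markup: assumes the wanted string sits in <span id="result">...</span>.
        Pattern result = Pattern.compile("<span id=\"result\">([^<]*)</span>");
        System.out.println(extract(html, result).orElse("not found"));
    }
}
```

The real action URL and field names come from the page's <form> element: its action attribute and the name attributes of its inputs.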
Answer by Bozho
Yes:
java.net.URL#openConnection() will allow you to make HTTP requests and get the HTTP responses.
Apache HttpComponents is a library that makes it easier to work with HTTP.
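The first option might look like this: a minimal sketch, assuming Java 9+ (for InputStream.readAllBytes) and a UTF-8 response; a fuller client would read the charset from the Content-Type header. The class name UrlFetch is hypothetical, and demo() starts a throwaway local com.sun.net.httpserver.HttpServer only so the example runs without network access. Against a real site you would call fetch() with the page's URL.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import com.sun.net.httpserver.HttpServer;

public class UrlFetch {

    /** Performs a GET via java.net.URL#openConnection() and returns the body, assumed to be UTF-8. */
    static String fetch(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("GET");
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    /** Serves a fixed page from a local server on a free port, fetches it, then stops the server. */
    static String demo() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            return fetch(URI.create("http://127.0.0.1:" + port + "/").toURL());
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```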
Answer by Mike C
Take a look at the Apache HttpClient project or, if you need to run JavaScript on the page, try HttpUnit.
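Because the question starts with a login, a plain HTTP client has to keep the session cookie between requests. The sketch below does that with the JDK's java.net.http.HttpClient (Java 11+) plus a java.net.CookieManager, swapped in here for Apache HttpClient, which has its own cookie store. The class name SessionClient, the /login and /secret paths and the session=abc123 cookie are hypothetical, and a local HttpServer stands in for the real site. Like Apache HttpClient, this client does not run any JavaScript on the page.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;

public class SessionClient {

    /** An HttpClient that stores cookies, so a login session carries over to later requests. */
    static HttpClient newSessionClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
                .build();
    }

    static String get(HttpClient client, URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    private static void reply(HttpExchange exchange, String text) throws IOException {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(200, body.length);
        exchange.getResponseBody().write(body);
        exchange.close();
    }

    /** Local stand-in site: /login sets a session cookie; /secret answers only if the cookie comes back. */
    static String demo() throws IOException, InterruptedException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/login", exchange -> {
            exchange.getResponseHeaders().add("Set-Cookie", "session=abc123; Path=/");
            reply(exchange, "logged in");
        });
        server.createContext("/secret", exchange -> {
            String cookie = exchange.getRequestHeaders().getFirst("Cookie");
            reply(exchange, cookie != null && cookie.contains("session=abc123") ? "the string" : "denied");
        });
        server.start();
        try {
            String base = "http://127.0.0.1:" + server.getAddress().getPort();
            HttpClient client = newSessionClient();
            get(client, URI.create(base + "/login"));   // the CookieManager stores the session cookie
            return get(client, URI.create(base + "/secret"));
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

Reusing one client instance for the whole flow is the design point: a fresh client per request would start with an empty cookie store and the /secret request would be denied.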
Answer by Jon
The super simple way to do this is to use HtmlUnit:
http://htmlunit.sourceforge.net/
What you want to do can be as simple as:
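import static org.junit.Assert.assertEquals;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.junit.Test;

@Test
public void homePage() throws Exception {
    try (final WebClient webClient = new WebClient()) { // closes the client's windows when done
        final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
        assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
    }
}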