Calling JavaScript on a web page from Java

Question

Asked by Warlax

My goal is to connect to an OWA page (Microsoft Office Outlook Web Access, basically an email client), log in, then read the page that loads and find the inbox count.

To log in, I need to fill in the username and password fields and call a certain JavaScript function whose name and signature I know.

How do I:

Get the DOM for the page?
Update the DOM to fill out the input text fields?
Call that Javascript function?
Get the new URL for the page I am redirected to?

So far I am able to connect to a webpage and load its page source using the following Java code:

                // open the connection to the welcome page
                callback.status("Opening connection...");
                URLConnection connection = null;
                try
                {
                    connection = url.openConnection();
                }
                catch(IOException ex)
                {
                    throw new Exception("I/O Problem while attempting URL connection");
                }

                connection.setDoInput(true);

                // open input stream to read website
                callback.status("Opening data stream...");
                InputStream input = null;
                try
                {
                    input = connection.getInputStream();
                }
                catch(IOException ex)
                {
                    throw new Exception("I/O Problem while opening data stream");
                }

                // read website contents
                callback.status("Reading site...");

                String content = "";
                byte[] buffer = new byte[100];
                int totalBytesRead = 0;
                int bytesRead = 0;
                try
                {
                    while((bytesRead = input.read(buffer)) != -1)
                    {
                        String newContent = new String(buffer, 0, bytesRead);
                        content += newContent;
                    }
                }
                catch(IOException ex)
                {
                    throw new Exception("I/O Problem while reading website");
                }

                System.out.println(content);
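
One caveat about the read loop above: `new String(buffer, 0, bytesRead)` decodes each 100-byte chunk separately with the platform default charset, so a multi-byte character that straddles two chunks can come out garbled. A minimal sketch that collects the raw bytes first and decodes once at the end. The class name `PageReader`, the method `readAll`, and the choice of UTF-8 are assumptions for illustration; the real charset would normally come from the response's Content-Type header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageReader {
    // Hypothetical helper: reads the whole stream, then decodes it in one step.
    public static String readAll(InputStream input, Charset charset) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[100];
        int bytesRead;
        while ((bytesRead = input.read(buffer)) != -1) {
            collected.write(buffer, 0, bytesRead);
        }
        return new String(collected.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for connection.getInputStream(); the content is assumed to be UTF-8.
        byte[] page = "<html><body>Gr\u00fc\u00dfe</body></html>".getBytes(StandardCharsets.UTF_8);
        try (InputStream input = new ByteArrayInputStream(page)) {
            System.out.println(readAll(input, StandardCharsets.UTF_8));
        }
    }
}
```

In the original snippet, the same helper could be called as `readAll(connection.getInputStream(), StandardCharsets.UTF_8)`, again assuming the page is served as UTF-8.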

My connection code prints the entire page source to the console, which is what I want. I also tried to parse the page into a DOM object that I could traverse to find the username and password fields:

                XMLParserConfiguration config = new XML11DTDConfiguration();
                DOMParser parser = new DOMParser(config);
                InputSource inputSource = new InputSource(input);
                inputSource.setByteStream(input);
                try
                {
                    parser.parse(inputSource);
                }
                catch(SAXParseException ex)
                {

                }
                Document document = parser.getDocument();
                visitNode(document, 0);

But I am getting a SAXParseException: :6:62: White spaces are required between publicId and systemId.

Looks like this line is to blame:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

So I may need to change that DOMParser's configuration somehow to be lenient enough and "forgive" the white space requirement.
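
For context, XML requires a system identifier after the public identifier in a `PUBLIC` DOCTYPE, while HTML 4 allows it to be omitted, which is why a strict XML parser rejects this line. An HTML-aware parser does not have that requirement. A minimal sketch using the lenient HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator`) rather than the Xerces setup above. The class name `InputFieldFinder`, the method `findInputs`, and the sample field names `username` and `password` are assumptions for illustration, not the actual OWA markup.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class InputFieldFinder {
    // Hypothetical helper: maps each named <input> element to its type attribute.
    public static Map<String, String> findInputs(Reader html) throws IOException {
        Map<String, String> inputs = new LinkedHashMap<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <input> has no closing tag, so it is reported as a simple tag.
                if (tag == HTML.Tag.INPUT) {
                    Object name = attrs.getAttribute(HTML.Attribute.NAME);
                    Object type = attrs.getAttribute(HTML.Attribute.TYPE);
                    if (name != null) {
                        inputs.put(name.toString(), type == null ? "text" : type.toString());
                    }
                }
            }
        };
        // true = ignore charset directives in the document, since a Reader is already decoded
        new ParserDelegator().parse(html, callback, true);
        return inputs;
    }

    public static void main(String[] args) throws IOException {
        // Sample page with the same DOCTYPE that trips the XML parser; field names are made up.
        String page = "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\">"
                + "<html><body><form>"
                + "<input type=\"text\" name=\"username\">"
                + "<input type=\"password\" name=\"password\">"
                + "</form></body></html>";
        System.out.println(findInputs(new StringReader(page)));
    }
}
```

This only locates the fields; it does not build an editable DOM or run any JavaScript, so it covers just the parsing part of the problem.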

Answer 1

Answered by BalusC

So you want to programmatically act like a GUI-less web browser? Use HtmlUnit; that is exactly what it advertises itself as.

HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.
It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating either Firefox or Internet Explorer depending on the configuration you want to use.
It is typically used for testing purposes or to retrieve information from web sites.

See also:

Pros and cons of HTML parsers in Java
