Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/14318991/

Date: 2020-10-31 15:52:13  Source: igfitidea

Perform click on web page element before parsing in Java

Tags: java, html, jsoup

Asked by Veljko

I'm trying to parse an HTML page with a DOM parser and the jsoup library. The problem I'm facing is this:

The website has two buttons that show two different tables. I need to parse the table that is shown when the second button is clicked. Different attribute values are set in that case.

When I call Jsoup.connect("example.com"), I get the response as if the first button were selected, and I don't need that data.

Is there a way to perform a click on the second button and then start parsing and retrieving data from the website?

Accepted answer by Will

JSoup can't control a web page; it only parses the content. For manipulation and interaction there are other tools. I recommend Geb, which uses a Groovy DSL with a jQuery-like syntax, making it very fluent. It's also pretty easy to parse XML/HTML with it.

Answer by sp00m

Jsoup is just a parser, i.e. it can't handle events such as clicks on buttons. Have a look at browser automation tools (e.g. Selenium) to do this kind of job.
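
To illustrate why a parser alone cannot "click" anything, here is a minimal sketch. It uses the JDK's built-in DOM parser rather than jsoup, purely so that it runs without dependencies. The markup, the element ids, and the showSecond() handler are all hypothetical. The point is that a parser sees an onclick handler only as an attribute string and never executes it, so the document contains whatever the server sent for the default (first) view.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class StaticMarkupDemo {

    // Hypothetical, well-formed page: script would swap the table after a click.
    static final String PAGE =
          "<html><body>"
        + "<button id=\"first\" onclick=\"showFirst()\">First</button>"
        + "<button id=\"second\" onclick=\"showSecond()\">Second</button>"
        + "<table id=\"data\"><tr><td>first-table row</td></tr></table>"
        + "</body></html>";

    // Parses the markup into a DOM tree; no script is ever executed here.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Returns the onclick attribute of the button at the given position, as plain text.
    static String onclickOfButton(Document doc, int index) {
        NodeList buttons = doc.getElementsByTagName("button");
        return ((Element) buttons.item(index)).getAttribute("onclick");
    }

    // Returns the text of the first table cell present in the parsed markup.
    static String firstCellText(Document doc) {
        return doc.getElementsByTagName("td").item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(PAGE);
        System.out.println(onclickOfButton(doc, 1)); // the handler is just a string
        System.out.println(firstCellText(doc));      // still the first table's data
    }
}
```

The parsed tree still holds the first table's row, which matches the behaviour described in the question: something has to run the page's JavaScript before the second table exists to be parsed.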

Answer by mtk

JSoup is an HTML parser, not a browser alternative. Take a look at HtmlUnit.

HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc., just like you do in your "normal" browser.
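
As a rough sketch of how this could be combined with jsoup: load the page in HtmlUnit, click the second button, then hand the resulting markup to jsoup. This assumes HtmlUnit 3.x or later (package org.htmlunit; 2.x releases use com.gargoylesoftware.htmlunit instead) and jsoup on the classpath. The element id "second-button", the 5-second wait, and the "table tr" selector are hypothetical and have to be adapted to the real page.

```java
import org.htmlunit.WebClient;
import org.htmlunit.html.DomElement;
import org.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class ClickThenParse {

    public static void main(String[] args) throws Exception {
        String url = "http://example.com"; // placeholder URL from the question

        try (WebClient webClient = new WebClient()) {
            HtmlPage page = webClient.getPage(url);

            // Hypothetical id; inspect the real page to find the second button.
            DomElement secondButton = page.getElementById("second-button");
            if (secondButton == null) {
                throw new IllegalStateException("second button not found");
            }
            secondButton.click();

            // Give any script started by the click some time to finish (assumed 5 s).
            webClient.waitForBackgroundJavaScript(5_000);

            // The click may have replaced the page, so re-read it from the window.
            HtmlPage current = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();

            // Pass the post-click markup to jsoup and parse as usual.
            Document doc = Jsoup.parse(current.asXml(), url);
            for (Element row : doc.select("table tr")) {
                System.out.println(row.text());
            }
        }
    }
}
```

Pages with heavy JavaScript may not behave in HtmlUnit exactly as they do in a desktop browser. In that case a real-browser driver such as Selenium, mentioned in the other answer, is the usual fallback.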