Java: How do I perform web scraping in Android?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/34469737/

Date: 2020-08-11 15:41:07  Source: igfitidea

How do I perform Web Scraping in Android?

Tags: java, android, web-scraping, htmlunit

Asked by Sujal Mandal

I want to scrape my website and then use the data from it to populate elements in my app. My website has login pages, and certain pages only open after the login has been done.

I started working with HtmlUnit, as it is a headless browser, and completed the custom API in a Java IDE. Later I tried to use the JAR I generated from the Java IDE and found that there are incompatibility issues between HtmlUnit and Android.

Can anyone propose a solution to this problem?

Edit: Since no one actually answered this question, I am currently going with a workaround using Android's native WebView: I set its visibility to invisible and use a JavaScript interface to a Java object, which lets me inject JS code to scrape any data.
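
The workaround described in the edit could be sketched roughly as follows. This is not the asker's actual code: the class name `ScrapeBridge`, the bridge name `AndroidScraper`, and the `onData` method are hypothetical names chosen for illustration. The sketch is plain Java so it compiles off-device; it covers only the two pieces that do not depend on the Android SDK, namely the Java object that receives scraped text and the JavaScript snippet that would be injected into the loaded page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Sketch of the "invisible WebView + JavaScript interface" workaround.
 * All names here are hypothetical; the Android wiring is described in the comments.
 */
class ScrapeBridge {
    // Name under which the bridge would be exposed to page scripts (assumption).
    static final String BRIDGE_NAME = "AndroidScraper";

    // Synchronized because WebView invokes interface methods off the UI thread.
    private final List<String> results =
            Collections.synchronizedList(new ArrayList<>());

    // On Android this method would carry the @JavascriptInterface annotation
    // so that JavaScript running in the page is allowed to call it.
    public void onData(String text) {
        results.add(text);
    }

    public List<String> results() {
        return results;
    }

    // Escapes a string for embedding inside a single-quoted JS string literal.
    static String escapeJs(String s) {
        return s.replace("\\", "\\\\").replace("'", "\\'");
    }

    // Builds the script to inject after the page has finished loading.
    // It sends the text of every element matching the selector to the bridge.
    static String buildScript(String cssSelector) {
        return "(function(){"
                + "var nodes=document.querySelectorAll('" + escapeJs(cssSelector) + "');"
                + "for(var i=0;i<nodes.length;i++){"
                + BRIDGE_NAME + ".onData(nodes[i].textContent);"
                + "}"
                + "})();";
    }
}
```

On a device, the wiring would presumably look like this: enable JavaScript with `webView.getSettings().setJavaScriptEnabled(true)`, register the object with `webView.addJavascriptInterface(bridge, "AndroidScraper")`, and pass `ScrapeBridge.buildScript("h1.title")` to `webView.evaluateJavascript(...)` from `WebViewClient.onPageFinished`. Because the WebView keeps its own cookies, pages behind the login can be scraped once the login form has been submitted in the same WebView. Note that `addJavascriptInterface` exposes the object to every script in the page, so this approach is best limited to pages you trust, such as your own site.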

Answered by Fab

If a real headless browser able to handle every recent web feature existed, it would mean that a team had developed it and then kept investing a lot of effort in it, consistently supporting existing and upcoming features.

Apart from the Opera, Chrome, IE, and Firefox browsers, there is no such team. I would point to Chromium (CEF) as the most open and most actively supported option across languages. Try CEF for Java.

Answered by Zeeshan Shabbir

Use the Jsoup library for this purpose. It is very handy and easy to use. Start with this answer and follow the documentation and other examples.
