Java: How do I perform web scraping in Android?
Original URL: http://stackoverflow.com/questions/34469737/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
How do I perform Web Scraping in Android?
Asked by Sujal Mandal
I want to scrape my website and use its data to populate elements in my app. My website has login pages, and certain pages are only accessible after logging in.
I started working with HtmlUnit because it is a headless browser, and I completed the custom API in a Java IDE. Later, when I tried to use the JAR generated from the Java IDE in my Android app, I found that HtmlUnit has incompatibility issues with Android.
Can anyone propose a solution to this problem?
Edit: Since no one has actually answered this question, I am currently using a workaround based on Android's native WebView: I set its visibility to invisible and expose a Java object to JavaScript through a JavaScript interface, so I can inject JS code to scrape any data.
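The hidden-WebView workaround above can be sketched roughly as follows, assuming the Android side enables JavaScript (`webView.getSettings().setJavaScriptEnabled(true)`), registers a bridge object with `webView.addJavascriptInterface(bridge, "Scraper")` *before* loading the page (injected objects only appear after the next page load), annotates the bridge's callback method with `@JavascriptInterface`, and, once `WebViewClient.onPageFinished(...)` fires, runs the script with `webView.evaluateJavascript(script, null)`. The plain-Java class below only builds that injected script; the bridge name `Scraper`, the callback name `onScraped`, and the CSS selector are hypothetical placeholders.

```java
import java.util.regex.Pattern;

/**
 * Builds the JavaScript that a hidden WebView could evaluate after a page loads.
 * The script collects the trimmed text of every element matching a CSS selector
 * and passes it, as a JSON array string, to a Java object exposed through
 * WebView.addJavascriptInterface(...). The callback name onScraped is hypothetical.
 */
final class ScrapeScript {
    // ASCII subset of valid JavaScript identifiers, used to validate the bridge name.
    private static final Pattern JS_IDENTIFIER = Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    private ScrapeScript() {}

    /** Escapes text for use inside a single-quoted JavaScript string literal. */
    static String jsQuote(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Returns a self-invoking script that sends matching elements' text to bridgeName.onScraped(...). */
    static String build(String bridgeName, String cssSelector) {
        if (!JS_IDENTIFIER.matcher(bridgeName).matches()) {
            throw new IllegalArgumentException("Not a JavaScript identifier: " + bridgeName);
        }
        return "(function() {"
                + "var nodes = document.querySelectorAll('" + jsQuote(cssSelector) + "');"
                + "var out = [];"
                + "for (var i = 0; i < nodes.length; i++) { out.push(nodes[i].textContent.trim()); }"
                + bridgeName + ".onScraped(JSON.stringify(out));"
                + "})();";
    }

    public static void main(String[] args) {
        // Prints the script a hidden WebView would evaluate once the page has finished loading.
        System.out.println(build("Scraper", "table.orders td.total"));
    }
}
```

On the Java side, the `@JavascriptInterface` method receives a JSON array of strings. Android invokes bridge methods on a private background thread of the WebView, so results should be posted back to the UI thread before updating any views.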
Answered by Fab
- Either you contribute to HtmlUnit to produce a version of HtmlUnit that does not use the dependencies missing from Android.
- Or you can use an alternative method like this one, as this seems to be the path others have taken before you.
If a real headless browser able to handle all recent web features existed, it would mean that a team had developed it and kept investing considerable effort in it (supporting both existing and upcoming features).
Apart from the Opera, Chrome, IE, and Firefox browser teams, there is no such team. I would point to Chromium, via the Chromium Embedded Framework (CEF), as the most open and most actively supported option across languages. Try CEF for Java.
Answered by Zeeshan Shabbir
Use the Jsoup library for this purpose; it is very handy and easy to use. Start with this answer, then follow the documentation and other examples.
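Jsoup can keep a login session itself: `Jsoup.connect(url).data(...).method(Connection.Method.POST).execute()` returns a `Connection.Response` whose `cookies()` can be passed to the next `Jsoup.connect(...).cookies(...)`. As a dependency-free sketch of the same login-then-fetch idea, the class below uses only `java.net` (`HttpURLConnection` plus a process-wide `CookieManager`, both available on Android) and returns raw HTML that could then be handed to `Jsoup.parse(...)`. The class name, the form field names, and the assumption that the site uses a plain cookie-based form login without CSRF tokens are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical helper: log in with a form POST, then fetch pages that require the session cookie. */
final class SessionScraper {
    SessionScraper() {
        // HttpURLConnection consults the process-wide CookieHandler, so installing a
        // CookieManager once lets the login's Set-Cookie reach later requests.
        if (CookieHandler.getDefault() == null) {
            CookieHandler.setDefault(new CookieManager());
        }
    }

    /** Encodes form fields as application/x-www-form-urlencoded (field names are site-specific). */
    static String formEncode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) sb.append('&');
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always supported
        }
        return sb.toString();
    }

    /** POSTs the login form and returns the HTTP status; cookies are stored as a side effect. */
    int login(String loginUrl, Map<String, String> fields) throws IOException {
        byte[] body = formEncode(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) new URL(loginUrl).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** GETs a page with the stored cookies and returns its HTML (assumes a UTF-8 page). */
    String fetch(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + pageUrl);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

On Android, `login(...)` and `fetch(...)` must run off the main thread (network I/O on the main thread throws `NetworkOnMainThreadException`), and the app needs the `INTERNET` permission.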