Java: HtmlUnit to view source
Original URL: http://stackoverflow.com/questions/5996559/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
HtmlUnit to view source
Asked by Jake Sankey
HtmlUnit for Java is great, but I haven't been able to figure out how to view the full source of a web site or return it as a string. Can anyone help me with this?
I know the following will read the site, but now I just want to return the source as a string.
HtmlPage mySite = webClient.getPage("http://mysite.com");
Thanks!
Answer by Jeremy
Answer by Jesse Webb
String pageSource = myPage.asXml();
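That will get you the full HTML source of the web page, serialized as XML from HtmlUnit's current DOM (so it reflects any changes made by JavaScript after the page loaded).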
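A minimal sketch of turning the question's getPage call into "return the source as a string", assuming HtmlUnit 2.x (package com.gargoylesoftware.htmlunit; 3.x moved to org.htmlunit). The class PageSource and the methods domSource and rawSource are hypothetical names for illustration. domSource uses asXml() as in this answer; rawSource is an alternative that reads the HTTP response body through getWebResponse().getContentAsString(), i.e. the markup as the server sent it, before HtmlUnit parsed it or ran any scripts.

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.io.IOException;

// Hypothetical helper class; assumes HtmlUnit 2.x is on the classpath.
public class PageSource {

    // Markup re-serialized from HtmlUnit's DOM (includes changes made by JavaScript).
    static String domSource(WebClient webClient, String url) throws IOException {
        HtmlPage page = webClient.getPage(url);
        return page.asXml();
    }

    // Response body as received from the server, decoded to a String.
    static String rawSource(WebClient webClient, String url) throws IOException {
        HtmlPage page = webClient.getPage(url);
        return page.getWebResponse().getContentAsString();
    }

    public static void main(String[] args) throws IOException {
        // WebClient is AutoCloseable in recent 2.x releases, so try-with-resources releases its windows.
        try (WebClient webClient = new WebClient()) {
            System.out.println(rawSource(webClient, "http://mysite.com"));
        }
    }
}
```

rawSource is closer to a browser's "View Source", while domSource shows the page as HtmlUnit sees it after parsing and script execution; which one counts as "the source" depends on what you need.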
String pageText = myPage.asText();
That will get you all of the visible text on the page, including line breaks and white space. It is roughly the same as opening the page in your browser, pressing Ctrl+A and Ctrl+C, then pasting (Ctrl+V) into a variable.
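To see the difference between asXml() and asText() without touching the network, a small sketch can serve a hard-coded page through HtmlUnit's MockWebConnection. This assumes HtmlUnit 2.x; TextVsXml, load and the URL http://example.com/ are hypothetical placeholders. Newer HtmlUnit releases deprecate asText() in favour of asNormalizedText(), so adjust the call to your version.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.io.IOException;

// Hypothetical demo class; assumes HtmlUnit 2.x is on the classpath.
public class TextVsXml {

    // Serves the given HTML from an in-memory connection so no network access is needed.
    static HtmlPage load(WebClient webClient, String html) throws IOException {
        MockWebConnection connection = new MockWebConnection();
        connection.setDefaultResponse(html);
        webClient.setWebConnection(connection);
        return webClient.getPage("http://example.com/");
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Demo</title></head>"
                + "<body><p>Hello <b>world</b></p></body></html>";
        try (WebClient webClient = new WebClient()) {
            HtmlPage page = load(webClient, html);
            System.out.println(page.asXml());  // tags and text, re-serialized from the DOM
            System.out.println(page.asText()); // visible text only, no tags
        }
    }
}
```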
Answer by Kal
Have you tried mySite.asXml()? Or you can do mySite.getDocumentElement().toString().