Get JavaScript-rendered HTML source using PhantomJS
Original question: http://stackoverflow.com/questions/28209509/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by Anonymous Platypus
First of all, I am not looking for help with a development or testing environment. I am also new to phantomjs, and all I want is the command line operation of phantomjs on a linux terminal.
I have an html page whose body is rendered by some javascript code. What I need is to download that rendered html content using phantomjs.
I have no experience with phantomjs, but I do have a bit of experience in shell scripting, so I tried to do this with curl. Since curl does not execute javascript, I was only able to get the html of the original source code; the rendered contents weren't downloaded. I heard that ruby mechanize may do this job, but I have no knowledge of ruby. On further investigation I found the command line tool phantomjs. How can I do this with phantomjs?
Please feel free to ask for any additional information I need to provide.
Answered by Daniel Ma
Unfortunately, that is not possible using just the PhantomJS command line. You have to use a JavaScript file to actually accomplish anything with PhantomJS.
Here is a very simple version of the script you can use. The code is mostly copied from https://stackoverflow.com/a/12469284/4499924
printSource.js
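var system = require('system');
var page = require('webpage').create();
// system.args[0] is the script filename, so system.args[1] is the first real argument
var url = system.args[1];
// load the page, and run the callback function once loading has finished
page.open(url, function (status) {
    if (status !== 'success') {
        // report the failure on stderr so it does not end up in redirected output
        system.stderr.writeLine('Failed to load ' + url);
        phantom.exit(1);
        return;
    }
    // page.content is the source of the page as it currently stands in the browser
    console.log(page.content);
    // need to call phantom.exit() to prevent the process from hanging
    phantom.exit();
});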
To print the page source to standard out:
phantomjs printSource.js http://todomvc.com/examples/emberjs/
To save the page source in a file:
phantomjs printSource.js http://todomvc.com/examples/emberjs/ > ember.html
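If a page fills in its body asynchronously, page.content may be read before that work has finished. One possible workaround is to wait a little before reading it. This is only a rough sketch under assumptions: the helper name contentAfterDelay, the filename printSourceDelayed.js, and the 2000 ms delay are all made up for illustration and are not part of the original answer.

```javascript
// printSourceDelayed.js (hypothetical filename)

// Hypothetical helper: waits delayMs milliseconds, then passes page.content to done.
function contentAfterDelay(page, delayMs, done) {
    setTimeout(function () {
        done(page.content);
    }, delayMs);
}

// Only start the PhantomJS part when the phantom global exists,
// so the helper above can also be loaded elsewhere (for example, in a test).
if (typeof phantom !== 'undefined') {
    var system = require('system');
    var page = require('webpage').create();
    var url = system.args[1];

    page.open(url, function (status) {
        if (status !== 'success') {
            system.stderr.writeLine('Failed to load ' + url);
            phantom.exit(1);
            return;
        }
        // 2000 ms is an assumed delay; tune it for the page being fetched.
        contentAfterDelay(page, 2000, function (html) {
            console.log(html);
            phantom.exit();
        });
    });
}
```

It would be invoked the same way as printSource.js. A fixed delay is the simplest choice; it trades speed for a better chance that the scripts on the page have run.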
Answered by Firas Abd Alrahman
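var fs = require('fs');
// run this inside the page.open() callback, once the page has loaded
var pagehtml = page.evaluate(function () {
    return '<html><head>' + document.head.innerHTML + '</head>' +
        '<body>' + document.body.innerHTML + '</body></html>';
});
// write the rendered markup to output.html in the current directory
fs.write('output.html', pagehtml, 'w');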
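The snippet above relies on a page object that has already been opened. A self-contained version might look like the following sketch. The helper name buildHtml, the filename saveSource.js, and taking the output path from a second command line argument are assumptions for illustration, not something the answer specifies.

```javascript
// saveSource.js (hypothetical filename)

// Hypothetical helper: wraps head and body markup into one HTML document string.
function buildHtml(headHtml, bodyHtml) {
    return '<html><head>' + headHtml + '</head>' +
        '<body>' + bodyHtml + '</body></html>';
}

// Only start the PhantomJS part when the phantom global exists.
if (typeof phantom !== 'undefined') {
    var fs = require('fs');
    var system = require('system');
    var page = require('webpage').create();
    var url = system.args[1];
    // Assumed convention: optional second argument is the output file.
    var outPath = system.args[2] || 'output.html';

    page.open(url, function (status) {
        if (status !== 'success') {
            system.stderr.writeLine('Failed to load ' + url);
            phantom.exit(1);
            return;
        }
        // page.evaluate runs inside the page, so only plain data is returned.
        var parts = page.evaluate(function () {
            return {
                head: document.head.innerHTML,
                body: document.body.innerHTML
            };
        });
        fs.write(outPath, buildHtml(parts.head, parts.body), 'w');
        phantom.exit();
    });
}
```

Under these assumptions it would be run as: phantomjs saveSource.js http://todomvc.com/examples/emberjs/ ember.html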