Note: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/28209509/

Time: 2020-10-28 08:35:46  Source: igfitidea

Get javascript rendered html source using phantomjs

javascript html shell phantomjs

Asked by Anonymous Platypus

First of all, I am not looking for help with a development or testing environment. I am new to PhantomJS, and all I want is to run PhantomJS from the command line in a Linux terminal.

I have an HTML page whose body is rendered by some JavaScript code. I want to download that rendered HTML content using PhantomJS.

I have no experience with PhantomJS, but I have a bit of experience in shell scripting, so I tried to do this with curl. Because curl does not execute JavaScript, I was only able to get the HTML of the original source; the rendered content was not downloaded. I heard that Ruby Mechanize may do this job, but I have no knowledge of Ruby. On further investigation I found the command-line tool PhantomJS. How can I do this with PhantomJS?

Please feel free to ask for any additional information I should provide.

Answered by Daniel Ma

Unfortunately, that is not possible using just the PhantomJS command line. You have to use a JavaScript file to actually accomplish anything with PhantomJS.

Here is a very simple version of the script you can use.

Code mostly copied from https://stackoverflow.com/a/12469284/4499924

printSource.js

var system = require('system');
var page   = require('webpage').create();
// system.args[0] is the filename, so system.args[1] is the first real argument
var url    = system.args[1];
// render the page, and run the callback function
page.open(url, function () {
  // page.content is the source
  console.log(page.content);
  // need to call phantom.exit() to prevent from hanging
  phantom.exit();
});

To print the page source to standard output:

phantomjs printSource.js http://todomvc.com/examples/emberjs/

To save the page source in a file:

phantomjs printSource.js http://todomvc.com/examples/emberjs/ > ember.html
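
One caveat about printSource.js: the page.open callback runs once the page has loaded, which can be before client-side scripts have finished rendering. Below is a minimal sketch of a variant that checks the load status and waits a fixed delay before printing. The file name printRendered.js, the parseArgs helper, the optional delayMs argument, and the 1000 ms default are all assumptions made for illustration; they are not part of the original answer.

```javascript
// printRendered.js (hypothetical name): print the DOM after page scripts had time to run.
// Usage: phantomjs printRendered.js <url> [delayMs]

// Parse the argument list; args[0] is the script name, as in printSource.js.
function parseArgs(args) {
  var url = args[1];
  // The 1000 ms default is an arbitrary assumption; tune it for the target page.
  var delayMs = args.length > 2 ? parseInt(args[2], 10) : 1000;
  if (!url) {
    return { error: 'Usage: phantomjs printRendered.js <url> [delayMs]' };
  }
  if (isNaN(delayMs) || delayMs < 0) {
    return { error: 'delayMs must be a non-negative integer' };
  }
  return { url: url, delayMs: delayMs };
}

// The `phantom` global only exists inside PhantomJS, so this part is skipped elsewhere.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  var opts = parseArgs(system.args);

  if (opts.error) {
    system.stderr.writeLine(opts.error);
    phantom.exit(1);
  } else {
    page.open(opts.url, function (status) {
      if (status !== 'success') {
        system.stderr.writeLine('Failed to load ' + opts.url);
        phantom.exit(1);
        return;
      }
      // Give client-side scripts time to finish rendering before reading the DOM.
      setTimeout(function () {
        console.log(page.content);
        phantom.exit(0);
      }, opts.delayMs);
    });
  }
}
```

It is run the same way as printSource.js, with an optional delay in milliseconds as the second argument, for example: phantomjs printRendered.js http://todomvc.com/examples/emberjs/ 2000 > ember.html. A fixed delay is a crude choice; pages that render slowly may need a larger value.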

Answered by Firas Abd Alrahman

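// Assumes `page` is a webpage object and this runs inside its page.open() callback, as in printSource.js above.
var fs = require('fs');

// Rebuild the document from the live DOM, so script-rendered content is included.
var pagehtml = page.evaluate(function () {
  return '<html><head>' + document.head.innerHTML + '</head>' +
         '<body>' + document.body.innerHTML + '</body></html>';
});

// Write the result to output.html, overwriting any existing file.
fs.write('output.html', pagehtml, 'w');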