wget + JavaScript?
Original URL: http://stackoverflow.com/questions/5901661/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
Question by Jake Wilson
I have this webpage that uses client-side JavaScript to format data on the page before it's displayed to the user.
Is it possible to somehow use wget to download the page and use some sort of client-side JavaScript engine to format the data as it would be displayed in a browser?
Accepted answer by Alex Wayne
You could probably make that happen with something like PhantomJS.
You can write a PhantomJS script that loads the page the way a browser would, and then either takes screenshots or uses JavaScript to inspect the page and pull out data.
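To illustrate both options (this is a sketch added for illustration, not code from the answer), the script below assumes PhantomJS 2.x and uses page.render() to save a screenshot and page.evaluate() to run a function inside the loaded page and hand its result back as JSON. The file name extract.js, the default screenshot name, the 1280x800 viewport and the collectPageData helper (which just grabs the title and link targets) are all hypothetical. It sticks to ES5 because PhantomJS's JavaScript engine predates most ES2015 syntax, and the PhantomJS-specific part is wrapped in a typeof phantom check so the helper can also be loaded on its own, e.g. under Node.js for testing.

```javascript
// extract.js: hypothetical PhantomJS sketch (ES5 only).
// Assumed usage: phantomjs extract.js <url> [screenshot.png]

// Runs inside the loaded page via page.evaluate(); it has no access to
// variables in this script and must return JSON-serializable data.
function collectPageData() {
  var links = document.querySelectorAll('a');
  var hrefs = [];
  for (var i = 0; i < links.length; i++) {
    hrefs.push(links[i].href);
  }
  return { title: document.title, links: hrefs };
}

// Only drive the headless browser when running under PhantomJS,
// where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var system = require('system');
  var url = system.args[1];                            // args[0] is the script name
  var shotFile = system.args[2] || 'screenshot.png';   // hypothetical default

  page.viewportSize = { width: 1280, height: 800 };

  page.open(url, function (status) {
    if (status !== 'success') {
      console.log('** Error loading url.');
      phantom.exit(1);
      return;
    }
    // Option 1: save a screenshot of the rendered page (format from extension).
    page.render(shotFile);
    // Option 2: run JavaScript in the page and pull the data out.
    var data = page.evaluate(collectPageData);
    console.log(JSON.stringify(data, null, 2));
    phantom.exit();
  });
}
```

Assumed usage: phantomjs extract.js "http://www.google.com" google.png > google.json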
Answer by Alex Wayne
Here is a simple little PhantomJS script that loads a web page, lets its JavaScript run, and prints the resulting HTML so you can save it locally:
file: get.js
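// get.js: print a page's HTML after PhantomJS has loaded it and run its scripts
var page = require('webpage').create(),
    system = require('system'),
    address;

address = system.args[1];  // first command-line argument after the script name

// Set the page's scroll offset (4000px down, 0px across)
page.scrollPosition = { top: 4000, left: 0 };

page.open(address, function (status) {
    if (status !== 'success') {
        console.log('** Error loading url.');
    } else {
        // page.content is the page's current HTML, including DOM changes its scripts have made so far
        console.log(page.content);
    }
    phantom.exit();
});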
Use it as follows: $> phantomjs /path/to/get.js "http://www.google.com" > "google.html"
Change /path/to, the URL and the output filename to suit your needs.
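One caveat (an addition, not from the original answer): the page.open() callback fires once the page has finished loading, so data the page fetches asynchronously afterwards may not be in page.content yet. A minimal sketch, assuming you know a CSS selector that only appears once the data is rendered, polls for that element before dumping the HTML. The file name get-wait.js, the selector argument, the 10-second timeout and the 250 ms polling interval are all hypothetical choices. As above, the PhantomJS part is guarded by a typeof phantom check so the waitFor helper can be tested on its own.

```javascript
// get-wait.js: hypothetical variant of get.js that waits for async content (ES5 only).
// Assumed usage: phantomjs get-wait.js <url> <css-selector> > out.html

// Poll isReady() every intervalMs ms; call onReady() once it returns true,
// or onTimeout() if more than timeoutMs ms pass first.
function waitFor(isReady, onReady, onTimeout, timeoutMs, intervalMs) {
  var start = Date.now();
  var timer = setInterval(function () {
    if (isReady()) {
      clearInterval(timer);
      onReady();
    } else if (Date.now() - start > timeoutMs) {
      clearInterval(timer);
      onTimeout();
    }
  }, intervalMs);
}

if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var system = require('system');
  var address = system.args[1];
  var selector = system.args[2];  // element that signals "data is rendered" (assumption)

  page.open(address, function (status) {
    if (status !== 'success') {
      console.log('** Error loading url.');
      phantom.exit(1);
      return;
    }
    waitFor(
      function () {
        // Runs inside the page; extra page.evaluate() arguments are passed to the function.
        return page.evaluate(function (sel) {
          return document.querySelector(sel) !== null;
        }, selector);
      },
      function () {
        console.log(page.content);
        phantom.exit();
      },
      function () {
        console.log('** Timed out waiting for ' + selector);
        phantom.exit(1);
      },
      10000,  // hypothetical timeout in ms
      250     // hypothetical polling interval in ms
    );
  });
}
```

Assumed usage: phantomjs /path/to/get-wait.js "http://example.com" "#results" > "example.html", where #results stands in for whatever element your page adds once its data is in place.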
Answer by drowe
Not with wget, as it doesn't include any form of JavaScript engine. However, you could use WebKit to render the page and capture the resulting output.
You could use something like this as a starting point for getting the content: http://situated.wordpress.com/2008/06/04/take-screenshots-of-a-website-from-the-command-line/