javascript: Using multiple page.open calls in a single script

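Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/16996732/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow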

Time: 2020-10-27 06:43:06  Source: igfitidea

Using Multiple page.open in Single Script

javascript phantomjs

Asked by asprin

My goal is to execute PhantomJS by using:

// adding $op and $er for debugging purposes
exec('phantomjs script.js', $op, $er);
print_r($op);
echo $er;

And then inside script.js, I plan to use multiple page.open() calls to capture screenshots of different pages, such as:

var url = 'some dynamic url goes here';
page = require('webpage').create();
page.open(url, function (status) {
    console.log('opening page 1');  
    page.render('./slide1.png');            
});

page = require('webpage').create();
page.open(url, function (status) {
    console.log('opening page 2');  
    page.render('./slide2.png');        
});

page = require('webpage').create();
page.open(url, function (status) {
    console.log('opening page 3');  
    page.render('./slide3.png');        
    phantom.exit(); //<-- Exiting phantomJS only after opening all 3 pages
});

On running exec, I get the following output on the page:

Array ( [0] => opening page 3 ) 0

As a result, I only get the screenshot of the 3rd page. I'm not sure why PhantomJS is skipping the first and second blocks of code (evident from the missing console.log() messages that were supposed to be output from the 1st and 2nd blocks) and only executing the third block of code.

Answered by (author not listed)

The problem is that the second page.open is being invoked before the first one finishes, which can cause multiple problems. You want logic roughly like the following (assuming the filenames are given as command line arguments):

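var page = require('webpage').create();
// system.args[0] is the script name; the remaining entries are the files
var args = require('system').args.slice(1);

function handle_page(file){
    page.open(file,function(){
        ...
        page.evaluate(function(){
            ...do stuff...
        });
        page.render(...);
        // give the callback a short grace period, then move on
        setTimeout(next_page,100);
    });
}
function next_page(){
    var file=args.shift();
    // phantom.exit() does not stop the script immediately, so return explicitly
    if(!file){phantom.exit(0); return;}
    handle_page(file);
}
next_page();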

Right, it's recursive. This ensures that the function passed to page.open finishes processing, with a small 100 ms grace period, before you go on to the next file.

By the way, you don't need to keep repeating this line:

page = require('webpage').create();
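
The recursive pattern from this answer can be tried outside PhantomJS with a small stand-in object. The following is only a minimal sketch: `createFakePage` and `renderInSequence` are hypothetical names that do not come from the original answer, and the fake page simply simulates an asynchronous load with `setTimeout`. Under PhantomJS you would presumably pass `require('webpage').create()` as the page instead.

```javascript
// Hypothetical stand-in for a PhantomJS page so the sketch also runs
// outside PhantomJS. It records rendered file names in `log`.
function createFakePage(log) {
    return {
        open: function (url, callback) {
            // Simulate an asynchronous page load that finishes later.
            setTimeout(function () { callback('success'); }, 10);
        },
        render: function (file) { log.push(file); }
    };
}

// Opens each URL only after the callback for the previous one has run.
function renderInSequence(page, urls, done) {
    var index = 0;
    function next() {
        if (index >= urls.length) { done(); return; }
        var url = urls[index];
        index += 1;
        page.open(url, function (status) {
            if (status === 'success') {
                page.render('slide' + index + '.png');
            }
            // Same 100 ms grace period as in the answer.
            setTimeout(next, 100);
        });
    }
    next();
}

var rendered = [];
renderInSequence(createFakePage(rendered), ['url-1', 'url-2', 'url-3'], function () {
    console.log(rendered.join(', '));
    // Only exit when actually running under PhantomJS.
    if (typeof phantom !== 'undefined') { phantom.exit(0); }
});
```

Because the next `open` is only scheduled from inside the previous callback, the renders happen strictly in the order of the `urls` array.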

Answered by HymanyJohnson

I tried the accepted answer's suggestions, but they don't work (at least not for v2.1.1).

To be accurate, the accepted answer worked some of the time, but I still experienced sporadic failed page.open() calls, about 90% of the time on specific data sets.

The simplest answer I found is to instantiate a new page object for each URL.

// first page
var urlA = "http://first/url"
var pageA = require('webpage').create()

pageA.open(urlA, function(status){
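    if (status === 'success'){ // status is the string 'success' or 'fail', so a bare truthiness check always passes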
        setTimeout(openPageB, 100) // open second page call
    } else{
        phantom.exit(1)
    }
})

// second page
var urlB = "http://second/url"
var pageB = require('webpage').create()

function openPageB(){
    pageB.open(urlB, function(){
        // ... 
        // ...
    })
}

The page module API documentation says the following about the close method:

close() {void}

Close the page and releases the memory heap associated with it. Do not use the page instance after calling this.

Due to some technical limitations, the web page object might not be completely garbage collected. This is often encountered when the same object is used over and over again. Calling this function may stop the increasing heap allocation.
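
One way to combine a fresh page per URL with close() might look like the sketch below. It rests on assumptions: `visitWithFreshPages` and `createFakePage` are hypothetical names, and the fake factory only imitates a page so that the sketch runs outside PhantomJS. Under PhantomJS the factory would presumably be `function () { return require('webpage').create(); }`. Closing the page from a timer rather than inside its own callback is a cautious choice made here, not something the documentation above requires.

```javascript
// Hypothetical stand-in page factory so the sketch runs outside PhantomJS.
var closedCount = 0;
function createFakePage() {
    var closed = false;
    return {
        open: function (url, callback) {
            if (closed) { throw new Error('page used after close()'); }
            // Simulate an asynchronous page load.
            setTimeout(function () { callback('success'); }, 10);
        },
        render: function (file) {},
        close: function () { closed = true; closedCount += 1; }
    };
}

// Visits the URLs one at a time, creating a new page for each URL and
// closing it before the next one is opened.
function visitWithFreshPages(urls, createPage, done) {
    var results = [];
    function visit(i) {
        if (i >= urls.length) { done(results); return; }
        var page = createPage();
        page.open(urls[i], function (status) {
            if (status === 'success') {
                page.render('page' + (i + 1) + '.png');
            }
            results.push({ url: urls[i], status: status });
            setTimeout(function () {
                page.close(); // this page instance is not used again
                visit(i + 1);
            }, 100);
        });
    }
    visit(0);
}

visitWithFreshPages(['http://first/url', 'http://second/url'], createFakePage, function (results) {
    console.log(results.length + ' pages visited, ' + closedCount + ' pages closed');
    // Only exit when actually running under PhantomJS.
    if (typeof phantom !== 'undefined') { phantom.exit(0); }
});
```

Keeping the page in a local variable inside `visit` means no callback can accidentally reuse an instance that has already been closed.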

Basically, after testing the close() method, I decided that using the same web page instance for different open() calls is too unreliable, and that needed to be said.

Answered by sidanmor

You can use recursion:

var page = require('webpage').create();

// the urls to navigate to
var urls = [
    'http://phantomjs.org/',
    'https://twitter.com/sidanmor',
    'https://github.com/sidanmor'
];

var i = 0;

// the recursion function
var genericCallback = function () {
    return function (status) {
        console.log("URL: " + urls[i]);
        console.log("Status: " + status);
        // exit if there was a problem with the navigation
        if (!status || status === 'fail') phantom.exit();

        i++;

        if (status === "success") {

            //-- YOUR STUFF HERE ---------------------- 
            // do your stuff here... I'm taking a picture of the page
            page.render('example' + i + '.png');
            //-----------------------------------------

            if (i < urls.length) {
                // navigate to the next url and the callback is this function (recursion)
                page.open(urls[i], genericCallback());
            } else {
                // no urls left, so exit instead of opening an undefined url
                phantom.exit();
            }
        }
    };
};

// start from the first url
page.open(urls[i], genericCallback());

Answered by froilanq

You can use queued processes. A sample:

var page = require('webpage').create();

// Queue Class Helper
var Queue = function() {
    this._tasks = [];
};
Queue.prototype.add = function(fn, scope) {
    this._tasks.push({fn: fn,scope: scope});
    return this;
};
Queue.prototype.process = function() {
    var proxy, self = this;
    var task = this._tasks.shift(); // declared with var to avoid leaking a global
    if(!task) {return;}
    proxy = {end: function() {self.process();}};
    task.fn.call(task.scope, proxy);
    return this;        
};
Queue.prototype.clear = function() {
    this._tasks = []; return this;
};

// Init pages .....  
var q = new Queue();       

q.add(function(proxy) {
  page.open(url1, function() {
    // page.evaluate
    proxy.end();
  });            
});

q.add(function(proxy) {
  page.open(url2, function() {
    // page.evaluate
    proxy.end();
  });            
});


q.add(function(proxy) {
  page.open(urln, function() {
    // page.evaluate
    proxy.end();
  });            
});

// .....

q.add(function(proxy) {
  phantom.exit()
  proxy.end();
});

q.process();
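
Rather than writing one q.add call per URL, the tasks could also be generated from an array. The sketch below is an illustration under assumptions, not part of the original answer: `TaskQueue`, `queueUrls` and `createFakePage` are hypothetical names, the queue is a cut-down re-declaration of the helper above so that the block is self-contained, and the fake page only simulates asynchronous loading so the sketch runs outside PhantomJS.

```javascript
// Cut-down queue in the spirit of the Queue helper (re-declared here so
// that the sketch is self-contained).
function TaskQueue() { this.tasks = []; }
TaskQueue.prototype.add = function (fn) {
    this.tasks.push(fn);
    return this;
};
TaskQueue.prototype.process = function () {
    var self = this;
    var task = this.tasks.shift();
    if (!task) { return; }
    // Each task calls proxy.end() when it is finished.
    task({ end: function () { self.process(); } });
};

// Hypothetical stand-in for a PhantomJS page; records visited URLs.
function createFakePage(visited) {
    return {
        open: function (url, callback) {
            setTimeout(function () {
                visited.push(url);
                callback('success');
            }, 10);
        }
    };
}

// Adds one task per URL; each task keeps its own url through the closure.
function queueUrls(queue, page, urls) {
    urls.forEach(function (url) {
        queue.add(function (proxy) {
            page.open(url, function () {
                // page.evaluate / page.render would go here
                proxy.end();
            });
        });
    });
    return queue;
}

var visited = [];
var urlQueue = queueUrls(new TaskQueue(), createFakePage(visited), ['url1', 'url2', 'url3']);
urlQueue.add(function (proxy) {
    console.log('visited in order: ' + visited.join(', '));
    // Only exit when actually running under PhantomJS.
    if (typeof phantom !== 'undefined') { phantom.exit(); }
    proxy.end();
});
urlQueue.process();
```

The final task plays the same role as the last q.add in the answer: it only runs once every earlier task has called proxy.end().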

I hope this is useful. Regards.