Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/49245080/

Date: 2020-09-02 18:43:32  Source: igfitidea

How to download file with puppeteer using headless: true?

Tags: node.js, chromium, puppeteer

Asked by Antonio Gomez Alvarado

I've been running the following code to download a CSV file from http://niftyindices.com/resources/holiday-calendar:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({headless: true});
    const page = await browser.newPage();

    await page.goto('http://niftyindices.com/resources/holiday-calendar');
    await page._client.send('Page.setDownloadBehavior', {behavior: 'allow', downloadPath: '/tmp'});
    await page.click('#exportholidaycalender');
    await page.waitFor(5000);
    await browser.close();
})();
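
The snippet above reaches into page._client, which is a private Puppeteer field and may change between releases. A minimal sketch of the same Page.setDownloadBehavior call through the public page.target().createCDPSession() API, assuming a Puppeteer release that provides it; allowDownloads is a hypothetical helper name, not part of Puppeteer.

```javascript
const path = require('path');

// Hypothetical helper: enable downloads for `page` (a Puppeteer Page) through a
// public CDP session instead of the private page._client field.
async function allowDownloads(page, downloadPath) {
  const client = await page.target().createCDPSession();
  await client.send('Page.setDownloadBehavior', {
    behavior: 'allow',
    // Resolve to an absolute path so the result does not depend on the browser's cwd
    downloadPath: path.resolve(downloadPath),
  });
  return client;
}
```

Usage would be await allowDownloads(page, '/tmp'); before the click, in place of the page._client line.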

With headless: false it works; it downloads the file into /Users/user/Downloads. With headless: true it does NOT work.

I'm running this on macOS Sierra (MacBook Pro) using puppeteer version 1.1.1, which pulls Chromium version 66.0.3347.0 into the .local-chromium/ directory. I used npm init and npm i --save puppeteer to set it up.

Any idea what's wrong?

Thanks in advance for your time and help,

Accepted answer by Sumit Mishra

This page downloads the CSV by building a comma-delimited string and forcing the browser to download it by setting the data type in a data: URI, like so:

let uri = "data:text/csv;charset=utf-8," + encodeURIComponent(content);
window.open(uri, "Some CSV");

In Chrome, this opens a new tab.

You can listen for the resulting targetcreated event and write the contents to a file yourself. I'm not sure this is the best way, but it works well.

const fs = require('fs');
const puppeteer = require('puppeteer');

const browser = await puppeteer.launch({
  headless: true
});
browser.on('targetcreated', async (target) => {
    let s = target.url();
    //the test opens an about:blank to start - ignore this
    if (s === 'about:blank') {
        return;
    }
    //unencode the characters after removing the content type
    s = s.replace("data:text/csv;charset=utf-8,", "");
    //clean up string by unencoding the %xx
    ...
    fs.writeFile("/tmp/download.csv", s, function(err) {
        if (err) {
            console.log(err);
            return;
        }
        console.log("The file was saved!");
    });
});

const page = await browser.newPage();
.. open link ...
.. click on download link ..
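
A self-contained sketch of the same targetcreated idea, assuming (as in the page script quoted above) that the CSV was encoded with encodeURIComponent and opened as a data:text/csv;charset=utf-8, URI, so decodeURIComponent reverses it. csvFromDataUri and saveCsvTabs are hypothetical helper names, not the answerer's code.

```javascript
const fs = require('fs');

// Assumed prefix, matching the page script quoted above
const CSV_PREFIX = 'data:text/csv;charset=utf-8,';

// Hypothetical helper: return the CSV text carried by a data URI, or null if
// the URI is not a CSV data URI (e.g. about:blank).
function csvFromDataUri(uri) {
  if (!uri.startsWith(CSV_PREFIX)) return null;
  // The page encoded the content with encodeURIComponent, so decode it back
  return decodeURIComponent(uri.slice(CSV_PREFIX.length));
}

// Hypothetical helper: save the CSV of every new target whose URL is a CSV
// data URI. `browser` is expected to be a Puppeteer Browser (an event emitter).
function saveCsvTabs(browser, outPath) {
  browser.on('targetcreated', (target) => {
    const csv = csvFromDataUri(target.url());
    if (csv === null) return; // not a CSV tab, ignore it
    fs.writeFile(outPath, csv, (err) => {
      if (err) console.error(err);
      else console.log('The file was saved!');
    });
  });
}
```

Checking the prefix instead of skipping only about:blank means unrelated tabs opened by the page are ignored as well.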

Answer by MyCompassSpins

I spent hours yesterday poring through this thread and Stack Overflow, trying to figure out how to get Puppeteer to download a CSV file by clicking a download link in headless mode within an authenticated session. The accepted answer here didn't work in my case because the download does not trigger targetcreated, and the next answer, for whatever reason, did not retain the authenticated session. This article saved the day. In short: call fetch from inside the page with credentials: 'include', so the request carries the session's cookies. Hopefully this helps someone else out.

const res = await this.page.evaluate(() =>
{
    return fetch('https://example.com/path/to/file.csv', {
        method: 'GET',
        credentials: 'include'
    }).then(r => r.text());
});
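
Building on that, a sketch that passes the URL into page.evaluate (which forwards extra arguments to the page function) and writes the returned text to disk on the Node side. It assumes a text file such as CSV; a binary file would need a different encoding step. downloadViaPage is a hypothetical helper name.

```javascript
const fs = require('fs/promises');

// Hypothetical helper: fetch `url` from inside the page so the browser's
// session cookies are sent, then save the body (as text) to `outPath`.
async function downloadViaPage(page, url, outPath) {
  const text = await page.evaluate(async (fileUrl) => {
    const r = await fetch(fileUrl, { method: 'GET', credentials: 'include' });
    if (!r.ok) throw new Error(`HTTP ${r.status}`);
    return r.text();
  }, url);
  await fs.writeFile(outPath, text);
}
```

Usage: await downloadViaPage(page, 'https://example.com/path/to/file.csv', '/tmp/file.csv');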

Answer by Juan Carlos Migliavacca

The problem is that the browser closes before the download has finished.

You can get the file size and file name from the response headers, then watch the downloaded file and close the browser once its size on disk matches the size from the response.

This is an example:

const fs = require('fs');

const filename = <set this with some regex in response>;
const dir = <watch folder or file>;

// Register the listener before clicking, so the download response isn't missed.
// page.on() returns the page, not a promise, so it can't be awaited in Promise.all.
page.on('response', response => {
    // Public accessor; header names are lower-cased
    const headers = response.headers();
    // If response has a file on it
    if (headers['content-disposition'] === `attachment;filename=${filename}`) {
        // Get the size
        const size = parseInt(headers['content-length'], 10);
        console.log('Size from header: ', size);
        // Watch event on download folder or file
        fs.watchFile(dir, (curr, prev) => {
            // If current size equals the size from the response, stop watching and close
            if (curr.size === size) {
                fs.unwatchFile(dir);
                browser.close();
            }
        });
    }
});

// Trigger the download
await page.click('#DownloadFile');

The way the response is matched could certainly be improved, but I hope you'll find this useful.
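
The watch logic can also be wrapped in a promise, so the script awaits the download instead of closing the browser from inside a callback. A sketch, assuming the expected size comes from the response's content-length header (not every server sends one); waitForFileSize is a hypothetical helper and the default timeout is arbitrary.

```javascript
const fs = require('fs');

// Hypothetical helper: resolve once `filePath` reaches `expectedSize` bytes,
// reject after `timeoutMs`.
function waitForFileSize(filePath, expectedSize, { intervalMs = 100, timeoutMs = 30000 } = {}) {
  return new Promise((resolve, reject) => {
    let settled = false;
    const finish = (err) => {
      if (settled) return;
      settled = true;
      fs.unwatchFile(filePath, onChange);
      clearTimeout(timer);
      if (err) reject(err);
      else resolve();
    };
    const onChange = (curr) => {
      if (curr.size === expectedSize) finish();
    };
    const timer = setTimeout(
      () => finish(new Error(`Timed out waiting for ${filePath} to reach ${expectedSize} bytes`)),
      timeoutMs
    );
    // fs.watchFile polls the file's stats every `intervalMs` milliseconds
    fs.watchFile(filePath, { interval: intervalMs }, onChange);
    // The file may already be complete before the first change is reported
    fs.stat(filePath, (err, stats) => {
      if (!err && stats.size === expectedSize) finish();
    });
  });
}
```

For example: await waitForFileSize(path.join(dir, filename), size); await browser.close();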

Answer by Russell Elfenbein

I have another solution to this problem, since none of the answers here worked for me.

I needed to log into a website and download some .csv reports. Headed mode was fine, but headless failed no matter what I tried. The Network errors showed the download being aborted, but I couldn't (quickly) determine why.

So I intercepted the request and used node-fetch to make it outside of Puppeteer. This required copying the fetch options, body and headers, and adding the access cookie.
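
A sketch of that idea using Puppeteer's request interception. It assumes Node 18+, where fetch is global (swapped in for node-fetch, which takes the same url, options arguments). fetchArgsFromRequest and downloadOutsideBrowser are hypothetical helpers, and the .csv URL filter in the usage comment is only an example.

```javascript
const fs = require('fs');

// Hypothetical helper: turn Puppeteer cookie objects ({ name, value, ... })
// into a single Cookie header value.
function buildCookieHeader(cookies) {
  return cookies.map((c) => `${c.name}=${c.value}`).join('; ');
}

// Hypothetical helper: copy URL, method, headers and body from an intercepted
// Puppeteer request, adding the session cookies as a Cookie header.
function fetchArgsFromRequest(request, cookies) {
  const headers = { ...request.headers() };
  if (cookies.length > 0) headers.cookie = buildCookieHeader(cookies);
  const init = { method: request.method(), headers };
  const body = request.postData();
  if (body !== undefined) init.body = body;
  return [request.url(), init];
}

// Repeat the request outside the browser (Node 18+ global fetch assumed)
// and save the response body to `outPath`.
async function downloadOutsideBrowser(request, cookies, outPath) {
  const [url, init] = fetchArgsFromRequest(request, cookies);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  await fs.promises.writeFile(outPath, Buffer.from(await res.arrayBuffer()));
}

// Usage sketch (the .csv filter and output path are only examples):
// await page.setRequestInterception(true);
// page.on('request', async (req) => {
//   if (!req.url().endsWith('.csv')) return req.continue();
//   await downloadOutsideBrowser(req, await page.cookies(), '/tmp/report.csv');
//   await req.abort();
// });
```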

Good luck.

Answer by Andrey Shorin

I found a way to wait for the browser to download a file itself. The idea is to wait for the response with a predicate; in my case the URL ends with '/data'.

I just didn't want to load the file contents into a buffer.

await page._client.send('Page.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath: download_path,
});

await frame.focus(report_download_selector);
await Promise.all([
    page.waitForResponse(r => r.url().endsWith('/data')),
    page.keyboard.press('Enter'),
]);
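
waitForResponse resolves when the response arrives, which is not necessarily when Chrome has finished writing the file. A sketch of an extra check, assuming Chrome's behaviour of keeping an unfinished download under a temporary .crdownload name in the download folder. waitForDownload is a hypothetical helper; the before snapshot lets it ignore files that were already there.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: poll `dir` until a new, finished file appears and no
// ".crdownload" partial files remain. `before` holds pre-existing file names.
function waitForDownload(dir, { before = new Set(), intervalMs = 200, timeoutMs = 30000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const check = () => {
      let names;
      try {
        names = fs.readdirSync(dir);
      } catch (err) {
        return reject(err); // e.g. the download folder does not exist
      }
      const pending = names.some((n) => n.endsWith('.crdownload'));
      const fresh = names.filter((n) => !n.endsWith('.crdownload') && !before.has(n));
      if (!pending && fresh.length > 0) return resolve(path.join(dir, fresh[0]));
      if (Date.now() > deadline) return reject(new Error(`No finished download in ${dir}`));
      setTimeout(check, intervalMs);
    };
    check();
  });
}

// Usage sketch with the names from the snippet above:
// const before = new Set(fs.readdirSync(download_path));
// ...Promise.all([page.waitForResponse(...), page.keyboard.press('Enter')])...
// const file = await waitForDownload(download_path, { before });
```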

Answer by Jason Norwood-Young

I needed to download a file from behind a login, which was being handled by Puppeteer. targetcreated was not being triggered. In the end I downloaded it with request, after copying the cookies over from the Puppeteer instance.

In this case, I'm streaming the file through, but you could just as easily save it.

    res.writeHead(200, {
        "Content-Type": 'application/octet-stream',
        "Content-Disposition": `attachment; filename=secretfile.jpg`
    });
    // Copy the logged-in session's cookies from Puppeteer into a request cookie jar
    let cookies = await page.cookies();
    let jar = request.jar();
    for (let cookie of cookies) {
        jar.setCookie(`${cookie.name}=${cookie.value}`, "http://secretsite.com");
    }
    // .pipe() returns the destination stream, not a promise, so try/catch can't
    // see download errors; handle them with an 'error' listener instead
    request({ url: "http://secretsite.com/secretfile.jpg", jar })
        .on('error', (err) => {
            console.trace(err);
            res.end(); // headers were already sent, so just end the response
        })
        .pipe(res);
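
The request package has since been deprecated. A sketch of the same cookie-copying approach with the fetch built into Node 18+, streaming the body into any writable stream (an HTTP response or a file). It assumes page.cookies() was called while on the site, so the returned cookies belong to it; streamWithSessionCookies is a hypothetical helper name.

```javascript
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');

// Hypothetical helper: download `url` with the browser session's cookies and
// stream it into `dest`. `cookies` is the array returned by page.cookies().
async function streamWithSessionCookies(url, cookies, dest) {
  const cookie = cookies.map((c) => `${c.name}=${c.value}`).join('; ');
  const res = await fetch(url, { headers: cookie ? { cookie } : {} });
  if (!res.ok || !res.body) throw new Error(`HTTP ${res.status} for ${url}`);
  // Convert fetch's web stream into a Node stream; pipeline() rejects if
  // either side errors, unlike a bare .pipe()
  await pipeline(Readable.fromWeb(res.body), dest);
}
```

With Express, for example: await streamWithSessionCookies('http://secretsite.com/secretfile.jpg', await page.cookies(), res);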