Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/10359485/

Time: 2020-08-24 01:01:30

How to download and unzip a zip file in memory in NodeJs?

Tags: javascript, node.js, zip, zlib, unzip

Asked by pathikrit

I want to download a zip file from the internet and unzip it in memory without saving to a temporary file. How can I do this?

Here is what I tried:

var url = 'http://bdn-ak.bloomberg.com/precanned/Comdty_Calendar_Spread_Option_20120428.txt.zip';

var request = require('request'), fs = require('fs'), zlib = require('zlib');

  request.get(url, function(err, res, file) {
     if(err) throw err;
     zlib.unzip(file, function(err, txt) {
        if(err) throw err;
        console.log(txt.toString()); //outputs nothing
     });
  });

[EDIT] As suggested, I tried using the adm-zip library, but I still cannot make this work:

var ZipEntry = require('adm-zip/zipEntry');
request.get(url, function(err, res, zipFile) {
        if(err) throw err;
        var zip = new ZipEntry();
        zip.setCompressedData(new Buffer(zipFile.toString('utf-8')));
        var text = zip.getData();
        console.log(text.toString()); // fails
    });

Answer by mihai

You need a library that can handle buffers. The latest version of adm-zip will do:

npm install adm-zip

My solution uses the http.get method, since its response delivers the data as Buffer chunks.

Code:

var file_url = 'http://notepad-plus-plus.org/repository/7.x/7.6/npp.7.6.bin.x64.zip';

var AdmZip = require('adm-zip');
var http = require('http');

http.get(file_url, function(res) {
  var data = [], dataLen = 0; 

  res.on('data', function(chunk) {
    data.push(chunk);
    dataLen += chunk.length;

  }).on('end', function() {
    var buf = Buffer.alloc(dataLen);

    for (var i = 0, len = data.length, pos = 0; i < len; i++) { 
      data[i].copy(buf, pos); 
      pos += data[i].length; 
    } 

    var zip = new AdmZip(buf);
    var zipEntries = zip.getEntries();
    console.log(zipEntries.length)

    for (var i = 0; i < zipEntries.length; i++) {
      if (zipEntries[i].entryName.match(/readme/))
        console.log(zip.readAsText(zipEntries[i]));
    }
  });
});

The idea is to collect the chunks in an array of buffers and concatenate them into a new one at the end. This is necessary because buffers have a fixed size and cannot be resized.
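
For reference, the copy loop in this answer can be replaced by Buffer.concat, which joins a list of Buffers into one newly allocated Buffer. A minimal sketch, assuming any readable stream that emits Buffer chunks (such as the response object passed to the http.get callback); the helper name streamToBuffer is just an illustrative choice:

```javascript
// Collect every Buffer chunk a readable stream emits, then join them once.
// Buffer.concat(list, totalLength) allocates the result a single time,
// doing the same work as a manual loop that copies each chunk into place.
function streamToBuffer(stream) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    let total = 0;
    stream.on('data', (chunk) => {
      chunks.push(chunk);
      total += chunk.length;
    });
    stream.on('end', () => resolve(Buffer.concat(chunks, total)));
    stream.on('error', reject);
  });
}
```

Inside the http.get callback you would then call streamToBuffer(res).then((buf) => new AdmZip(buf)) and continue with getEntries() as before.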

Update

This is a simpler solution that uses the request module to obtain the response in a buffer, by setting encoding: null in the options. It also follows redirects and handles http/https automatically.

var file_url = 'https://github.com/mihaifm/linq/releases/download/3.1.1/linq.js-3.1.1.zip';

var AdmZip = require('adm-zip');
var request = require('request');

request.get({url: file_url, encoding: null}, (err, res, body) => {
  if (err) throw err; // body is undefined when the request fails
  var zip = new AdmZip(body);
  var zipEntries = zip.getEntries();
  console.log(zipEntries.length);

  zipEntries.forEach((entry) => {
    if (entry.entryName.match(/readme/i))
      console.log(zip.readAsText(entry));
  });
});

The body of the response is a Buffer that can be passed directly to AdmZip, simplifying the whole process.
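
Note that the request package has been deprecated since 2020. On Node.js 18 or later, a similar download can be sketched with the built-in fetch; as with encoding: null, the point is to read the raw bytes instead of a decoded string. This is a sketch under that Node version assumption, and downloadToBuffer is a hypothetical helper name:

```javascript
// Assumes Node.js 18+, where fetch is available as a global.
// Downloads a URL fully into memory and returns the body as a Buffer,
// ready to be handed to an in-memory zip reader such as AdmZip.
async function downloadToBuffer(url) {
  const res = await fetch(url); // fetch follows redirects by default
  if (!res.ok) {
    throw new Error(`HTTP ${res.status} while fetching ${url}`);
  }
  // arrayBuffer() yields the raw bytes, so no text decoding corrupts the zip
  return Buffer.from(await res.arrayBuffer());
}
```

Usage would look like downloadToBuffer(file_url).then((body) => { var zip = new AdmZip(body); /* ... */ }).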

Answer by kilianc

Sadly, you can't pipe the response stream into the unzip job the way Node's zlib lib lets you do with its own formats; you have to cache the response and wait for it to end. For big files, I suggest piping the response to an fs stream; otherwise you will fill up your memory in a blink!
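
For contrast, this is the kind of piping zlib does support: a gzip stream (not a .zip archive) can be decompressed chunk by chunk as it arrives. It is also why zlib.unzip in the question cannot work here: it auto-detects gzip or zlib (deflate) headers only and does not understand the zip container format. A small sketch using only built-in modules; gunzipToBuffer is a hypothetical name:

```javascript
const zlib = require('zlib');
const { pipeline } = require('stream/promises');
const { Writable } = require('stream');

// Decompress a gzip stream as the data flows in. This works because gzip is a
// single compressed stream read front to back. A .zip archive keeps its table
// of contents (the central directory) at the end of the file, so readers that
// rely on it normally need the whole archive first.
async function gunzipToBuffer(source) {
  const chunks = [];
  await pipeline(
    source,
    zlib.createGunzip(),
    new Writable({
      write(chunk, _encoding, done) {
        chunks.push(chunk);
        done();
      },
    })
  );
  return Buffer.concat(chunks);
}
```

In a real download, source would be the HTTP response for a .gz file; the decompressed output never has to be written to disk.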

I don't completely understand what you are trying to do, but imho this is the best approach. You should keep your data in memory only for as long as you really need it, and then stream it to the csv parser.

If you want to keep all your data in memory, you can replace the csv parser method fromPath with from, which takes a buffer instead, and have getData return unzipped directly.
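
The same in-memory idea can also be sketched without the csv module. The hypothetical parsePipeDelimited helper below (not part of any library) splits '|'-delimited text into objects keyed by the header row, roughly like columns: true, and sets aside rows whose NAME starts with '#' as metadata, mirroring the transform in the code that follows. It does not handle quoted fields:

```javascript
// Hypothetical helper: parses '|'-delimited text that is already in memory.
// The first non-empty line is the header; each later line becomes an object
// keyed by those column names. Rows whose NAME field starts with '#' are
// collected as metadata instead. Quoting and escaped delimiters are NOT handled.
function parsePipeDelimited(input) {
  const text = Buffer.isBuffer(input) ? input.toString('utf8') : String(input);
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  const rows = [];
  const metadata = [];
  if (lines.length === 0) return { rows, metadata };

  const header = lines[0].split('|');
  for (const line of lines.slice(1)) {
    const fields = line.split('|');
    const row = {};
    header.forEach((name, i) => {
      row[name] = fields[i];
    });
    if (typeof row.NAME === 'string' && row.NAME[0] === '#') {
      metadata.push(row.NAME);
    } else {
      rows.push(row);
    }
  }
  return { rows, metadata };
}
```

With getData handing back the unzipped content, you would call parsePipeDelimited(unzipped) instead of csv().fromPath(path, ...).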

You can use AdmZip (as @mihai said) instead of node-zip; just be aware that AdmZip is not yet published on npm, so you need:

$ npm install git://github.com/cthackers/adm-zip.git

N.B. Assumption: the zip file contains only one file

var request = require('request'),
    fs = require('fs'),
    csv = require('csv'),
    NodeZip = require('node-zip')

function getData(tmpFolder, url, callback) {
  var tempZipFilePath = tmpFolder + new Date().getTime() + Math.random()
  var tempZipFileStream = fs.createWriteStream(tempZipFilePath)
  request.get({
    url: url,
    encoding: null
  }).on('error', callback).pipe(tempZipFileStream)

  // Wait for the file stream to flush everything to disk; the request's own
  // 'end' event can fire before the last bytes have been written.
  tempZipFileStream.on('finish', function () {
    fs.readFile(tempZipFilePath, 'base64', function (err, zipContent) {
      if (err) return callback(err)
      var zip = new NodeZip(zipContent, { base64: true })
      Object.keys(zip.files).forEach(function (filename) {
        var tempFilePath = tmpFolder + new Date().getTime() + Math.random()
        var unzipped = zip.files[filename].data
        fs.writeFile(tempFilePath, unzipped, function (err) {
          callback(err, tempFilePath)
        })
      })
    })
  })
}

getData('/tmp/', 'http://bdn-ak.bloomberg.com/precanned/Comdty_Calendar_Spread_Option_20120428.txt.zip', function (err, path) {
  if (err) {
    return console.error('error: %s', err.message)
  }
  var metadata = []
  csv().fromPath(path, {
    delimiter: '|',
    columns: true
  }).transform(function (data){
    // do things with your data
    if (data.NAME[0] === '#') {
      metadata.push(data.NAME)
    } else {
      return data
    }
  }).on('data', function (data, index) {
    console.log('#%d %s', index, JSON.stringify(data, null, '  '))
  }).on('end',function (count) {
    console.log('Metadata: %s', JSON.stringify(metadata, null, '  '))
    console.log('Number of lines: %d', count)
  }).on('error', function (error) {
    console.error('csv parsing error: %s', error.message)
  })
})

Answer by enyo

If you're on macOS or Linux, you can use the unzip command to unzip from stdin.

In this example I'm reading the zip file from the filesystem into a Buffer object, but it works with a downloaded file as well:

// Get a Buffer with the zip content
var fs = require("fs")
  , zip = fs.readFileSync(__dirname + "/test.zip");


// Now the actual unzipping:
var spawn = require('child_process').spawn
  , fileToExtract = "test.js"
    // -p tells unzip to extract to stdout
  , unzip = spawn("unzip", ["-p", "/dev/stdin", fileToExtract ])
  ;

// Write the Buffer to stdin
unzip.stdin.write(zip);

// Handle errors
unzip.stderr.on('data', function (data) {
  console.log("There has been an error: ", data.toString("utf-8"));
});

// Handle the unzipped stdout
unzip.stdout.on('data', function (data) {
  console.log("Unzipped file: ", data.toString("utf-8"));
});

unzip.stdin.end();

This is really just the Node version of:

cat test.zip | unzip -p /dev/stdin test.js

EDIT: It's worth noting that this will not work if the input zip is too big to be read in one chunk from stdin. If you need to read bigger files, and your zip file contains only one file, you can use funzip instead of unzip:

var unzip = spawn("funzip");
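
Whichever command you spawn, the example above logs each stdout 'data' event separately, so a larger extracted file may show up in several pieces. A sketch of a helper (runWithStdin is a hypothetical name) that buffers all of the child's output and settles once the process exits, rejecting on a non-zero exit code:

```javascript
const { spawn } = require('child_process');

// Run `command` with `args`, write the whole `input` Buffer to its stdin, and
// resolve with everything it wrote to stdout once it exits. Output chunks are
// collected and joined instead of being logged one 'data' event at a time.
function runWithStdin(command, args, input) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    const out = [];
    const err = [];
    child.stdout.on('data', (chunk) => out.push(chunk));
    child.stderr.on('data', (chunk) => err.push(chunk));
    // 'error' fires if the command cannot be started (e.g. not installed).
    child.on('error', reject);
    // If the child exits before reading all of stdin, writing fails with
    // EPIPE; ignore that and let the exit code decide the outcome.
    child.stdin.on('error', () => {});
    child.on('close', (code) => {
      if (code === 0) {
        resolve(Buffer.concat(out));
      } else {
        const msg = Buffer.concat(err).toString('utf-8');
        reject(new Error(`${command} exited with ${code}: ${msg}`));
      }
    });
    child.stdin.end(input);
  });
}
```

For example: runWithStdin('funzip', [], zip).then((data) => console.log(data.toString('utf-8'))).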

If your zip file contains multiple files (and the file you want isn't the first one), I'm afraid to say you're out of luck. unzip needs to seek in the .zip file, since zip files are just a container, and unzip may just unzip the last file in it. In that case you have to save the file temporarily (node-temp comes in handy).
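
To see why seeking matters: a zip file ends with an "end of central directory" record that points back to a central directory listing every entry and where its data starts, so a reader normally begins at the end of the file. When the whole archive is already in a Buffer, as in the question, that lookup is cheap. The sketch below is an illustration using only Node's built-in zlib, not code from any of the answers (readZipEntries is a hypothetical name). It supports only stored and deflated entries, with no ZIP64, no encryption, and no CRC check, so treat it as a demonstration rather than a replacement for adm-zip or JSZip:

```javascript
const zlib = require('zlib');

// Minimal in-memory zip reader (sketch): returns a Map of entry name -> Buffer.
// Limitations: no ZIP64, no encryption, no CRC verification, names decoded as
// UTF-8, and only compression methods 0 (stored) and 8 (deflate).
function readZipEntries(buf) {
  // The EOCD record is 22 bytes plus an optional comment of up to 65535 bytes,
  // so scan backwards from the end for its signature.
  let eocd = -1;
  for (let i = buf.length - 22; i >= Math.max(0, buf.length - 22 - 0xffff); i--) {
    if (buf.readUInt32LE(i) === 0x06054b50) { eocd = i; break; }
  }
  if (eocd < 0) throw new Error('Not a zip file: end of central directory not found');

  const count = buf.readUInt16LE(eocd + 10); // total number of entries
  let p = buf.readUInt32LE(eocd + 16);       // offset of the central directory
  const entries = new Map();
  for (let n = 0; n < count; n++) {
    if (buf.readUInt32LE(p) !== 0x02014b50) throw new Error('Bad central directory entry');
    const method = buf.readUInt16LE(p + 10);
    const compressedSize = buf.readUInt32LE(p + 20);
    const nameLen = buf.readUInt16LE(p + 28);
    const extraLen = buf.readUInt16LE(p + 30);
    const commentLen = buf.readUInt16LE(p + 32);
    const localOffset = buf.readUInt32LE(p + 42);
    const name = buf.toString('utf8', p + 46, p + 46 + nameLen);

    if (buf.readUInt32LE(localOffset) !== 0x04034b50) throw new Error('Bad local header');
    // The local header has its own name/extra lengths, which decide where data begins.
    const dataStart = localOffset + 30 +
      buf.readUInt16LE(localOffset + 26) + buf.readUInt16LE(localOffset + 28);
    const raw = buf.subarray(dataStart, dataStart + compressedSize);

    if (method === 0) entries.set(name, Buffer.from(raw));
    else if (method === 8) entries.set(name, zlib.inflateRawSync(raw));
    else throw new Error(`Unsupported compression method ${method} for ${name}`);

    p += 46 + nameLen + extraLen + commentLen; // next central directory record
  }
  return entries;
}
```

With a downloaded Buffer, readZipEntries(body).get('readme.txt') would give that entry's bytes without touching the disk.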

Answer by enyo

Two days ago, the module node-zip was released; it is a wrapper for JSZip, a JavaScript-only implementation of Zip.

// zipBuffer is assumed to be a Buffer holding the downloaded zip file
var NodeZip = require('node-zip')
  , zip = new NodeZip(zipBuffer.toString("base64"), { base64: true })
  , unzipped = zip.files["your-text-file.txt"].data;