How to download and unzip a zip file in memory in NodeJs?
Source: http://stackoverflow.com/questions/10359485/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by pathikrit
I want to download a zip file from the internet and unzip it in memory without saving to a temporary file. How can I do this?
Here is what I tried:
var url = 'http://bdn-ak.bloomberg.com/precanned/Comdty_Calendar_Spread_Option_20120428.txt.zip';
var request = require('request'), fs = require('fs'), zlib = require('zlib');
request.get(url, function(err, res, file) {
  if(err) throw err;
  zlib.unzip(file, function(err, txt) {
    if(err) throw err;
    console.log(txt.toString()); //outputs nothing
  });
});
[EDIT] As suggested, I tried using the adm-zip library, but I still cannot make this work:
var ZipEntry = require('adm-zip/zipEntry');
request.get(url, function(err, res, zipFile) {
  if(err) throw err;
  var zip = new ZipEntry();
  zip.setCompressedData(new Buffer(zipFile.toString('utf-8')));
  var text = zip.getData();
  console.log(text.toString()); // fails
});
Answered by mihai
You need a library that can handle buffers. The latest version of adm-zip will do:
npm install adm-zip
My solution uses the http.get method, since it returns Buffer chunks.
Code:
var file_url = 'http://notepad-plus-plus.org/repository/7.x/7.6/npp.7.6.bin.x64.zip';
var AdmZip = require('adm-zip');
var http = require('http');
http.get(file_url, function(res) {
  var data = [], dataLen = 0;
  res.on('data', function(chunk) {
    data.push(chunk);
    dataLen += chunk.length;
  }).on('end', function() {
    var buf = Buffer.alloc(dataLen);
    for (var i = 0, len = data.length, pos = 0; i < len; i++) {
      data[i].copy(buf, pos);
      pos += data[i].length;
    }
    var zip = new AdmZip(buf);
    var zipEntries = zip.getEntries();
    console.log(zipEntries.length);
    for (var i = 0; i < zipEntries.length; i++) {
      if (zipEntries[i].entryName.match(/readme/))
        console.log(zip.readAsText(zipEntries[i]));
    }
  });
});
The idea is to create an array of buffers and concatenate them into a new one at the end. This is due to the fact that buffers cannot be resized.
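The manual copy loop above can also be expressed with Buffer.concat, which is part of Node's built-in Buffer API. The following is only a minimal sketch: the chunks are hard-coded stand-ins (an assumption for illustration) for what would arrive through res.on('data', ...).

```javascript
// Simulated response chunks; in the answer above these arrive via res.on('data', ...).
const chunks = [
  Buffer.from('PK'),
  Buffer.from([0x03, 0x04]),
  Buffer.from('rest of the archive'),
];

// Buffer.concat allocates one new Buffer and copies every chunk into it, in order.
const whole = Buffer.concat(chunks);

console.log(whole.length); // total number of bytes across all chunks
```

The resulting Buffer could then be handed to new AdmZip(...) in place of buf.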
Update
This is a simpler solution that uses the request module to obtain the response in a buffer, by setting encoding: null in the options. It also follows redirects and resolves http/https automatically.
var file_url = 'https://github.com/mihaifm/linq/releases/download/3.1.1/linq.js-3.1.1.zip';
var AdmZip = require('adm-zip');
var request = require('request');
request.get({url: file_url, encoding: null}, (err, res, body) => {
  if (err) throw err;
  var zip = new AdmZip(body);
  var zipEntries = zip.getEntries();
  console.log(zipEntries.length);
  zipEntries.forEach((entry) => {
    if (entry.entryName.match(/readme/i))
      console.log(zip.readAsText(entry));
  });
});
The body of the response is a buffer that can be passed directly to AdmZip, simplifying the whole process.
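To illustrate why encoding: null matters, here is a small sketch of what happens when binary data is decoded as a UTF-8 string and turned back into a Buffer, which is roughly what the question's second attempt does with zipFile.toString('utf-8'). The byte values are made up for illustration; only the first four mimic a zip local file header signature.

```javascript
// Bytes that are not valid UTF-8, as is typical for compressed data.
const original = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0xfe, 0x80, 0x00]);

// Decoding as UTF-8 replaces invalid sequences, so the bytes cannot be recovered.
const viaUtf8 = Buffer.from(original.toString('utf8'), 'utf8');

// base64 is a lossless text encoding of the same bytes.
const viaBase64 = Buffer.from(original.toString('base64'), 'base64');

const utf8Intact = original.equals(viaUtf8);
const base64Intact = original.equals(viaBase64);
console.log(utf8Intact, base64Intact); // false true
```

Keeping the body as a Buffer avoids the lossy step entirely.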
Answered by kilianc
Sadly, you can't pipe the response stream into the unzip job the way the node zlib lib allows you to; you have to cache the response and wait for it to end. For big files, I suggest piping the response to an fs stream, otherwise you will fill up your memory in a blink!
I don't completely understand what you are trying to do, but imho this is the best approach. You should keep your data in memory only for as long as you really need it, and then stream it to the csv parser.
If you want to keep all your data in memory, you can replace the csv parser method fromPath with from, which takes a buffer instead, and have getData return unzipped directly.
You can use AdmZip (as @mihai said) instead of node-zip; just note that AdmZip is not yet published on npm, so you need:
$ npm install git://github.com/cthackers/adm-zip.git
N.B. Assumption: the zip file contains only one file
var request = require('request'),
    fs = require('fs'),
    csv = require('csv'),
    NodeZip = require('node-zip')
function getData(tmpFolder, url, callback) {
  var tempZipFilePath = tmpFolder + new Date().getTime() + Math.random()
  var tempZipFileStream = fs.createWriteStream(tempZipFilePath)
  request.get({
    url: url,
    encoding: null
  }).pipe(tempZipFileStream).on('finish', function() {
    // 'finish' fires once the whole download has been flushed to the temp file
    fs.readFile(tempZipFilePath, 'base64', function (err, zipContent) {
      if (err) return callback(err)
      var zip = new NodeZip(zipContent, { base64: true })
      Object.keys(zip.files).forEach(function (filename) {
        var tempFilePath = tmpFolder + new Date().getTime() + Math.random()
        var unzipped = zip.files[filename].data
        fs.writeFile(tempFilePath, unzipped, function (err) {
          callback(err, tempFilePath)
        })
      })
    })
  })
}
getData('/tmp/', 'http://bdn-ak.bloomberg.com/precanned/Comdty_Calendar_Spread_Option_20120428.txt.zip', function (err, path) {
  if (err) {
    return console.error('error: %s', err.message)
  }
  var metadata = []
  csv().fromPath(path, {
    delimiter: '|',
    columns: true
  }).transform(function (data){
    // do things with your data
    if (data.NAME[0] === '#') {
      metadata.push(data.NAME)
    } else {
      return data
    }
  }).on('data', function (data, index) {
    console.log('#%d %s', index, JSON.stringify(data, null, ' '))
  }).on('end',function (count) {
    console.log('Metadata: %s', JSON.stringify(metadata, null, ' '))
    console.log('Number of lines: %d', count)
  }).on('error', function (error) {
    console.error('csv parsing error: %s', error.message)
  })
})
Answered by enyo
If you're on macOS or Linux, you can use the unzip command to unzip from stdin.
In this example I'm reading the zip file from the filesystem into a Buffer object, but it works with a downloaded file as well:
// Get a Buffer with the zip content
var fs = require("fs")
  , zip = fs.readFileSync(__dirname + "/test.zip");
// Now the actual unzipping:
var spawn = require('child_process').spawn
  , fileToExtract = "test.js"
    // -p tells unzip to extract to stdout
  , unzip = spawn("unzip", ["-p", "/dev/stdin", fileToExtract ])
  ;
// Write the Buffer to stdin
unzip.stdin.write(zip);
// Handle errors
unzip.stderr.on('data', function (data) {
  console.log("There has been an error: ", data.toString("utf-8"));
});
// Handle the unzipped stdout
unzip.stdout.on('data', function (data) {
  console.log("Unzipped file: ", data.toString("utf-8"));
});
unzip.stdin.end();
Which is actually just the node version of:
cat test.zip | unzip -p /dev/stdin test.js
EDIT: It's worth noting that this will not work if the input zip is too big to be read in one chunk from stdin. If you need to read bigger files, and your zip file contains only one file, you can use funzip instead of unzip:
var unzip = spawn("funzip");
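The same stdin/stdout pattern can be sketched with child_process.spawnSync, which takes the Buffer through its input option and returns everything the child wrote to stdout as a single Buffer. So that the sketch runs without unzip or funzip installed, the child here is a stand-in Node process that simply copies stdin to stdout. Swapping in "funzip" with no arguments is an assumption that is not exercised here.

```javascript
const { spawnSync } = require('child_process');

// Stand-in for the downloaded zip content.
const input = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0x00, 0xff]);

// Stand-in child process: copies stdin to stdout unchanged.
// For a real single-file archive this would presumably be spawnSync('funzip', [], { input }).
const result = spawnSync(
  process.execPath,
  ['-e', 'process.stdin.pipe(process.stdout)'],
  { input: input }
);

// With no encoding option set, result.stdout is a Buffer holding all of the child's output.
console.log(result.status, result.stdout.length);
```

Note that spawnSync blocks the event loop and caps output at its maxBuffer option, so the asynchronous spawn version is likely the better fit for large archives.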
If your zip file contains multiple files (and the file you want isn't the first one), I'm afraid you're out of luck. Unzip needs to seek in the .zip file since zip files are just a container, and unzip may just unzip the last file in it. In that case you have to save the file temporarily (node-temp comes in handy).
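The need to seek comes from the zip format itself: the index of entries (the central directory) is located through an "end of central directory" record at the tail of the archive. Below is a minimal sketch of finding that record in a Buffer. The function name and the hand-built test archive are hypothetical; the field offsets follow the published zip format, and archive comments or zip64 extensions are not handled.

```javascript
const EOCD_SIGNATURE = 0x06054b50; // "PK\x05\x06" read as a little-endian integer
const EOCD_MIN_SIZE = 22;          // size of the record when it has no trailing comment

// Hypothetical helper: scan backwards from the end for the record's signature.
function findEndOfCentralDirectory(buf) {
  for (let pos = buf.length - EOCD_MIN_SIZE; pos >= 0; pos--) {
    if (buf.readUInt32LE(pos) === EOCD_SIGNATURE) {
      return {
        position: pos,
        totalEntries: buf.readUInt16LE(pos + 10),
        centralDirectorySize: buf.readUInt32LE(pos + 12),
        centralDirectoryOffset: buf.readUInt32LE(pos + 16),
      };
    }
  }
  return null; // no record found, so this is not a complete zip archive
}

// Hand-built stand-in: 100 filler bytes followed by a record describing 3 entries.
const eocd = Buffer.alloc(EOCD_MIN_SIZE);
eocd.writeUInt32LE(EOCD_SIGNATURE, 0);
eocd.writeUInt16LE(3, 8);   // entries on this disk
eocd.writeUInt16LE(3, 10);  // total entries
eocd.writeUInt32LE(40, 12); // central directory size in bytes
eocd.writeUInt32LE(60, 16); // central directory offset from the start of the archive
const archive = Buffer.concat([Buffer.alloc(100), eocd]);

const info = findEndOfCentralDirectory(archive);
console.log(info);
```

Because this record only arrives with the last bytes of the download, a tool that relies on it cannot begin until the whole archive is available.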
Answered by enyo
Two days ago the module node-zip was released; it is a wrapper for JSZip, the JavaScript-only version of Zip.
var NodeZip = require('node-zip')
, zip = new NodeZip(zipBuffer.toString("base64"), { base64: true })
, unzipped = zip.files["your-text-file.txt"].data;