Writing large files with Node.js

Disclaimer: This page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must keep the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/9486683/


Writing large files with Node.js

node.js, large-files

Asked by nab

I'm writing a large file with node.js using a writable stream:


var fs     = require('fs');
var stream = fs.createWriteStream('someFile.txt', { flags : 'w' });

var lines;
while (lines = getLines()) {
    for (var i = 0; i < lines.length; i++) {
        stream.write( lines[i] );
    }
}

I'm wondering if this scheme is safe without using the drain event? If it is not (which I think is the case), what is the pattern for writing arbitrarily large amounts of data to a file?


Accepted answer by nab

That's how I finally did it. The idea behind it is to create a readable stream implementing the ReadStream interface and then use the pipe() method to pipe the data to the writable stream.


var fs = require('fs');
var writeStream = fs.createWriteStream('someFile.txt', { flags : 'w' });
var readStream = new MyReadStream();

readStream.pipe(writeStream);
writeStream.on('close', function () {
    console.log('All done!');
});

The example of the MyReadStream class can be taken from mongoose QueryStream.

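For readers without mongoose at hand, a minimal stand-in for MyReadStream built on Node's stream.Readable might look like the sketch below; getLines() here is the question's hypothetical line source, not part of the original answer.

var stream = require('stream');
var util = require('util');

// Hypothetical readable stream that emits the lines produced by getLines().
function MyReadStream(getLines) {
    stream.Readable.call(this);
    this.getLines = getLines;
}
util.inherits(MyReadStream, stream.Readable);

// Push one batch of lines per _read() call; push(null) signals the end of the stream.
MyReadStream.prototype._read = function () {
    var lines = this.getLines();
    if (!lines) {
        this.push(null);
        return;
    }
    for (var i = 0; i < lines.length; i++) {
        this.push(lines[i]);
    }
};

With such a class in place, the readStream.pipe(writeStream) call above handles the back-pressure for you.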

Answered by jcolebrand

The idea behind drain is that you would check the return value of write() at the place marked here:


var fs = require('fs');
var stream = fs.createWriteStream('someFile.txt', {flags: 'w'});

var lines;
while (lines = getLines()) {
    for (var i = 0; i < lines.length; i++) {
        stream.write(lines[i]); //<-- the place to test
    }
}

which you're not doing. So you would need to rearchitect the code to make it "reentrant".


var fs = require('fs');
var stream = fs.createWriteStream('someFile.txt', {flags: 'w'});

var lines;
while (lines = getLines()) {
    for (var i = 0; i < lines.length; i++) {
        var written = stream.write(lines[i]); //<-- the place to test
        if (!written){
           //do something here to wait till you can safely write again
           //this means prepare a buffer and wait till you can come back to finish
           //  lines[i] -> remainder
        }
    }
}

However, does this mean that you need to keep buffering getLines as well while you wait?


var fs = require('fs');
var stream = fs.createWriteStream('someFile.txt', {flags: 'w'});

var lines,
    buffer = {
        remainingLines: []   // holds whatever we could not write before the buffer filled up
    };
while (lines = getLines()) {
    for (var i = 0; i < lines.length; i++) {
        var written = stream.write(lines[i]); //<-- the place to test
        if (!written){
           //do something here to wait till you can safely write again
           //this means prepare a buffer and wait till you can come back to finish
           //  lines[i] -> remainder
           buffer.remainingLines = lines.slice(i);
           break;
           //notice there's no way to re-run this once we leave here.
        }
    }
}

stream.on('drain',function(){
  if (buffer.remainingLines.length){
    for (var i = 0; i < buffer.remainingLines.length; i++) {
      var written = stream.write(buffer.remainingLines[i]); //<-- the place to test
      if (!written){
       //still blocked: keep the unwritten remainder and wait for the next 'drain'
       buffer.remainingLines = buffer.remainingLines.slice(i);
       break;
      }
    }
  }
});
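
For completeness, here is one way the pieces of this answer could be stitched together. This is a sketch rather than part of the original answer; getLines() is the question's hypothetical source, and more work is only pulled from it once everything already accepted has been handed to the writable stream.

var fs = require('fs');
var stream = fs.createWriteStream('someFile.txt', { flags: 'w' });

var pending = [];                              // lines fetched from getLines() but not yet written

function flush() {
    while (true) {
        while (pending.length) {
            if (!stream.write(pending.shift())) {
                stream.once('drain', flush);   // buffer is full: resume when it empties
                return;
            }
        }
        var lines = getLines();                // pull more work only when the buffer has room
        if (!lines) {
            stream.end();
            return;
        }
        pending = lines;
    }
}
flush();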

Answered by Tyler

The cleanest way to handle this is to make your line generator a readable stream, let's call it lineReader. Then the following would automatically handle the buffering and draining nicely for you:


lineReader.pipe(fs.createWriteStream('someFile.txt'));
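
On newer Node versions (12+), one way to build such a lineReader is stream.Readable.from() over a generator. The generator below is only a hypothetical stand-in for your real line source:

const { Readable } = require('stream');
const fs = require('fs');

// Hypothetical generator that yields one line (as a string) at a time.
function* generateLines() {
  for (let i = 0; i < 1000000; i++) {
    yield 'line ' + i + '\n';
  }
}

const lineReader = Readable.from(generateLines());
lineReader.pipe(fs.createWriteStream('someFile.txt'));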

If you don't want to make a readable stream, you can watch write's return value for buffer fullness and respond to it like this:


var i = 0, n = lines.length;
function write () {
  if (i === n) return;  // A callback could go here to know when it's done.
  while (stream.write(lines[i++]) && i < n);
  stream.once('drain', write);
}
write();  // Initial call.

A longer example of this situation can be found here.


Answered by arcseldon

Several suggested answers to this question have missed the point about streams altogether.


This module can help https://www.npmjs.org/package/JSONStream


However, let's take the situation as described and write the code ourselves. You are reading from MongoDB as a stream, which runs in object mode (objectMode = true) by default.


This will lead to issues if you try to stream directly to a file, such as an "Invalid non-string/buffer chunk" error.


The solution to this type of problem is very simple.


Just put a Transform stream between the readable and the writable to adapt the object-mode readable into strings that the writable can accept.


Sample Code Solution:


var fs = require('fs'),
    writeStream = fs.createWriteStream('./out' + process.pid, {flags: 'w', encoding: 'utf-8' }),
    stream = require('stream'),
    stringifier = new stream.Transform();

stringifier._writableState.objectMode = true;   // accept objects on the writable side
stringifier._transform = function (data, encoding, done) {
    this.push(JSON.stringify(data));            // emit each document as a JSON string
    this.push('\n');
    done();
};

rowFeedDao.getRowFeedsStream(merchantId, jobId)
    .pipe(stringifier)
    .pipe(writeStream)
    .on('error', function (err) {
        // handle error condition
    });
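
As a side note, on newer Node versions the same adapter can be declared without touching _writableState, by passing writableObjectMode to the Transform constructor. A minimal sketch (the MongoDB source stream is assumed, as above):

var stream = require('stream');

var stringifier = new stream.Transform({
    writableObjectMode: true,                  // objects in, strings out
    transform: function (data, encoding, done) {
        done(null, JSON.stringify(data) + '\n');
    }
});

// Used exactly like the stringifier above:
//   source.pipe(stringifier).pipe(writeStream);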

Answered by maerics

[Edit] The updated Node.js writable.write(...) API docs say:


[The] return value is strictly advisory. You MAY continue to write, even if it returns false. However, writes will be buffered in memory, so it is best not to do this excessively. Instead, wait for the drain event before writing more data.

[The] 返回值是严格建议的。您可以继续写入,即使它返回 false。但是,写入会缓冲在内存中,因此最好不要过度执行此操作。相反,在写入更多数据之前等待耗尽事件。

[Original] From the stream.write(...) documentation (emphasis mine):


Returns true if the string has been flushed to the kernel buffer. Returns false to indicate that the kernel buffer is full, and the data will be sent out in the future.


I interpret this to mean that the "write" function returns true if the given string was immediately written to the underlying OS buffer, or false if it was not yet written but will be written by the write function (e.g. was presumably buffered for you by the WriteStream), so that you do not have to call "write" again.

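Following that advice on a modern Node version could look like the sketch below: stop writing when write() returns false and resume after the 'drain' event. The lines array here is just a stand-in for your data source.

const fs = require('fs');
const { once } = require('events');

async function writeLines(lines) {
  const stream = fs.createWriteStream('someFile.txt');
  for (const line of lines) {
    if (!stream.write(line)) {
      await once(stream, 'drain');   // back off until the internal buffer empties
    }
  }
  stream.end();
  await once(stream, 'finish');      // all data has been flushed to the file
}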

Answered by youurayy

I found streams to be a poorly performing way to deal with large files; this is because you cannot set an adequate input buffer size (at least I'm not aware of a good way to do it). This is what I do:


var fs = require('fs');

var i = fs.openSync('input.txt', 'r');
var o = fs.openSync('output.txt', 'w');

var buf = new Buffer(1024 * 1024), len, prev = '';

while(len = fs.readSync(i, buf, 0, buf.length)) {

    var a = (prev + buf.toString('ascii', 0, len)).split('\n');
    prev = len === buf.length ? '\n' + a.splice(a.length - 1)[0] : '';

    var out = '';
    a.forEach(function(line) {

        if(!line)
            return;

        // do something with your line here

        out += line + '\n';
    });

    var bout = new Buffer(out, 'ascii');
    fs.writeSync(o, bout, 0, bout.length);
}

fs.closeSync(o);
fs.closeSync(i);

Answered by anneb

If you do not happen to have an input stream, you cannot easily use pipe. None of the above worked for me; the drain event didn't fire. I solved it as follows (based on Tyler's answer):


var fs = require('fs');
var wstream = fs.createWriteStream('someFile.txt');   // writable stream (file name is an example)

var lines = [];   // some very large array of lines
var i = 0;

function write() {
    if (i < lines.length) {
        // pass a callback so the next write is only issued once this one has been handled
        wstream.write(lines[i], function (err) {
            if (err) {
                console.log(err);
            } else {
                i++;
                write();
            }
        });
    } else {
        wstream.end();
        console.log("done");
    }
}
write();