createReadStream in Node.JS
Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/30601002/
Asked by Chev
So I used fs.readFile() and it gives me
"FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory"
since fs.readFile() loads the whole file into memory before calling the callback, should I use fs.createReadStream() instead?
That's what I was doing previously with readFile:
var fs = require('fs');

fs.readFile('myfile.json', function (err1, data) {
    if (err1) {
        console.error(err1);
    } else {
        var myData = JSON.parse(data);
        //Do some operation on myData here
    }
});
Sorry, I'm kind of new to streaming; is the following the right way to do the same thing but with streaming?
var readStream = fs.createReadStream('myfile.json');
readStream.on('end', function () {
    readStream.close();
    var myData = JSON.parse(readStream);
    //Do some operation on myData here
});
Thanks
Answered by Chev
If the file is enormous then yes, streaming will be how you want to deal with it. However, what you're doing in your second example is letting the stream buffer all the file data into memory and then handling it on end. It's essentially no different than readFile that way.
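For reference, a buffer-everything version written out explicitly would look something like the sketch below; it collects every chunk before parsing, so it hits the same memory ceiling as readFile:

var fs = require('fs');

var readStream = fs.createReadStream('myfile.json');
var chunks = [];

readStream.on('data', function (chunk) {
    chunks.push(chunk); // every chunk stays in memory
});

readStream.on('end', function () {
    // Buffer.concat assembles the entire file in memory, just like readFile does
    var myData = JSON.parse(Buffer.concat(chunks));
    //Do some operation on myData here
});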
You'll want to check out JSONStream. What streaming means is that you want to deal with the data as it flows by. In your case you obviously have to do this because you cannot buffer the entire file into memory all at once. With that in mind, hopefully code like this makes sense:
JSONStream.parse('rows.*.doc')
Notice that it has a kind of query pattern. That's because you will not have the entire JSON object/array from the file to work with all at once, so you have to think more in terms of how you want JSONStream to deal with the data as it finds it.
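As a concrete illustration, a pattern like rows.*.doc assumes the file is shaped roughly like the snippet below (a CouchDB-style dump; this exact structure is my assumption, not from the original question), and JSONStream would emit each doc object one at a time:

// Hypothetical shape of myfile.json -- 'rows.*.doc' matches each doc:
{
    "total_rows": 2,
    "rows": [
        { "id": "a", "doc": { "name": "first document" } },
        { "id": "b", "doc": { "name": "second document" } }
    ]
}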
You can use JSONStream to essentially query for the JSON data that you are interested in. This way you're never buffering the whole file into memory. It does have the downside that if you do need all the data, then you'll have to stream the file multiple times, using JSONStream to pull out only the data you need right at that moment, but in your case you don't have much choice.
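For example, if you needed two different slices of the same file, you could stream it once per query; the rows.*.id path here is just a hypothetical second query:

var fs = require('fs');
var JSONStream = require('JSONStream');

// First pass: pull out only the docs.
fs.createReadStream('myfile.json')
    .pipe(JSONStream.parse('rows.*.doc'))
    .on('data', function (doc) {
        // handle each doc
    });

// Second pass: stream the same file again for a different slice.
fs.createReadStream('myfile.json')
    .pipe(JSONStream.parse('rows.*.id'))
    .on('data', function (id) {
        // handle each id
    });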
You could also use JSONStream to parse out data in order and do something like dump it into a database.
JSONStream.parse is similar to JSON.parse, but instead of returning a whole object it returns a stream. When the parse stream gets enough data to form a whole object matching your query, it will emit a data event with the data being the document that matches your query. Once you've configured your data handler you can pipe your read stream into the parse stream and watch the magic happen.
Example:
var fs = require('fs');
var JSONStream = require('JSONStream');

var readStream = fs.createReadStream('myfile.json');
var parseStream = JSONStream.parse('rows.*.doc');
parseStream.on('data', function (doc) {
    db.insert(doc); // pseudo-code for inserting doc into a pretend database.
});
readStream.pipe(parseStream);
That's the verbose way to help you understand what's happening. Here is a more succinct way:
var fs = require('fs');
var JSONStream = require('JSONStream');

fs.createReadStream('myfile.json')
    .pipe(JSONStream.parse('rows.*.doc'))
    .on('data', function (doc) {
        db.insert(doc);
    });
Edit:
For further clarity about what's going on, try to think about it like this. Let's say you have a giant lake and you want to treat the water to purify it and move it to a new reservoir. If you had a giant magical helicopter with a huge bucket then you could fly over the lake, put the lake in the bucket, add treatment chemicals to it, then fly it to its destination.
The problem of course being that there is no such helicopter that can deal with that much weight or volume. It's simply impossible, but that doesn't mean we can't accomplish our goal a different way. So instead you build a series of rivers (streams) between the lake and the new reservoir. You then set up cleansing stations in these rivers that purify any water that passes through them. These stations could operate in a variety of ways. Maybe the treatment can be done so fast that you can let the river flow freely and the purification will just happen as the water travels downstream at maximum speed.
It's also possible that it takes some time for the water to be treated, or that the station needs a certain amount of water before it can effectively treat it. So you design your rivers to have gates and you control the flow of the water from the lake into your rivers, letting the stations buffer just the water they need until they've performed their job and released the purified water downstream and on to its final destination.
That's almost exactly what you want to do with your data. The parse stream is your cleansing station: it buffers data until it has enough to form a whole document that matches your query, then it pushes just that data downstream (and emits the data event).
Node streams are nice because most of the time you don't have to deal with opening and closing the gates. Node streams are smart enough to control backflow when the stream buffers a certain amount of data. It's as if the cleansing station and the gates on the lake are talking to each other to work out the perfect flow rate.
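If you ever do need to operate the gates yourself, readable streams expose them as pause() and resume(). A rough sketch, reusing parseStream from the earlier example (db.insertAsync is a made-up async insert with a callback, standing in for a real driver):

parseStream.on('data', function (doc) {
    parseStream.pause(); // close the gate while the slow insert runs
    db.insertAsync(doc, function (err) {
        if (err) console.error(err);
        parseStream.resume(); // reopen the gate once the station has done its job
    });
});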
If you had a streaming database driver then you'd theoretically be able to create some kind of insert stream and then do parseStream.pipe(insertStream) instead of handling the data event manually :D. Here's an example of creating a filtered version of your JSON file, in another file.
fs.createReadStream('myfile.json')
    .pipe(JSONStream.parse('rows.*.doc'))
    .pipe(JSONStream.stringify())
    .pipe(fs.createWriteStream('filtered-myfile.json'));
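For completeness, such an insert stream could be sketched as an object-mode Writable (db.insert is still the pseudo-database from the earlier examples, so this is an assumption rather than a real driver):

var stream = require('stream');

// A hypothetical object-mode writable that feeds each parsed doc to the database.
var insertStream = new stream.Writable({
    objectMode: true,
    write: function (doc, encoding, callback) {
        db.insert(doc); // pseudo-code insert, as above
        callback();     // signal readiness for the next doc
    }
});

parseStream.pipe(insertStream);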

