Original URL: http://stackoverflow.com/questions/11225001/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Reading a file in real-time using Node.js
Asked by Oliver Lloyd
I need to work out the best way to read data that is being written to a file, using node.js, in real time. Trouble is, Node is a fast moving ship which makes finding the best method for addressing a problem difficult.
What I Want To Do
I have a java process that is doing something and then writing the results of this thing it does to a text file. It typically takes anything from 5 mins to 5 hours to run, with data being written the whole time, and can get up to some fairly hefty throughput rates (circa. 1000 lines/sec).
I would like to read this file in real time and then, using node, aggregate the data and write it to a socket where it can be graphed on the client.
The client, graphs, sockets and aggregation logic are all done but I am confused about the best approach for reading the file.
What I Have Tried (or at least played with)
FIFO - I can tell my Java process to write to a fifo and read this using node. This is in fact how we currently have it implemented using Perl, but because everything else is running in node it makes sense to port the code over.
Unix Sockets - As above.
fs.watchFile - will this work for what we need?
fs.createReadStream - is this better than watchFile?
fs & tail -f - seems like a hack.
What, actually, is my Question
I am tending towards using Unix Sockets, this seems the fastest option. But does node have better built-in features for reading files from the fs in real time?
Accepted answer by hasanyasin
If you want to keep the file as a persistent store of your data, to prevent loss of the stream if the system crashes or one of the processes in your network of running processes dies, you can still continue writing to a file and reading from it.
If you do not need this file as persistent storage for the results produced by your Java process, then going with a Unix socket is much better for both ease and performance.
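To make the Unix socket option concrete, here is a minimal sketch of the Node side using the built-in net and readline modules. It is not part of the original answer. The helper name createLineServer and the socket path in the usage comment are hypothetical, and the sketch assumes the writer sends newline-terminated text.

```javascript
const net = require('net');
const readline = require('readline');

// Build (but do not start) a server that calls onLine(line) for every
// complete line received from any connected writer.
function createLineServer(onLine) {
  return net.createServer((socket) => {
    // readline buffers partial chunks and emits one event per full line
    const lines = readline.createInterface({ input: socket, crlfDelay: Infinity });
    lines.on('line', onLine);
    socket.on('error', (err) => console.error('socket error:', err.message));
  });
}

// Usage (the socket path is a placeholder, not from the original post):
// createLineServer((line) => console.log(line)).listen('/tmp/results.sock');
```

Passing a filesystem path to listen() makes the server listen on a Unix domain socket instead of a TCP port. A socket file left over from an earlier run has to be removed before listening again, otherwise listen() fails with EADDRINUSE.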
fs.watchFile() is not what you need because it works on file stats as the filesystem reports them, and since you want to read the file as it is being written, this is not what you want.
SHORT UPDATE: I am very sorry to realize that although I had accused fs.watchFile() of using file stats in the previous paragraph, I had done the very same thing myself in my example code below! Although I had already warned readers to "take care!" because I had written it in just a few minutes without even testing it well, it can still be done better by using fs.watch() instead of watchFile or fstatSync if the underlying system supports it.
For reading from and writing to a file, I wrote the following for fun during my break:
test-fs-writer.js: [You will not need this since you write file in your Java process]
    var fs = require('fs'),
        lineno = 0;

    var stream = fs.createWriteStream('test-read-write.txt', {flags: 'a'});

    stream.on('open', function() {
        console.log('Stream opened, will start writing in 2 secs');
        setInterval(function() { stream.write((++lineno) + ' oi!\n'); }, 2000);
    });
test-fs-reader.js: [Take care, this is just demonstration, check err objects!]
    var fs = require('fs'),
        bite_size = 256,
        readbytes = 0,
        file;

    fs.open('test-read-write.txt', 'r', function(err, fd) {
        if (err) throw err;
        file = fd;
        readsome();
    });

    function readsome() {
        var stats = fs.fstatSync(file); // yes sometimes async does not make sense!
        if (stats.size < readbytes + 1) {
            console.log('Hehe I am much faster than your writer..! I will sleep for a while, I deserve it!');
            setTimeout(readsome, 3000);
        }
        else {
            // Buffer.alloc replaces the deprecated new Buffer(size)
            fs.read(file, Buffer.alloc(bite_size), 0, bite_size, readbytes, processsome);
        }
    }

    function processsome(err, bytecount, buff) {
        if (err) throw err;
        console.log('Read', bytecount, 'and will process it now.');

        // Here we will process our incoming data:
        // Do whatever you need. Just be careful about not using beyond the bytecount in buff.
        console.log(buff.toString('utf-8', 0, bytecount));

        // So we continue reading from where we left:
        readbytes += bytecount;
        process.nextTick(readsome);
    }
You can safely avoid using nextTick and call readsome() directly instead. Since we are still working sync here, it is not necessary in any sense. I just like it. :p
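The short update in this answer suggests fs.watch() as a better trigger than polling. The following is a rough sketch of that idea and is not the author's code. The helper names readNewBytes and follow are hypothetical. It still calls fstatSync, but only when fs.watch() reports a change rather than on a timer. fs.watch() behaves differently across platforms and may fire more than once per write, so the sketch treats each event only as a hint to check for new bytes.

```javascript
const fs = require('fs');

// Read whatever lies between `position` and the current end of the file.
// Returns the decoded text and the new position.
function readNewBytes(fd, position) {
  const size = fs.fstatSync(fd).size;
  if (size <= position) {
    return { text: '', position: position };
  }
  const buffer = Buffer.alloc(size - position);
  const bytesRead = fs.readSync(fd, buffer, 0, buffer.length, position);
  return {
    // Note: a multi-byte character split at the very end is not handled here
    text: buffer.toString('utf8', 0, bytesRead),
    position: position + bytesRead
  };
}

// Hypothetical helper: call onData(text) whenever the file grows.
// Returns a function that stops watching and closes the file.
function follow(path, onData) {
  const fd = fs.openSync(path, 'r');
  let position = 0;
  const drain = () => {
    const result = readNewBytes(fd, position);
    position = result.position;
    if (result.text) onData(result.text);
  };
  drain(); // pick up anything already in the file
  const watcher = fs.watch(path, drain);
  return function stop() {
    watcher.close();
    fs.closeSync(fd);
  };
}
```

Keeping the read logic in readNewBytes, separate from the watcher, means the same function can also be driven by a timer on systems where fs.watch() is unreliable.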
EDIT by Oliver Lloyd
Taking the example above but extending it to read CSV data gives:
    var lastLineFeed,
        lineArray,
        valueArray = []; // parsed rows, for use elsewhere

    function processsome(err, bytecount, buff) {
        // Note: lastLineFeed is a character index, so this assumes single-byte (ASCII) data
        lastLineFeed = buff.toString('utf-8', 0, bytecount).lastIndexOf('\n');

        if (lastLineFeed > -1) {
            // Split the buffer by line
            lineArray = buff.toString('utf-8', 0, bytecount).slice(0, lastLineFeed).split('\n');
            // Then split each line by comma
            for (var i = 0; i < lineArray.length; i++) {
                // Add read rows to an array for use elsewhere
                valueArray.push(lineArray[i].split(','));
            }
            // Set a new position to read from
            readbytes += lastLineFeed + 1;
        } else {
            // No complete lines were read
            readbytes += bytecount;
        }
        process.nextTick(readsome);
    }
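The CSV version above re-reads a partial line by moving the read position back to just after the last newline. An alternative, sketched here as an assumption and not taken from the original post, is to always advance by the number of bytes read and carry the unfinished tail over to the next chunk. The name createCsvChunkParser is hypothetical. StringDecoder is used so that a multi-byte UTF-8 character split across two chunks is not corrupted. Like the original, it splits on commas only and does not handle quoted fields.

```javascript
const { StringDecoder } = require('string_decoder');

// Returns a function that accepts raw chunks (Buffers) and returns the CSV
// rows completed by that chunk. Incomplete trailing text is kept for later.
function createCsvChunkParser() {
  const decoder = new StringDecoder('utf8'); // holds back split multi-byte characters
  let remainder = '';
  return function parseChunk(chunk) {
    const text = remainder + decoder.write(chunk);
    const lines = text.split('\n');
    remainder = lines.pop(); // last element is the unfinished line ('' if none)
    return lines.map((line) => line.split(','));
  };
}
```

In a read callback like processsome, this would be called with buff.subarray(0, bytecount), with readbytes always advanced by bytecount.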
Answer by vik
Why do you think tail -f is a hack?
While figuring this out I found a good example, and I would do something similar.
Real time online activity monitor example with node.js and WebSocket:
http://blog.new-bamboo.co.uk/2009/12/7/real-time-online-activity-monitor-example-with-node-js-and-websocket
Just to make this answer complete, I wrote you some example code which would run under 0.8.0 (the http server may be a hack).
A child process is spawned running tail, and since a child process is an EventEmitter with three streams (we use stdout in our case), you can just add a listener with on.
filename: tailServer.js
usage: node tailServer /var/log/filename.log
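    var http = require("http");
    var spawn = require('child_process').spawn;

    var filename = process.argv[2];
    if (!filename) {
        console.log("Usage: node tailServer filename");
        process.exit(1);
    }

    // Follow the file with tail; new data arrives on tail.stdout
    var tail = spawn('tail', ['-f', filename]);

    http.createServer(function (request, response) {
        console.log('request starting...');
        response.writeHead(200, {'Content-Type': 'text/plain'});

        // Forward each chunk from tail to this client
        function onData(data) {
            response.write('' + data);
        }
        tail.stdout.on('data', onData);

        // Stop forwarding once the client disconnects, so listeners do not pile up
        response.on('close', function () {
            tail.stdout.removeListener('data', onData);
        });
    }).listen(8088);

    console.log('Server running at http://127.0.0.1:8088/');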
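In the server above, every request attaches its own listener to tail.stdout. One alternative is a single listener that fans each chunk out to whichever clients are currently connected. The sketch below shows that idea under stated assumptions: createBroadcaster is a hypothetical helper, not part of the original answer, and a "client" is anything with a write(chunk) method, such as an http response.

```javascript
// Tracks connected clients and forwards each chunk to all of them.
function createBroadcaster() {
  const clients = new Set();
  return {
    add(client) { clients.add(client); },
    remove(client) { clients.delete(client); },
    send(chunk) {
      for (const client of clients) client.write(chunk);
    },
    count() { return clients.size; }
  };
}

// Wiring it into a tail-based server (illustrative only, not run here):
// const hub = createBroadcaster();
// tail.stdout.on('data', hub.send);
// http.createServer((req, res) => {
//   res.writeHead(200, { 'Content-Type': 'text/plain' });
//   hub.add(res);
//   res.on('close', () => hub.remove(res));
// }).listen(8088);
```

With this shape the number of listeners on tail.stdout stays at one no matter how many clients connect or disconnect.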
Answer by dominic
this module is an implementation of the principle @hasanyasin suggests:
Answer by user2426679
I took the answer from @hasanyasin and wrapped it up into a modular promise. The basic idea is that you pass a file and a handler function that does something with the stringified-buffer that is read from the file. If the handler function returns true, then the file will stop being read. You can also set a timeout that will kill reading if the handler doesn't return true fast enough.
The promiser resolves to true if resolve() was called because of the timeout; otherwise it resolves to false.
See the bottom for usage example.
    // https://stackoverflow.com/a/11233045

    var fs = require('fs');
    var Promise = require('promise');

    class liveReaderPromiseMe {
        constructor(file, buffStringHandler, opts) {
            /*
                var opts = {
                    starting_position: 0,
                    byte_size: 256,
                    check_for_bytes_every_ms: 3000,
                    no_handler_resolution_timeout_ms: null
                };
            */

            if (file == null) {
                throw new Error("file arg must be present");
            } else {
                this.file = file;
            }

            if (buffStringHandler == null) {
                throw new Error("buffStringHandler arg must be present");
            } else {
                this.buffStringHandler = buffStringHandler;
            }

            if (opts == null) {
                opts = {};
            }

            if (opts.starting_position == null) {
                this.current_position = 0;
            } else {
                this.current_position = opts.starting_position;
            }

            if (opts.byte_size == null) {
                this.byte_size = 256;
            } else {
                this.byte_size = opts.byte_size;
            }

            if (opts.check_for_bytes_every_ms == null) {
                this.check_for_bytes_every_ms = 3000;
            } else {
                this.check_for_bytes_every_ms = opts.check_for_bytes_every_ms;
            }

            if (opts.no_handler_resolution_timeout_ms == null) {
                this.no_handler_resolution_timeout_ms = null;
            } else {
                this.no_handler_resolution_timeout_ms = opts.no_handler_resolution_timeout_ms;
            }
        }

        startHandlerTimeout() {
            if (this.no_handler_resolution_timeout_ms && (this._handlerTimer == null)) {
                var that = this;
                this._handlerTimer = setTimeout(
                    function() {
                        that._is_handler_timed_out = true;
                    },
                    this.no_handler_resolution_timeout_ms
                );
            }
        }

        clearHandlerTimeout() {
            if (this._handlerTimer != null) {
                clearTimeout(this._handlerTimer);
                this._handlerTimer = null;
            }
            this._is_handler_timed_out = false;
        }

        isHandlerTimedOut() {
            return !!this._is_handler_timed_out;
        }

        fsReadCallback(err, bytecount, buff) {
            try {
                if (err) {
                    throw err;
                } else {
                    this.current_position += bytecount;
                    var buff_str = buff.toString('utf-8', 0, bytecount);

                    var that = this;

                    Promise.resolve().then(function() {
                        return that.buffStringHandler(buff_str);
                    }).then(function(is_handler_resolved) {
                        if (is_handler_resolved) {
                            that.resolve(false);
                        } else {
                            process.nextTick(that.doReading.bind(that));
                        }
                    }).catch(function(err) {
                        that.reject(err);
                    });
                }
            } catch(err) {
                this.reject(err);
            }
        }

        fsRead(bytecount) {
            fs.read(
                this.file,
                Buffer.alloc(bytecount), // replaces the deprecated new Buffer(bytecount)
                0,
                bytecount,
                this.current_position,
                this.fsReadCallback.bind(this)
            );
        }

        doReading() {
            if (this.isHandlerTimedOut()) {
                return this.resolve(true);
            }

            var max_next_bytes = fs.fstatSync(this.file).size - this.current_position;
            if (max_next_bytes) {
                this.fsRead( (this.byte_size > max_next_bytes) ? max_next_bytes : this.byte_size );
            } else {
                setTimeout(this.doReading.bind(this), this.check_for_bytes_every_ms);
            }
        }

        promiser() {
            var that = this;
            return new Promise(function(resolve, reject) {
                that.resolve = resolve;
                that.reject = reject;
                that.doReading();
                that.startHandlerTimeout();
            }).then(function(was_resolved_by_timeout) {
                that.clearHandlerTimeout();
                return was_resolved_by_timeout;
            });
        }
    }

    module.exports = function(file, buffStringHandler, opts) {
        try {
            var live_reader = new liveReaderPromiseMe(file, buffStringHandler, opts);
            return live_reader.promiser();
        } catch(err) {
            return Promise.reject(err);
        }
    };
Then use the above code like this:
    var fs = require('fs');
    var path = require('path');
    var Promise = require('promise');
    var liveReadAppendingFilePromiser = require('./path/to/liveReadAppendingFilePromiser');

    var ending_str = '_THIS_IS_THE_END_';
    var test_path = path.join('E:/tmp/test.txt');

    var s_list = [];
    var buffStringHandler = function(s) {
        s_list.push(s);
        var tmp = s_list.join('');
        if (-1 !== tmp.indexOf(ending_str)) {
            // if this return never occurs, then the file will be read until no_handler_resolution_timeout_ms
            // by default, no_handler_resolution_timeout_ms is null, so read will continue forever until this function returns something that evaluates to true
            return true;
            // you can also return a promise:
            //  return Promise.resolve().then(function() { return true; } );
        }
    };

    var appender = fs.openSync(test_path, 'a');
    try {
        var reader = fs.openSync(test_path, 'r');
        try {
            var options = {
                starting_position: 0,
                byte_size: 256,
                check_for_bytes_every_ms: 3000,
                no_handler_resolution_timeout_ms: 10000,
            };

            liveReadAppendingFilePromiser(reader, buffStringHandler, options)
            .then(function(did_reader_time_out) {
                console.log('reader timed out: ', did_reader_time_out);
                console.log(s_list.join(''));
            }).catch(function(err) {
                console.error('bad stuff: ', err);
            }).then(function() {
                fs.closeSync(appender);
                fs.closeSync(reader);
            });

            // fs.writeSync is used because fs.write requires a callback in current Node versions
            fs.writeSync(appender, '\ncheck it out, I am a string');
            fs.writeSync(appender, '\nwho killed kenny');
            //fs.writeSync(appender, ending_str);
        } catch(err) {
            fs.closeSync(reader);
            console.log('err1');
            throw err;
        }
    } catch(err) {
        fs.closeSync(appender);
        console.log('err2');
        throw err;
    }