javascript - Node.js: read a large file line by line, synchronously?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA terms and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/7545147/
Node.js: read a large file line by line, synchronously?
Asked by nroe
I have a large file (UTF-8). I know fs.createReadStream can create a stream to read a large file, but that is not synchronous. So I tried to use fs.readSync, but the text it reads comes back broken, like "迈?".
var fs = require('fs');
var util = require('util');
var textPath = __dirname + '/people-daily.txt';
var fd = fs.openSync(textPath, "r");
var text = fs.readSync(fd, 4, 0, "utf8");
console.log(util.inspect(text, true, null));
Answered by Peace Makes Plenty
For large files, readFileSync can be inconvenient, as it loads the whole file into memory. A different synchronous approach is to call readSync iteratively, reading a small chunk of data at a time and processing the lines as they arrive. The following code implements this approach and synchronously processes one line at a time from the file 'test.txt':
var fs = require('fs');

var filename = 'test.txt';
var fd = fs.openSync(filename, 'r');
var bufferSize = 1024;
var buffer = Buffer.alloc(bufferSize); // new Buffer(size) is deprecated
var leftOver = '';
var read, line, idxStart, idx;
while ((read = fs.readSync(fd, buffer, 0, bufferSize, null)) !== 0) {
    leftOver += buffer.toString('utf8', 0, read);
    idxStart = 0;
    while ((idx = leftOver.indexOf("\n", idxStart)) !== -1) {
        line = leftOver.substring(idxStart, idx);
        console.log("one line read: " + line);
        idxStart = idx + 1;
    }
    leftOver = leftOver.substring(idxStart);
}
if (leftOver) { // the last line may lack a trailing newline
    console.log("one line read: " + leftOver);
}
Answered by Divam Gupta
use https://github.com/nacholibre/node-readlines
var lineByLine = require('n-readlines');
var liner = new lineByLine('./textFile.txt');
var line;
var lineNumber = 0;
while (line = liner.next()) {
console.log('Line ' + lineNumber + ': ' + line.toString('ascii'));
lineNumber++;
}
console.log('end of line reached');
Answered by Tom
Use readFileSync:
fs.readFileSync(filename, [encoding]) Synchronous version of fs.readFile. Returns the contents of the filename.
If encoding is specified then this function returns a string. Otherwise it returns a buffer.
On a side note, since you are using node, I'd recommend using asynchronous functions.
Answered by srkleiman
I built a simpler version of JB Kohn's answer that uses split() on the buffer. It works on the larger files I tried.
var fs = require('fs');

/*
 * Synchronously call fn(text, lineNum) on each line read from file descriptor fd.
 */
function forEachLine (fd, fn) {
    var bufSize = 64 * 1024;
    var buf = Buffer.alloc(bufSize); // new Buffer(size) is deprecated
    var leftOver = '';
    var lineNum = 0;
    var lines, n;
    while ((n = fs.readSync(fd, buf, 0, bufSize, null)) !== 0) {
        lines = buf.toString('utf8', 0, n).split('\n');
        lines[0] = leftOver + lines[0];  // add leftover string from previous read
        while (lines.length > 1) {       // process all but the last line
            fn(lines.shift(), lineNum);
            lineNum++;
        }
        leftOver = lines.shift();        // save last line fragment (may be '')
    }
    if (leftOver) {                      // process any remaining line
        fn(leftOver, lineNum);
    }
}
Answered by user943702
Two potential problems:

- a 3-byte BOM at the beginning of the file that you did not skip
- the first 4 bytes cannot be decoded cleanly as UTF-8 characters (UTF-8 is not fixed-length, so a 4-byte read can cut a character in half)
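The second point is what produces the "迈?" garbage in the question. A short sketch (the sample string is just an illustration) showing how decoding a byte slice that cuts a multibyte UTF-8 sequence yields a replacement character:

```javascript
// Each CJK character below occupies 3 bytes in UTF-8.
const buf = Buffer.from('迈向', 'utf8');

// Decoding the first 4 bytes cuts the second character in half:
const broken = buf.toString('utf8', 0, 4);
console.log(broken); // '迈' followed by U+FFFD, the replacement character

// Decoding on a 3-byte boundary is clean:
const clean = buf.toString('utf8', 0, 3);
console.log(clean); // '迈'
```

This is why reading a fixed byte count and decoding it directly is unsafe; the chunked answers above avoid it by carrying the incomplete tail over to the next read.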