javascript nodejs synchronously read a large file line by line?

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/7545147/

Date: 2020-10-26 00:29:37  Source: igfitidea

nodejs synchronization read large file line by line?

Tags: javascript, node.js, filesystems, mojibake

Asked by nroe

I have a large file (utf8). I know fs.createReadStream can create a stream to read a large file, but that is not synchronous. So I tried to use fs.readSync, but the text it reads comes out broken, like "迈?".


var fs = require('fs');
var util = require('util');
var textPath = __dirname + '/people-daily.txt';
var fd = fs.openSync(textPath, "r");
// Legacy signature: fs.readSync(fd, length, position, encoding).
// Decoding a fixed 4 bytes can cut a multi-byte UTF-8 character in half.
var text = fs.readSync(fd, 4, 0, "utf8");
console.log(util.inspect(text, true, null));

Answered by Peace Makes Plenty

For large files, readFileSync can be inconvenient, as it loads the whole file into memory. A different synchronous approach is to iteratively call readSync, reading small bits of data at a time, and processing the lines as they come. The following bit of code implements this approach and synchronously processes one line at a time from the file 'test.txt':


var fs = require('fs');
var filename = 'test.txt';

var fd = fs.openSync(filename, 'r');
var bufferSize = 1024;
var buffer = Buffer.alloc(bufferSize);  // new Buffer() is deprecated

var leftOver = '';
var read, line, idxStart, idx;
while ((read = fs.readSync(fd, buffer, 0, bufferSize, null)) !== 0) {
  leftOver += buffer.toString('utf8', 0, read);
  idxStart = 0;
  while ((idx = leftOver.indexOf("\n", idxStart)) !== -1) {
    line = leftOver.substring(idxStart, idx);
    console.log("one line read: " + line);
    idxStart = idx + 1;
  }
  leftOver = leftOver.substring(idxStart);
}
if (leftOver.length > 0) {              // handle a final line with no trailing newline
  console.log("one line read: " + leftOver);
}
fs.closeSync(fd);

Answered by Divam Gupta

Use https://github.com/nacholibre/node-readlines

var lineByLine = require('n-readlines');
var liner = new lineByLine('./textFile.txt');

var line;
var lineNumber = 0;
while (line = liner.next()) {
    console.log('Line ' + lineNumber + ': ' + line.toString('utf8')); // decode as utf8, not ascii, since the file is UTF-8
    lineNumber++;
}

console.log('end of line reached');

Answered by Tom

Use readFileSync:


fs.readFileSync(filename, [encoding]) Synchronous version of fs.readFile. Returns the contents of the filename.

If encoding is specified then this function returns a string. Otherwise it returns a buffer.


On a side note, since you are using node, I'd recommend using asynchronous functions.


Answered by srkleiman

I built a simpler version of JB Kohn's answer that uses split() on the buffer. It works on the larger files I tried.


var fs = require('fs');

/*
 * Synchronously call fn(text, lineNum) on each line read from file descriptor fd.
 */
function forEachLine (fd, fn) {
    var bufSize = 64 * 1024;
    var buf = Buffer.alloc(bufSize);    // new Buffer() is deprecated
    var leftOver = '';
    var lineNum = 0;
    var lines, n;

    while ((n = fs.readSync(fd, buf, 0, bufSize, null)) !== 0) {
        lines = buf.toString('utf8', 0 , n).split('\n');
        lines[0] = leftOver+lines[0];       // add leftover string from previous read
        while (lines.length > 1) {          // process all but the last line
            fn(lines.shift(), lineNum);
            lineNum++;
        }
        leftOver = lines.shift();           // save last line fragment (may be '')
    }
    if (leftOver) {                         // process any remaining line
        fn(leftOver, lineNum);
    }
}

Answered by user943702

Two potential problems:


  1. a 3-byte UTF-8 BOM at the beginning of the file that you did not skip
  2. the first 4 bytes may not end on a UTF-8 character boundary, so they cannot decode cleanly (UTF-8 is a variable-length encoding)