Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/12453057/
Node.js: Count the number of lines in a file
Asked by hexacyanide
I have large text files, which range between 30MB and 10GB. How can I count the number of lines in a file using Node.js?
I have these limitations:
- The entire file does not need to be written to memory
- A child process is not required to perform the task
Answered by Andrey Sidorov
A solution without using wc:
var i;
var count = 0;
require('fs').createReadStream(process.argv[2])
  .on('data', function(chunk) {
    for (i = 0; i < chunk.length; ++i)
      if (chunk[i] == 10) count++; // 10 is the byte value of '\n'
  })
  .on('end', function() {
    console.log(count);
  });
It's slower than wc, but not by as much as you might expect: 0.6s for a 140MB+ file, including Node.js load and startup time.
>time node countlines.js video.mp4
619643
real 0m0.614s
user 0m0.489s
sys 0m0.132s
>time wc -l video.mp4
619643 video.mp4
real 0m0.133s
user 0m0.108s
sys 0m0.024s
>wc -c video.mp4
144681406 video.mp4
Answered by Menztrual
You could do this, as the comments suggest, using wc:
var exec = require('child_process').exec;

exec('wc /path/to/file', function (error, results) {
  console.log(results);
});
Answered by Emil Vikström
We can use indexOf to let the VM find the newlines:
const fs = require('fs');

function countFileLines(filePath) {
  return new Promise((resolve, reject) => {
    let lineCount = 0;
    fs.createReadStream(filePath)
      .on("data", (buffer) => {
        let idx = -1;
        lineCount--; // Because the loop will run once for idx = -1
        do {
          idx = buffer.indexOf(10, idx + 1);
          lineCount++;
        } while (idx !== -1);
      })
      .on("end", () => {
        resolve(lineCount);
      })
      .on("error", reject);
  });
}
What this solution does is find the position of the first newline using .indexOf. It increments lineCount, then finds the next position. The second parameter to .indexOf tells it where to start looking for newlines. This way we jump over large chunks of the buffer. The while loop runs once for every newline, plus one.
We are letting the Node runtime do the searching for us, which is implemented at a lower level and should be faster.
On my system this is about twice as fast as running a for loop over the buffer length on a large file (111 MB).
Answered by undoZen
Since io.js 1.5.0 there is a Buffer#indexOf() method. Using it, here is a comparison with Andrey Sidorov's answer:
ubuntu@server:~$ wc logs
7342500 27548750 427155000 logs
ubuntu@server:~$ time wc -l logs
7342500 logs
real 0m0.180s
user 0m0.088s
sys 0m0.084s
ubuntu@server:~$ nvm use node
Now using node v0.12.1
ubuntu@server:~$ time node countlines.js logs
7342500
real 0m2.559s
user 0m2.200s
sys 0m0.340s
ubuntu@server:~$ nvm use iojs
Now using node iojs-v1.6.2
ubuntu@server:~$ time iojs countlines2.js logs
7342500
real 0m1.363s
user 0m0.920s
sys 0m0.424s
ubuntu@server:~$ cat countlines.js
var i;
var count = 0;
require('fs').createReadStream(process.argv[2])
.on('data', function(chunk) {
for (i=0; i < chunk.length; ++i)
if (chunk[i] == 10) count++;
})
.on('end', function() {
console.log(count);
});
ubuntu@server:~$ cat countlines2.js
var i;
var count = 0;
require('fs').createReadStream(process.argv[2])
.on('data', function(chunk) {
var index = -1;
while((index = chunk.indexOf(10, index + 1)) > -1) count++
})
.on('end', function() {
console.log(count);
});
ubuntu@server:~$
Answered by Alan Viars
Here is another way without so much nesting.
var fs = require('fs');

var filePath = process.argv[2];
var fileBuffer = fs.readFileSync(filePath); // note: reads the entire file into memory
var lines = fileBuffer.toString().split("\n");
console.log(lines.length - 1);
Answered by Jason Kim
If you use Node 8 or above, you can use this async/await pattern:
const util = require('util');
const exec = util.promisify(require('child_process').exec);

async function fileLineCount({ fileLocation }) {
  const { stdout } = await exec(`cat ${fileLocation} | wc -l`);
  return parseInt(stdout);
}

// Usage
async function someFunction() {
  const lineCount = await fileLineCount({ fileLocation: 'some/file.json' });
}
Answered by Jeff Kilbride
You can also use indexOf():
// "chunk" is a Buffer delivered by a stream's 'data' event
var index = -1;
var count = 0;
while ((index = chunk.indexOf(10, index + 1)) > -1) count++;
Answered by ruchi gupta
var fs = require('fs');
var filename = process.argv[2];
var data = fs.readFileSync(filename);
var res = data.toString().split('\n').length;
console.log(res - 1);
Answered by Dom Vinyard
There is an npm module called count-lines-in-file. I've been using it for smallish (<1000 lines) files and it's worked great so far.
Answered by David Dombrowsky
The best solution I've found uses promises, async, and await. This is also an example of how to await the fulfillment of a promise:
#!/usr/bin/env node
const fs = require('fs');
const readline = require('readline');

function main() {
  function doRead() {
    return new Promise(resolve => {
      var inf = readline.createInterface({
        input: fs.createReadStream('async.js'),
        crlfDelay: Infinity
      });
      var count = 0;
      inf.on('line', (line) => {
        console.log(count + ' ' + line);
        count += 1;
      });
      inf.on('close', () => resolve(count));
    });
  }

  async function showRead() {
    var x = await doRead();
    console.log('line count: ' + x);
  }

  showRead();
}

main();