javascript FileReader - parsing long file in chunks

Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA terms and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/14438187/

Date: 2020-08-24 16:45:13 · Source: igfitidea

javascript FileReader - parsing long file in chunks

Tags: javascript, html, parsing, filereader

Asked by mnowotka

I have a long file I need to parse. Because it's very long, I need to do it chunk by chunk. I tried this:


function parseFile(file){
    var chunkSize = 2000;
    var fileSize = (file.size - 1);

    var foo = function(e){
        console.log(e.target.result);
    };

    for(var i =0; i < fileSize; i += chunkSize)
    {
        (function( fil, start ) {
            var reader = new FileReader();
            var blob = fil.slice(start, chunkSize + 1);
            reader.onload = foo;
            reader.readAsText(blob);
        })( file, i );
    }
}

After running it I see only the first chunk in the console. If I change the console.log to a jQuery append to some div, I see only the first chunk in that div. What about the other chunks? How do I make it work?


Answered by alediaferia

The FileReader API is asynchronous, so you should handle it with block calls. A for loop won't do the trick, since it wouldn't wait for each read to complete before reading the next chunk. Here's a working approach.


function parseFile(file, callback) {
    var fileSize   = file.size;
    var chunkSize  = 64 * 1024; // bytes
    var offset     = 0;
    var chunkReaderBlock = null;

    var readEventHandler = function(evt) {
        if (evt.target.error == null) {
            offset += evt.target.result.length;
            callback(evt.target.result); // callback for handling read chunk
        } else {
            console.log("Read error: " + evt.target.error);
            return;
        }
        if (offset >= fileSize) {
            console.log("Done reading file");
            return;
        }

        // off to the next chunk
        chunkReaderBlock(offset, chunkSize, file);
    }

    chunkReaderBlock = function(_offset, length, _file) {
        var r = new FileReader();
        var blob = _file.slice(_offset, length + _offset);
        r.onload = readEventHandler;
        r.readAsText(blob);
    }

    // now let's start the read with the first block
    chunkReaderBlock(offset, chunkSize, file);
}
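For comparison, the same chunk-by-chunk loop can be written without FileReader's event plumbing at all, using the promise-based Blob.prototype.text() available in modern browsers and Node 18+. This is a hedged sketch, not part of the original answer; the sample blob stands in for a user-selected File:

```javascript
// Sketch: chunked reading with the promise-based Blob.prototype.text().
// slice(start, end) takes an absolute end byte index, not a length.
async function readInChunks(blob, chunkSize, callback) {
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    const text = await blob.slice(offset, offset + chunkSize).text();
    callback(text); // handle each chunk in order
  }
}

// Usage: with 'abcdefghij' and chunkSize 4 the chunks are 'abcd', 'efgh', 'ij'
const chunks = [];
readInChunks(new Blob(['abcdefghij']), 4, (c) => chunks.push(c))
  .then(() => console.log(chunks.join('|'))); // abcd|efgh|ij
```

Because each await completes before the next slice is read, the chunks arrive strictly in order, which is exactly what the for-loop version above cannot guarantee.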

Answered by Endless

You can take advantage of Response (part of the fetch API) to convert a Blob into most other things (text, JSON, an ArrayBuffer), and also to get a ReadableStream that can help you read the blob in chunks.


var dest = new WritableStream({
  write (str) {
    console.log(str)
  }
})

var blob = new Blob(['bloby']);

(blob.stream ? blob.stream() : new Response(blob).body)
  // Decode the binary-encoded response to string
  .pipeThrough(new TextDecoderStream())
  .pipeTo(dest)
  .then(() => {
    console.log('done')
  })
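In engines where a web ReadableStream is async-iterable (Node 18+, and recent browser releases), the same pipeline can also be written as a for-await loop. This is a sketch under that assumption, not part of the original answer:

```javascript
// Sketch: iterate a Blob's stream chunk by chunk with for-await,
// decoding bytes to text incrementally with TextDecoder.
async function streamBlobAsText(blob, onChunk) {
  const decoder = new TextDecoder();
  for await (const bytes of blob.stream()) {
    // { stream: true } buffers partial multi-byte sequences across chunks
    onChunk(decoder.decode(bytes, { stream: true }));
  }
  const tail = decoder.decode(); // flush any remaining buffered bytes
  if (tail) onChunk(tail);
}
```

The stateful TextDecoder is what makes this safe for multi-byte text: a chunk boundary that splits a UTF-8 character is held until the next chunk arrives, which a naive per-chunk readAsText cannot do.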

Old answer (from before WritableStream's pipeTo and pipeThrough were implemented):


I came up with an interesting idea that is probably very fast, since it converts the blob to a ReadableByteStreamReader. It's probably much easier too, since you don't need to handle things like chunk size and offset, or do it all recursively in a loop.


function streamBlob(blob) {
  const reader = new Response(blob).body.getReader()
  const pump = reader => reader.read()
  .then(({ value, done }) => {
    if (done) return
    // uint8array chunk (use TextDecoder to read as text)
    console.log(value)
    return pump(reader)
  })
  return pump(reader)
}

streamBlob(new Blob(['bloby'])).then(() => {
  console.log('done')
})

Answered by Minko Gechev

The second argument of slice is actually the end byte. Your code should look something like:


function parseFile(file){
    var chunkSize = 2000;
    var fileSize = (file.size - 1);

    var foo = function(e){
        console.log(e.target.result);
    };

    for(var i = 0; i < fileSize; i += chunkSize) {
        (function( fil, start ) {
            var reader = new FileReader();
            var blob = fil.slice(start, chunkSize + start);
            reader.onload = foo;
            reader.readAsText(blob);
        })(file, i);
    }
}

Or you can use this BlobReader for an easier interface:


BlobReader(blob)
.readText(function (text) {
  console.log('The text in the blob is', text);
});


Answered by Flavien Volken

Revamped @alediaferia's answer as a class (TypeScript version here), returning the result in a promise. Brave coders would even have wrapped it into an async iterator…


class FileStreamer {
    constructor(file) {
        this.file = file;
        this.offset = 0;
        this.defaultChunkSize = 64 * 1024; // bytes
        this.rewind();
    }
    rewind() {
        this.offset = 0;
    }
    isEndOfFile() {
        return this.offset >= this.getFileSize();
    }
    readBlockAsText(length = this.defaultChunkSize) {
        const fileReader = new FileReader();
        const blob = this.file.slice(this.offset, this.offset + length);
        return new Promise((resolve, reject) => {
            fileReader.onloadend = (event) => {
                const target = (event.target);
                if (target.error == null) {
                    const result = target.result;
                    this.offset += result.length;
                    this.testEndOfFile();
                    resolve(result);
                }
                else {
                    reject(target.error);
                }
            };
            fileReader.readAsText(blob);
        });
    }
    testEndOfFile() {
        if (this.isEndOfFile()) {
            console.log('Done reading file');
        }
    }
    getFileSize() {
        return this.file.size;
    }
}

Example printing a whole file in the console (within an async context):


const fileStreamer = new FileStreamer(aFile);
while (!fileStreamer.isEndOfFile()) {
  const data = await fileStreamer.readBlockAsText();
  console.log(data);
}
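The async-iterator wrapping hinted at above could look something like the following sketch. It uses the promise-based Blob.prototype.text() (modern browsers / Node 18+) instead of FileReader so it stays self-contained; it is an illustration, not the answer author's code:

```javascript
// Sketch: chunked reads exposed as an async generator, so callers can
// use for-await instead of an explicit while loop.
// Caveat (shared with the FileStreamer class above): each slice is
// decoded independently, so a chunk boundary that splits a multi-byte
// character will garble that character.
async function* fileChunks(file, chunkSize = 64 * 1024) {
  for (let offset = 0; offset < file.size; offset += chunkSize) {
    yield await file.slice(offset, offset + chunkSize).text();
  }
}

// Usage (within an async context):
// for await (const chunk of fileChunks(aFile)) {
//   console.log(chunk);
// }
```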

Answered by Radadiya Nikunj

Parsing a large file into small chunks using a simple method:


// Parse a large file into small chunks
var parseFile = function (file) {

    var chunkSize = 1024 * 1024 * 16; // 16MB chunk size
    var fileSize = file.size;
    var currentChunk = 1;
    var totalChunks = Math.ceil(fileSize / chunkSize); // Math.ceil takes a single argument

    while (currentChunk <= totalChunks) {

        var offset = (currentChunk - 1) * chunkSize;
        var currentFilePart = file.slice(offset, offset + chunkSize);

        console.log('Current chunk number is ', currentChunk);
        console.log('Current chunk data', currentFilePart);

        currentChunk++;
    }
};
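As a quick sanity check of the chunk arithmetic above, here is a minimal sketch with a small in-memory Blob (the sizes are arbitrary, chosen only for illustration):

```javascript
// Sketch: verify the chunk bookkeeping on a small Blob.
// Blob.slice clamps the end index to the blob size, so the last
// chunk simply comes out shorter.
function splitIntoChunks(file, chunkSize) {
  const totalChunks = Math.ceil(file.size / chunkSize);
  const parts = [];
  for (let i = 0; i < totalChunks; i++) {
    const offset = i * chunkSize;
    parts.push(file.slice(offset, offset + chunkSize));
  }
  return parts;
}

const blob = new Blob(['0123456789']); // 10 bytes
const parts = splitIntoChunks(blob, 4);
console.log(parts.length);           // 3 chunks: 4 + 4 + 2 bytes
console.log(parts.map(p => p.size)); // [ 4, 4, 2 ]
```

Note that this only slices the blob; actually reading each part still requires one of the asynchronous approaches from the earlier answers.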