javascript FileReader - parsing long file in chunks

Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA terms and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/14438187/

Date: 2020-08-24 16:45:13 · Source: igfitidea

javascript FileReader - parsing long file in chunks

Tags: javascript, html, parsing, filereader

Asked by mnowotka

I have a long file I need to parse. Because it's very long, I need to do it chunk by chunk. I tried this:


function parseFile(file){
    var chunkSize = 2000;
    var fileSize = (file.size - 1);

    var foo = function(e){
        console.log(e.target.result);
    };

    for(var i =0; i < fileSize; i += chunkSize)
    {
        (function( fil, start ) {
            var reader = new FileReader();
            var blob = fil.slice(start, chunkSize + 1);
            reader.onload = foo;
            reader.readAsText(blob);
        })( file, i );
    }
}

After running it I see only the first chunk in the console. If I change the console.log to a jQuery append to some div, I see only the first chunk in that div. What about the other chunks? How do I make it work?


Answered by alediaferia

The FileReader API is asynchronous, so you should handle it with block calls. A for loop won't do the trick, since it wouldn't wait for each read to complete before reading the next chunk. Here's a working approach.


function parseFile(file, callback) {
    var fileSize   = file.size;
    var chunkSize  = 64 * 1024; // bytes
    var offset     = 0;
    var chunkReaderBlock = null;

    var readEventHandler = function(evt) {
        if (evt.target.error == null) {
            offset += evt.target.result.length;
            callback(evt.target.result); // callback for handling read chunk
        } else {
            console.log("Read error: " + evt.target.error);
            return;
        }
        if (offset >= fileSize) {
            console.log("Done reading file");
            return;
        }

        // off to the next chunk
        chunkReaderBlock(offset, chunkSize, file);
    }

    chunkReaderBlock = function(_offset, length, _file) {
        var r = new FileReader();
        var blob = _file.slice(_offset, length + _offset);
        r.onload = readEventHandler;
        r.readAsText(blob);
    }

    // now let's start the read with the first block
    chunkReaderBlock(offset, chunkSize, file);
}
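For comparison, the same chunk-by-chunk loop can be written without FileReader's event plumbing at all, using the promise-based Blob.prototype.text() available in modern browsers and Node 18+. This is a hedged sketch, not part of the original answer; the sample blob stands in for a user-selected File:

```javascript
// Sketch: chunked reading with the promise-based Blob.prototype.text().
// slice(start, end) takes an absolute end byte index, not a length.
async function readInChunks(blob, chunkSize, callback) {
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    const text = await blob.slice(offset, offset + chunkSize).text();
    callback(text); // handle each chunk in order
  }
}

// Usage: with 'abcdefghij' and chunkSize 4 the chunks are 'abcd', 'efgh', 'ij'
const chunks = [];
readInChunks(new Blob(['abcdefghij']), 4, (c) => chunks.push(c))
  .then(() => console.log(chunks.join('|'))); // abcd|efgh|ij
```

Because each await completes before the next slice is read, the chunks arrive strictly in order, which is exactly what the for-loop version above cannot guarantee.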

Answered by Endless

You can take advantage of Response (part of the fetch API) to convert a Blob into most other things (text, JSON, an ArrayBuffer), and also to get a ReadableStream that can help you read the blob in chunks.


var dest = new WritableStream({
  write (str) {
    console.log(str)
  }
})

var blob = new Blob(['bloby']);

(blob.stream ? blob.stream() : new Response(blob).body)
  // Decode the binary-encoded response to string
  .pipeThrough(new TextDecoderStream())
  .pipeTo(dest)
  .then(() => {
    console.log('done')
  })
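In engines where a web ReadableStream is async-iterable (Node 18+, and recent browser releases), the same pipeline can also be written as a for-await loop. This is a sketch under that assumption, not part of the original answer:

```javascript
// Sketch: iterate a Blob's stream chunk by chunk with for-await,
// decoding bytes to text incrementally with TextDecoder.
async function streamBlobAsText(blob, onChunk) {
  const decoder = new TextDecoder();
  for await (const bytes of blob.stream()) {
    // { stream: true } buffers partial multi-byte sequences across chunks
    onChunk(decoder.decode(bytes, { stream: true }));
  }
  const tail = decoder.decode(); // flush any remaining buffered bytes
  if (tail) onChunk(tail);
}
```

The stateful TextDecoder is what makes this safe for multi-byte text: a chunk boundary that splits a UTF-8 character is held until the next chunk arrives, which a naive per-chunk readAsText cannot do.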

Old answer (from before WritableStream's pipeTo and pipeThrough were implemented):


I came up with an interesting idea that is probably very fast, since it converts the blob to a ReadableByteStreamReader. It's probably much easier too, since you don't need to handle things like chunk size and offset, or do it all recursively in a loop.


function streamBlob(blob) {
  const reader = new Response(blob).body.getReader()
  const pump = reader => reader.read()
  .then(({ value, done }) => {
    if (done) return
    // uint8array chunk (use TextDecoder to read as text)
    console.log(value)
    return pump(reader)
  })
  return pump(reader)
}

streamBlob(new Blob(['bloby'])).then(() => {
  console.log('done')
})

Answered by Minko Gechev

The second argument of slice is actually the end byte. Your code should look something like:


function parseFile(file){
    var chunkSize = 2000;
    var fileSize = (file.size - 1);

    var foo = function(e){
        console.log(e.target.result);
    };

    for(var i = 0; i < fileSize; i += chunkSize) {
        (function( fil, start ) {
            var reader = new FileReader();
            var blob = fil.slice(start, chunkSize + start);
            reader.onload = foo;
            reader.readAsText(blob);
        })(file, i);
    }
}

Or you can use this BlobReader for an easier interface:


BlobReader(blob)
.readText(function (text) {
  console.log('The text in the blob is', text);
});


Answered by Flavien Volken

Revamped @alediaferia's answer as a class (TypeScript version here), returning the result in a promise. Brave coders would even have wrapped it into an async iterator…


class FileStreamer {
    constructor(file) {
        this.file = file;
        this.offset = 0;
        this.defaultChunkSize = 64 * 1024; // bytes
        this.rewind();
    }
    rewind() {
        this.offset = 0;
    }
    isEndOfFile() {
        return this.offset >= this.getFileSize();
    }
    readBlockAsText(length = this.defaultChunkSize) {
        const fileReader = new FileReader();
        const blob = this.file.slice(this.offset, this.offset + length);
        return new Promise((resolve, reject) => {
            fileReader.onloadend = (event) => {
                const target = (event.target);
                if (target.error == null) {
                    const result = target.result;
                    this.offset += result.length;
                    this.testEndOfFile();
                    resolve(result);
                }
                else {
                    reject(target.error);
                }
            };
            fileReader.readAsText(blob);
        });
    }
    testEndOfFile() {
        if (this.isEndOfFile()) {
            console.log('Done reading file');
        }
    }
    getFileSize() {
        return this.file.size;
    }
}

Example printing a whole file in the console (within an async context):


const fileStreamer = new FileStreamer(aFile);
while (!fileStreamer.isEndOfFile()) {
  const data = await fileStreamer.readBlockAsText();
  console.log(data);
}
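The async-iterator wrapping hinted at above could look something like the following sketch. It uses the promise-based Blob.prototype.text() (modern browsers / Node 18+) instead of FileReader so it stays self-contained; it is an illustration, not the answer author's code:

```javascript
// Sketch: chunked reads exposed as an async generator, so callers can
// use for-await instead of an explicit while loop.
// Caveat (shared with the FileStreamer class above): each slice is
// decoded independently, so a chunk boundary that splits a multi-byte
// character will garble that character.
async function* fileChunks(file, chunkSize = 64 * 1024) {
  for (let offset = 0; offset < file.size; offset += chunkSize) {
    yield await file.slice(offset, offset + chunkSize).text();
  }
}

// Usage (within an async context):
// for await (const chunk of fileChunks(aFile)) {
//   console.log(chunk);
// }
```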

Answered by Radadiya Nikunj

Parsing a large file into small chunks using a simple method:


// Parse a large file into small chunks
var parseFile = function (file) {

    var chunkSize = 1024 * 1024 * 16; // 16MB chunk size
    var fileSize = file.size;
    var currentChunk = 1;
    var totalChunks = Math.ceil(fileSize / chunkSize); // Math.ceil takes a single argument

    while (currentChunk <= totalChunks) {

        var offset = (currentChunk - 1) * chunkSize;
        var currentFilePart = file.slice(offset, offset + chunkSize);

        console.log('Current chunk number is ', currentChunk);
        console.log('Current chunk data', currentFilePart);

        currentChunk++;
    }
};
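As a quick sanity check of the chunk arithmetic above, here is a minimal sketch with a small in-memory Blob (the sizes are arbitrary, chosen only for illustration):

```javascript
// Sketch: verify the chunk bookkeeping on a small Blob.
// Blob.slice clamps the end index to the blob size, so the last
// chunk simply comes out shorter.
function splitIntoChunks(file, chunkSize) {
  const totalChunks = Math.ceil(file.size / chunkSize);
  const parts = [];
  for (let i = 0; i < totalChunks; i++) {
    const offset = i * chunkSize;
    parts.push(file.slice(offset, offset + chunkSize));
  }
  return parts;
}

const blob = new Blob(['0123456789']); // 10 bytes
const parts = splitIntoChunks(blob, 4);
console.log(parts.length);           // 3 chunks: 4 + 4 + 2 bytes
console.log(parts.map(p => p.size)); // [ 4, 4, 2 ]
```

Note that this only slices the blob; actually reading each part still requires one of the asynchronous approaches from the earlier answers.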