JavaScript FileReader API on big files
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/25810051/
Asked by ODelibalta
My FileReader API code had been working well until one day I got a 280 MB txt file from one of my clients. The page crashes outright in Chrome, and in Firefox nothing happens.
// create a new reader object
var fileReader = new FileReader();
// read the file as text
fileReader.readAsText($files[i]);
fileReader.onload = function(e) {
    // read all the information about the file
    // do sanity checks here, etc...
    $timeout(function() {
        // var fileContent = e.target.result;
        // get the first line
        var firstLine = e.target.result.slice(0, e.target.result.indexOf("\n"));
    });
};
What I am trying to do above is to find the first line break so that I can get the column length of the file. Should I not read it as text? How can I get the column length of the file without breaking the page on big files?
Answered by Rob W
Your application is failing for big files because you're reading the full file into memory before processing it. This inefficiency can be solved by streaming the file (reading chunks of a small size), so you only need to hold a part of the file in memory.
A File object is also an instance of a Blob, which offers the .slice method to create a smaller view of the file.
Here is an example that assumes that the input is ASCII (demo: http://jsfiddle.net/mw99v8d4/).
function findColumnLength(file, callback) {
    // Read 1 KB at a time, because we expect the first column to be small.
    var CHUNK_SIZE = 1024;
    var offset = 0;
    var fr = new FileReader();
    fr.onload = function() {
        var view = new Uint8Array(fr.result);
        for (var i = 0; i < view.length; ++i) {
            if (view[i] === 10 || view[i] === 13) {
                // \n = 10 and \r = 13
                // column length = offset + position of \r or \n
                callback(offset + i);
                return;
            }
        }
        // \r or \n not found, continue seeking.
        offset += CHUNK_SIZE;
        seek();
    };
    fr.onerror = function() {
        // Cannot read the file... Do something, e.g. assume column size = 0.
        callback(0);
    };
    seek();

    function seek() {
        if (offset >= file.size) {
            // No \r or \n found. The column size is equal to the full file size.
            callback(file.size);
            return;
        }
        var slice = file.slice(offset, offset + CHUNK_SIZE);
        fr.readAsArrayBuffer(slice);
    }
}
The previous snippet counts the number of bytes before a line break. Counting the number of characters in text consisting of multibyte characters is slightly more difficult, because you have to account for the possibility that the last byte in a chunk is part of a multibyte character.
Answered by Edy Segura
There is an awesome library called Papa Parse that does this in a graceful way! It can really handle big files, and you can also use a web worker.
Just try out the demos that they provide: https://www.papaparse.com/demo
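For the original question (getting the column count of a huge file), a hedged browser-side sketch using Papa Parse's documented worker and preview options might look like this; "file" is assumed to be a File object from an <input type="file"> element:

```javascript
// Sketch: count the columns of a big CSV/TSV File without reading it all.
Papa.parse(file, {
    worker: true,   // parse in a web worker so the page stays responsive
    preview: 1,     // stop after the first row
    complete: function(results) {
        // results.data[0] is the first row as an array of column values
        var columnCount = results.data[0].length;
        console.log("columns:", columnCount);
    }
});
```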

