javascript - Loading a large amount of data into memory - what is the most efficient way to do this?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/4158102/


Loading large amount of data into memory - most efficient way to do this?

javascript jquery data-structures

Asked by user210099

I have a web-based documentation searching/viewing system that I'm developing for a client. Part of this system is a search system that allows the client to search for a term[s] contained in the documentation. I've got the necessary search data files created, but there's a lot of data that needs to be loaded, and it takes anywhere from 8-20 seconds to load all the data. The data is broken into 40-100 files, depending on what documentation needs to be searched. Each file is anywhere from 40-350kb.


Also, this application must be able to run on the local file system, as well as through a webserver.


When the webpage loads up, I can generate a list of what search data files I need to load. This entire list must be loaded before the webpage can be considered functional.


With that preface out of the way, let's look at how I'm doing it now.


After I know that the entire webpage is loaded, I call a loadData() function


function loadData(){
    var d = new Date();
    var curr_min = d.getMinutes();
    var curr_sec = d.getSeconds();
    var curr_mil = d.getMilliseconds();
    console.log("test.js started background loading, time is: " + curr_min + ":" + curr_sec + ":" + curr_mil);
    recursiveCall();
}


function recursiveCall(){
    if(file_array.length > 0){
        var string = file_array.pop();
        setTimeout(function(){ $.getScript(string, recursiveCall); }, 1);
    }
    else{
        var d = new Date();
        var curr_min = d.getMinutes();
        var curr_sec = d.getSeconds();
        var curr_mil = d.getMilliseconds();
        console.log("test.js stopped background loading, time is: " + curr_min + ":" + curr_sec + ":" + curr_mil);
    }
}

This processes an array of files sequentially, taking a 1 ms break between files. It helps prevent the browser from being completely locked up during the loading process, but the browser still tends to get bogged down by loading the data. Each of the files that I'm loading looks like this:


AddToBookData(0,[0,1,2,3,4,5,6,7,8]);
AddToBookData(1,[0,1,2,3,4,5,6,7,8]);
AddToBookData(2,[0,1,2,3,4,5,6,7,8]);

Where each line is a function call that is adding data to an array. The "AddToBookData" function simply does the following:


function AddToBookData(index1, value1){
    BookData[BookIndex].push([index1, value1]);
}

This is the existing system. By the time all the data is loaded, "AddToBookData" has been called 100,000+ times.


I figured that was pretty inefficient, so I wrote a script to take the test.js file, which contains all the function calls above, and process it into a giant array equal to the data structure that BookData builds. Instead of making all the function calls that the old system did, I simply do the following:


var test_array = [..........(data structure I need).......];
BookData[BookIndex] = test_array;

I was expecting to see a performance increase because I was removing all the function calls above, but this method actually takes slightly more time to create the exact data structure. I should note that "test_array" holds slightly over 90,000 elements in my real-world test.


It seems that both methods of loading data have roughly the same CPU utilization. I was surprised to find this, since I was expecting the second method to require little CPU time, given that the data structure is created beforehand.


Please advise?


Accepted answer by Day

Looks like there are two basic areas for optimising the data loading, which can be considered and tackled separately:


  1. Downloading the data from the server. Rather than one large file, you should gain wins from parallel loads of multiple smaller files. Experiment with the number of simultaneous loads, bearing in mind browser limits and the diminishing returns of having too many parallel connections. See my parallel vs sequential experiments on jsfiddle, but bear in mind that the results will vary due to the vagaries of pulling the test data from github - you're best off testing with your own data under more tightly controlled conditions. (A rough sketch of capped parallel loading follows this list.)
  2. Building your data structure as efficiently as possible. Your result looks like a multi-dimensional array; this interesting article on JavaScript array performance may give you some ideas for experimentation in this area.
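This is not from the answer itself, but a rough sketch of what capped parallel loading could look like, reusing the question's global file_array of script URLs (the limit of 4 simultaneous downloads is just a starting point to tune):

var MAX_PARALLEL = 4;   // tune this; browsers also cap connections per host
var inFlight = 0;

function loadInParallel() {
    while (inFlight < MAX_PARALLEL && file_array.length > 0) {
        inFlight++;
        $.getScript(file_array.pop(), function () {
            inFlight--;
            if (file_array.length > 0) {
                loadInParallel();                  // kick off the next download
            } else if (inFlight === 0) {
                console.log("all files loaded");   // everything is in memory
            }
        });
    }
}

loadInParallel();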

But I'm not sure how far you'll really be able to go with optimising the data loading alone. To solve the actual problem with your application (the browser locking up for too long), have you considered options such as the following?


Using Web Workers


Web Workers might not be supported by all your target browsers, but they should prevent the main browser thread from locking up while the data is being processed.

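A minimal sketch of how the worker hand-off might look, assuming the data files can be fetched and parsed as JSON text inside the worker (the worker.js filename and the message format are made up for illustration):

// worker.js (hypothetical file) -- fetch and parse the data off the main thread
self.onmessage = function (e) {
    var parsed = [];
    e.data.urls.forEach(function (url) {
        var xhr = new XMLHttpRequest();
        xhr.open("GET", url, false);   // synchronous is acceptable inside a worker
        xhr.send();
        parsed = parsed.concat(JSON.parse(xhr.responseText));
    });
    self.postMessage(parsed);          // hand the finished structure back
};

// main page -- only the final assignment touches the main thread
var worker = new Worker("worker.js");
worker.onmessage = function (e) { BookData[BookIndex] = e.data; };
worker.postMessage({ urls: file_array });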

For browsers without workers, you could consider increasing the setTimeout interval slightly to give the browser time to service the user as well as your JS. This will make things actually slightly slower but may increase user happiness when combined with the next point.


Providing feedback of progress


For both worker-capable and worker-deficient browsers, take some time to update the DOM with a progress bar. You know how many files you have left to load, so progress should be fairly consistent, and although things may actually be slightly slower, users will feel better if they get the feedback and don't think the browser has locked up on them.

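A small sketch of that idea, wired into the question's existing loop (the #progress element is hypothetical; total_files would be captured before loading starts):

var total_files = file_array.length;    // record this before loadData() runs

function updateProgress() {
    var done = total_files - file_array.length;
    var pct = Math.round((done / total_files) * 100);
    $("#progress").text("Loading search data: " + pct + "%");
}

// then call updateProgress() at the top of recursiveCall()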

Lazy Loading


As suggested by jira in his comment. If Google Instant can search the entire web as we type, is it really not possible to have the server return a file with all locations of the search keyword within the current book? This file should be much smaller and faster to load than the locations of all words within the book, which is what I assume you are currently trying to get loaded as quickly as you can?

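A sketch of what such a lazy request could look like; the /search endpoint and response shape are entirely hypothetical and would need server-side support:

function searchBook(book, term, callback) {
    // ask the server for just the locations of one term in one book
    $.getJSON("/search", { book: book, term: term }, function (locations) {
        callback(locations);   // e.g. [[page, offset], ...] -- far smaller than the full index
    });
}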

Answer by user210099

I tested three methods of loading the same 9,000,000 point dataset into Firefox 3.6.4.


1. Stephen's GetJSON method
2. My function-based push method
3. My pre-processed array-appending method

I ran my tests two ways. In the first iteration of testing, I imported 100 files each containing 10,000 rows of data, with each row containing 9 data elements [0,1,2,3,4,5,6,7,8].


In the second iteration, I tried combining files, so that I was importing 1 file with 9 million data points.


This was a lot larger than the dataset I'll be using, but it helps demonstrate the speed of the various import methods.


             Separate files:    Combined file:

JSON:        34 seconds         34 seconds
FUNC-BASED:  17.5 seconds       24 seconds
ARRAY-BASED: 23 seconds         46 seconds

Interesting results, to say the least. I closed the browser after loading each webpage, and ran each test 4 times to minimize the effect of network traffic/variation (tests ran across a network, using a file server). The numbers you see are averages, although the individual runs differed by only a second or two at most.


Answer by Sam Barnum

Fetch all the data as a string, and use split(). This is the fastest way to build an array in Javascript.

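A minimal sketch of that approach, assuming the data can be served as one plain-text blob; the file name and format here (rows separated by ";", values by ",") are made up for illustration:

$.get("bookdata.txt", function (text) {
    var rows = text.split(";");
    for (var i = 0; i < rows.length; i++) {
        if (rows[i].length > 0) {
            // split each row into its numeric values and push it into the structure
            BookData[BookIndex].push(rows[i].split(",").map(Number));
        }
    }
}, "text");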

There's an excellent article on a very similar problem, from the people who built the flickr search: http://code.flickr.com/blog/2009/03/18/building-fast-client-side-searches/


Answer by Stephen

Instead of using $.getScript to load JavaScript files containing function calls, consider using $.getJSON. This may boost performance. The files would now look like this:


{
    "key" : 0,
    "values" : [0,1,2,3,4,5,6,7,8]
}

After receiving the JSON response, you could then call AddToBookData on it, like this:


function AddToBookData(json) {
     BookData[BookIndex].push([json.key,json.values]);
}
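For completeness, the question's recursive loader would only need a small change to use this (a sketch; the only difference from the original is that the response is passed to AddToBookData instead of being executed as script):

function recursiveCall() {
    if (file_array.length > 0) {
        setTimeout(function () {
            $.getJSON(file_array.pop(), function (json) {
                AddToBookData(json);   // build the structure from the parsed response
                recursiveCall();
            });
        }, 1);
    }
}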

If your files have multiple sets of calls to AddToBookData, you could structure them like this:


[
    {
        "key" : 0,
        "values" : [0,1,2,3,4,5,6,7,8]
    },
    {
        "key" : 1,
        "values" : [0,1,2,3,4,5,6,7,8]
    },
    {
        "key" : 2,
        "values" : [0,1,2,3,4,5,6,7,8]
    }
]

And then change the AddToBookData function to compensate for the new structure:


function AddToBookData(json) {
    $.each(json, function(index, data) {
        BookData[BookIndex].push([data.key,data.values]);
    });
}  

Addendum
I suspect that regardless of what method you use to transport the data from the files to the BookData array, the true bottleneck is the sheer number of requests. Must the files be fragmented into 40-100? If you change to JSON format, you could load a single file that looks like this:


{
    "file1" : [
        {
            "key" : 0,
            "values" : [0,1,2,3,4,5,6,7,8]
        },
        // all the rest...
    ],
    "file2" : [
        {
            "key" : 1,
            "values" : [0,1,2,3,4,5,6,7,8]
        },
        // yadda yadda
    ]
}

Then you could do one request, load all the data you need, and move on... Although the browser may initially lock up (although, maybe not), it would probably be MUCH faster this way.

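A sketch of consuming that single combined file (the all_book_data.json filename is made up; the property names mirror the example structure above):

$.getJSON("all_book_data.json", function (combined) {
    // walk each "fileN" group and append its entries to BookData
    $.each(combined, function (fileName, entries) {
        $.each(entries, function (i, entry) {
            BookData[BookIndex].push([entry.key, entry.values]);
        });
    });
});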

Here is a nice JSON tutorial, if you're not familiar: http://www.webmonkey.com/2010/02/get_started_with_json/
