Java: how to handle huge result sets from a database

Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/231827/

Date: 2020-10-29 11:32:34  Source: igfitidea

How to handle huge result sets from database

Tags: java, database, web-applications, lazy-loading, resultset

Asked by Steve Kuo

I'm designing a multi-tiered, database-driven web application: a SQL relational database, Java for the middle service tier, and the web for the UI. The language doesn't really matter.

The middle service tier performs the actual querying of the database. The UI simply asks for certain data and has no concept that it's backed by a database.

The question is how to handle large data sets. The UI asks for data, but the results might be huge, possibly too big to fit in memory. For example, a street sign application might have a service layer of:

StreetSign getStreetSign(int identifier)
Collection<StreetSign> getStreetSigns(Street street)
Collection<StreetSign> getStreetSigns(LatLonBox box)

The UI layer asks for all street signs meeting some criteria. Depending on the criteria, the result set might be huge. The UI layer might divide the results into separate pages (for a browser) or just present them all (serving them up to Google Earth). The potentially huge result set could be a performance and resource problem (out of memory).

One solution is not to return fully loaded objects (StreetSign objects). Rather, return some sort of result set or iterator that lazily loads each individual object.
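
The lazy-loading idea can be sketched roughly as follows. This is only an illustration, not an established API: `LazyPageIterator` and its `fetchPage` callback (taking an offset and a limit) are hypothetical names, and a real implementation would run a bounded query inside the callback.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Hypothetical helper: iterates over all rows but loads them one page at a time.
final class LazyPageIterator<T> implements Iterator<T> {
    // Assumed callback: (offset, limit) -> the rows in that range, fewer if the data ends.
    private final BiFunction<Integer, Integer, List<T>> fetchPage;
    private final int pageSize;
    private int nextOffset = 0;
    private Iterator<T> current = Collections.emptyIterator();
    private boolean exhausted = false;

    LazyPageIterator(BiFunction<Integer, Integer, List<T>> fetchPage, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.fetchPage = fetchPage;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next page only when the current one has been consumed.
        while (!current.hasNext() && !exhausted) {
            List<T> page = fetchPage.apply(nextOffset, pageSize);
            nextOffset += page.size();
            if (page.size() < pageSize) {
                exhausted = true; // a short page means there are no more rows
            }
            current = page.iterator();
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

With this shape the caller sees an ordinary `Iterator`, while at most one page of objects is held in memory at a time.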

Another solution is to change the service API to return a subset of the requested data:

Collection<StreetSign> getStreetSigns(LatLonBox box, int pageNumber, int resultsPerPage)

Of course the UI can still request a huge result set:

getStreetSigns(box, 1, 1000000000)

I'm curious: what is the standard industry design pattern for this scenario?

Answer by RogueOne

The very first question should be:

Does the user need to manage this amount of data, and is he or she even capable of doing so?

Although the result set should be paged, if its potential size is that huge the answer will be "probably not", so the UI shouldn't try to show it.

I worked on J2EE projects for health care systems that deal with enormous amounts of stored data (literally millions of patients, visits, forms, etc.). The general rule there is not to show more than 100 or 200 rows for any user search, and to advise the user that the set of criteria produces more information than he or she can digest.

The way to implement this varies from one project to another. You can force the UI to ask the service tier for the size of a query's result before launching it, or you can throw an Exception from the service tier if the result set grows too large (although this couples the service tier to the limitations of one particular UI).

Be careful! This does not mean that every method on the service tier must throw an Exception if its result has more than 100 rows. This general rule only applies to result sets that are shown to the user directly, which is a good reason to place the control in the UI instead of in the service tier.
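
The two options described in this answer (ask for the size first, or throw when the result is too large) might look roughly like this. It is a sketch under stated assumptions: `PatientSearchService`, `TooManyResultsException`, and the in-memory list standing in for the database are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception for searches whose result is too large to display.
class TooManyResultsException extends RuntimeException {
    final int actualCount;

    TooManyResultsException(int actualCount, int limit) {
        super("Search matched " + actualCount + " rows; the limit is " + limit);
        this.actualCount = actualCount;
    }
}

// Stand-in for the service tier, backed by an in-memory list instead of a database.
class PatientSearchService {
    private final List<String> names;

    PatientSearchService(List<String> names) {
        this.names = new ArrayList<>(names);
    }

    // Option 1: the UI asks for the size first (a SELECT COUNT(*) in a real system).
    int countByPrefix(String prefix) {
        int n = 0;
        for (String name : names) {
            if (name.startsWith(prefix)) {
                n++;
            }
        }
        return n;
    }

    // Option 2: the service refuses to load more than maxRows rows.
    List<String> findByPrefix(String prefix, int maxRows) {
        int total = countByPrefix(prefix);
        if (total > maxRows) {
            throw new TooManyResultsException(total, maxRows);
        }
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            if (name.startsWith(prefix)) {
                hits.add(name);
            }
        }
        return hits;
    }
}
```

Passing `maxRows` as a parameter keeps the limit a decision of the caller, so non-UI callers can still ask for larger results.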

Answer by HTTP 410

The most frequent pattern I've seen for this situation is some sort of paging, usually done server-side to reduce the amount of information sent over the wire.

Here's a SQL Server 2000 example using a table variable (generally faster than a temp table) together with your street signs example:

CREATE PROCEDURE GetPagedStreetSigns
(
  @Page int = 1,
  @PageSize int = 10
)
AS
  SET NOCOUNT ON

  -- This memory-variable table will control paging
  DECLARE @TempTable TABLE (RowNumber int identity, StreetSignId int)

  INSERT INTO @TempTable
  (
     StreetSignId
  )
  SELECT [Id]
  FROM   StreetSign
  ORDER BY [Id]

  -- Select only those rows belonging to the requested page
  SELECT SS.*
  FROM   StreetSign SS
         INNER JOIN @TempTable TT ON TT.StreetSignId = SS.[Id]
  WHERE  TT.RowNumber BETWEEN ((@Page - 1) * @PageSize + 1)
                      AND (@Page * @PageSize)
  ORDER BY TT.RowNumber -- without this, row order within the page is not guaranteed

In SQL Server 2005, you can get more clever with stuff like Common Table Expressions and the new SQL Ranking functions. But the general theme is that you use the server to return only the information belonging to the current page.

Be aware that this approach can get messy if you're allowing the end-user to apply on-the-fly filters to the data that s/he's seeing.

Answer by MusiGenesis

One thing to be wary of when working with home-grown row-wrapper classes like the ones you (apparently) have is code that makes additional calls to the database without you (the developer) being aware of it. For example, you might call a method that returns a collection of Person objects and think that the only thing going on under the hood is a single "SELECT * FROM PERSONS" call. In actuality, the method you're calling might iterate through the returned collection of Person objects and make additional DB calls to populate each Person's Orders collection.

As you say, one of your solutions is to not return fully-loaded objects, so you're probably aware of this potential problem. One of the reasons I tend to avoid using row wrappers is that they invariably make it difficult to tune your application and minimize the size and frequency of database traffic.
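
To make the hidden extra calls concrete, here is a small sketch that counts queries. Everything in it is hypothetical: `FakeDb` is an in-memory stand-in for a database, and `PersonLoader` imitates a row-wrapper layer that populates each person's orders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a database that counts how many "queries" it receives.
class FakeDb {
    int queryCount = 0;
    private final Map<Integer, List<String>> ordersByPerson = new HashMap<>();

    FakeDb() {
        ordersByPerson.put(1, Arrays.asList("A-100", "A-101"));
        ordersByPerson.put(2, Arrays.asList("B-200"));
        ordersByPerson.put(3, Collections.<String>emptyList());
    }

    List<Integer> selectPersonIds() {
        queryCount++;
        return new ArrayList<>(ordersByPerson.keySet());
    }

    List<String> selectOrdersFor(int personId) {
        queryCount++;
        return ordersByPerson.get(personId);
    }

    // One query for the orders of many people (an IN (...) clause or a join in real SQL).
    Map<Integer, List<String>> selectOrdersForAll(Collection<Integer> personIds) {
        queryCount++;
        Map<Integer, List<String>> result = new HashMap<>();
        for (int id : personIds) {
            result.put(id, ordersByPerson.get(id));
        }
        return result;
    }
}

class PersonLoader {
    // Looks like one call to the caller, but issues 1 + N queries.
    static Map<Integer, List<String>> loadNaively(FakeDb db) {
        Map<Integer, List<String>> people = new HashMap<>();
        for (int id : db.selectPersonIds()) {
            people.put(id, db.selectOrdersFor(id)); // one extra query per person
        }
        return people;
    }

    // Same result with two queries, regardless of the number of people.
    static Map<Integer, List<String>> loadBatched(FakeDb db) {
        return db.selectOrdersForAll(db.selectPersonIds());
    }
}
```

With three people, `loadNaively` issues four queries and `loadBatched` issues two; the gap grows linearly with the number of rows.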

Answer by Brian Schmitt

I would say that if the potential exists for a large set of data, then go the paging route.

You can still set a MAX that you do not want them to go over.
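
One way to enforce such a maximum is to clamp the requested page size in the service tier, so a call like getStreetSigns(box, 1, 1000000000) cannot pull everything at once. This is a minimal sketch; `PageRequest` and the cap of 50 are assumptions made for illustration.

```java
// Hypothetical value object that normalizes paging parameters coming from the UI.
final class PageRequest {
    static final int MAX_PAGE_SIZE = 50; // assumed cap; choose what suits the application

    final int pageNumber;     // 1-based
    final int resultsPerPage; // always between 1 and MAX_PAGE_SIZE

    PageRequest(int pageNumber, int resultsPerPage) {
        this.pageNumber = Math.max(1, pageNumber);
        this.resultsPerPage = Math.min(Math.max(1, resultsPerPage), MAX_PAGE_SIZE);
    }
}
```

Clamping silently is one design choice; rejecting oversized requests with an exception is an equally reasonable alternative.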

E.g., SO uses page sizes of 15, 30, 50...

Answer by Gunny

When I deal with this type of issue, I usually chunk the data sent to the browser (or thin/thick client, whichever is more appropriate for your situation). Regardless of the total size of the data that meets the criteria, only a small portion is really usable in any UI at one time.

I live in a Microsoft world, so my primary environment is ASP.NET with SQL Server. Here are two articles about paging (which mention some techniques for paging through result sets) that may be helpful:

Paging through lots of data efficiently (and in an Ajax way) with ASP.NET 2.0
Efficient Data Paging with the ASP.NET 2.0 DataList Control and ObjectDataSource

Another mechanism that Microsoft has shipped lately is their idea of "Dynamic Data" - you might be able to check out the guts of this for some guidance as to how they're dealing with this issue.

Answer by Niniki

I've done similar things on two different products. In one case the data source is optionally paginated; in Java, it implements a Pageable interface similar to:

public interface Pageable
{
    public void setStartIndex( int index );
    public int getStartIndex();
    public int getRowsPerPage() throws Exception;
    public void setRowsPerPage( int rowsPerPage );
}

The data source implements another method for get() of items, and the implementation of a paginated data source just returns the current page. So you can set your start index, and grab a page in your controller.
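
A paginated data source along these lines could be sketched as below. The `Pageable` interface is repeated from the answer so the example compiles on its own; `ListDataSource`, its default of 10 rows per page, and its in-memory list are hypothetical, and a real data source would query only the requested rows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Repeated from the answer above so this sketch is self-contained.
interface Pageable {
    void setStartIndex(int index);
    int getStartIndex();
    int getRowsPerPage() throws Exception;
    void setRowsPerPage(int rowsPerPage);
}

// Hypothetical in-memory data source whose get() returns only the current page.
class ListDataSource<T> implements Pageable {
    private final List<T> rows;
    private int startIndex = 0;
    private int rowsPerPage = 10; // assumed default

    ListDataSource(List<T> rows) {
        this.rows = new ArrayList<>(rows);
    }

    @Override
    public void setStartIndex(int index) {
        startIndex = Math.max(0, index);
    }

    @Override
    public int getStartIndex() {
        return startIndex;
    }

    @Override
    public int getRowsPerPage() {
        return rowsPerPage;
    }

    @Override
    public void setRowsPerPage(int rowsPerPage) {
        if (rowsPerPage < 1) {
            throw new IllegalArgumentException("rowsPerPage must be at least 1");
        }
        this.rowsPerPage = rowsPerPage;
    }

    // Returns the rows of the current page, or an empty list past the end.
    List<T> get() {
        if (startIndex >= rows.size()) {
            return Collections.emptyList();
        }
        int end = (int) Math.min((long) rows.size(), (long) startIndex + rowsPerPage);
        return new ArrayList<>(rows.subList(startIndex, end));
    }
}
```

A controller would call setStartIndex and setRowsPerPage from the request parameters and then render whatever get() returns.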

One thing to consider is caching your cursors on the server side. For a web app you'll have to expire them, but they'll really help performance-wise.

Answer by Matthew Smith

The Fedora digital repository project returns a maximum number of results together with a result-set-id. You then get the rest of the results by asking for the next chunk, supplying the result-set-id in the subsequent query. It works OK as long as you don't want to do any searching or sorting outside of the query.
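
The general shape of that protocol can be sketched as follows. This is not Fedora's actual API: `ChunkedSearch`, `Chunk`, and the "rs-N" identifiers are made up for illustration, and the remaining rows are kept in a map only to keep the example short (a real server would more likely keep an open cursor and expire it).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One chunk of results plus the ID needed to request the next chunk (null when done).
class Chunk<T> {
    final List<T> items;
    final String resultSetId;

    Chunk(List<T> items, String resultSetId) {
        this.items = items;
        this.resultSetId = resultSetId;
    }
}

// Hypothetical server-side component that hands out results in bounded chunks.
class ChunkedSearch<T> {
    private final int maxResults;
    private final Map<String, List<T>> remainingById = new HashMap<>();
    private int nextId = 0;

    ChunkedSearch(int maxResults) {
        if (maxResults < 1) {
            throw new IllegalArgumentException("maxResults must be at least 1");
        }
        this.maxResults = maxResults;
    }

    // First request: returns the first chunk of the matches.
    Chunk<T> start(List<T> allMatches) {
        return take(new ArrayList<>(allMatches));
    }

    // Follow-up request: returns the next chunk for a previously issued ID.
    Chunk<T> resume(String resultSetId) {
        List<T> remaining = remainingById.remove(resultSetId);
        if (remaining == null) {
            throw new IllegalArgumentException("Unknown or expired result set: " + resultSetId);
        }
        return take(remaining);
    }

    private Chunk<T> take(List<T> remaining) {
        int n = Math.min(maxResults, remaining.size());
        List<T> items = new ArrayList<>(remaining.subList(0, n));
        List<T> rest = new ArrayList<>(remaining.subList(n, remaining.size()));
        if (rest.isEmpty()) {
            return new Chunk<>(items, null); // nothing left, so no ID is issued
        }
        String id = "rs-" + (nextId++);
        remainingById.put(id, rest);
        return new Chunk<>(items, id);
    }
}
```

Because the order of the rows is fixed when the search starts, the client cannot re-sort or filter between chunks, which matches the limitation mentioned above.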

Answer by Matthew Smith

From the data retrieval layer, the standard design pattern is to have two method interfaces: one that returns everything and one that returns a block of a given size.

If you wish, you can layer components that do paging over it.
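
As a rough illustration of the two-method interface with paging layered on top, consider the sketch below. `SignRepository`, `InMemorySignRepository`, and `SignPager` are hypothetical names, and street signs are reduced to plain strings to keep it short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical retrieval-layer interface: one method for everything, one for a block.
interface SignRepository {
    List<String> findAll();
    List<String> findBlock(int offset, int blockSize);
}

// In-memory stand-in; a real implementation would push offset and size into the query.
class InMemorySignRepository implements SignRepository {
    private final List<String> signs;

    InMemorySignRepository(List<String> signs) {
        this.signs = new ArrayList<>(signs);
    }

    @Override
    public List<String> findAll() {
        return new ArrayList<>(signs);
    }

    @Override
    public List<String> findBlock(int offset, int blockSize) {
        if (offset < 0 || blockSize < 1) {
            throw new IllegalArgumentException("offset must be >= 0 and blockSize >= 1");
        }
        if (offset >= signs.size()) {
            return Collections.emptyList();
        }
        int end = (int) Math.min((long) signs.size(), (long) offset + blockSize);
        return new ArrayList<>(signs.subList(offset, end));
    }
}

// Paging component layered over the block method; page numbers are 1-based.
class SignPager {
    private final SignRepository repository;
    private final int pageSize;

    SignPager(SignRepository repository, int pageSize) {
        this.repository = repository;
        this.pageSize = pageSize;
    }

    List<String> page(int pageNumber) {
        if (pageNumber < 1) {
            throw new IllegalArgumentException("pageNumber must be at least 1");
        }
        int offset = Math.multiplyExact(pageNumber - 1, pageSize);
        return repository.findBlock(offset, pageSize);
    }
}
```

Keeping the pager separate means the retrieval layer stays unaware of page numbers and only deals in offsets and block sizes.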

Answer by Ty.

In ASP.NET I would use server-side paging, where you only retrieve the page of data the user has requested from the data store. This is opposed to retrieving the entire result set, putting it into memory and paging through it on request.

Answer by dacracot

JSF (JavaServer Faces) has widgets for chunking large result sets to the browser. They can be parameterized as you suggest. I wouldn't call it a "standard industry design pattern" by any means, but it is worth a look to see how someone else solved the problem.