Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/1410048/

Time: 2020-09-01 03:33:45  Source: igfitidea

How to select first 'N' records from a database containing million records?

Tags: sql, oracle, sorting, sql-order-by

Asked by aJ.

I have an Oracle database populated with a million records. I am trying to write a SQL query that returns the first 'N' sorted records (say, 100 records) from the database based on a certain condition.

SELECT * 
FROM myTable 
WHERE SIZE > 2000 
ORDER BY NAME DESC

I then select the first N records programmatically.

The problems with this approach are:

  • The query returns half a million records, and "ORDER BY NAME" causes all of them to be sorted on NAME in descending order. This sorting takes a lot of time (nearly 30-40 seconds; if I omit ORDER BY, the query takes only 1 second).
  • After the sort I am interested in only the first N (100) records, so sorting the complete result set is wasted work.

My questions are:

  1. Is it possible to specify 'N' in the query itself (so that the sort applies to only N records and the query becomes faster)?
  2. Is there a better way in SQL to improve the query so that it sorts only N elements and returns quickly?

Answered by Vincent Malgrat

If your purpose is to find 100 random rows and sort them afterwards, then Lasse's solution is correct. If, as I think, you want the first 100 rows sorted by name while discarding the others, you would build a query like this:

SELECT * 
  FROM (SELECT * 
          FROM myTable 
         WHERE SIZE > 2000 ORDER BY NAME DESC) 
 WHERE ROWNUM <= 100

The optimizer will understand that it is a TOP-N query and will be able to use an index on NAME. It won't have to sort the entire result set; it will just start at the end of the index, read it backwards, and stop after 100 rows.
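
One way to check whether the optimizer really treats the query as a top-N query is to look at its execution plan. The following is a minimal sketch, assuming the myTable, SIZE, and NAME names from the question; the plan actually produced depends on the Oracle version, the table statistics, and which indexes exist.

```sql
-- Table and column names are placeholders taken from the question.
-- Note: SIZE appears to be a reserved word in Oracle, so a real table
-- would likely need a different (or quoted) column name.

-- Ask Oracle to generate the plan for the top-N query without running it.
EXPLAIN PLAN FOR
SELECT *
  FROM (SELECT *
          FROM myTable
         WHERE SIZE > 2000
         ORDER BY NAME DESC)
 WHERE ROWNUM <= 100;

-- Display the plan that was just generated.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

A COUNT STOPKEY step in the output, combined with either a SORT ORDER BY STOPKEY step or a descending index scan, would suggest that Oracle stops after 100 rows instead of sorting the full result set.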

You could also add a hint to your original query to let the optimizer understand that you are interested in the first rows only. This will probably generate a similar access path:

SELECT /*+ FIRST_ROWS */ * FROM myTable WHERE SIZE > 2000 ORDER BY NAME DESC
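
The hint also accepts a row count, which states the target more precisely. This is a minimal sketch, assuming the same table and column names as in the question; whether the optimizer changes its plan depends on statistics and available indexes.

```sql
-- FIRST_ROWS(n) asks the optimizer to favour returning the first n rows quickly.
-- It is an optimizer hint only: the query still returns every matching row,
-- so the caller is expected to stop fetching after 100 rows.
SELECT /*+ FIRST_ROWS(100) */ *
  FROM myTable
 WHERE SIZE > 2000
 ORDER BY NAME DESC;
```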

Edit: just adding AND rownum <= 100 to the query won't work, since in Oracle rownum is assigned before sorting; this is why you have to use a subquery. Without the subquery, Oracle will select 100 random rows and then sort them.

Answered by IanH

This shows how to pick the top N rows depending on your version of Oracle.

From Oracle 9i onwards, the RANK() and DENSE_RANK() functions can be used to determine the TOP N rows. Examples:

Get the top 10 employees based on their salary

SELECT ename, sal FROM ( SELECT ename, sal, RANK() OVER (ORDER BY sal DESC) sal_rank FROM emp ) WHERE sal_rank <= 10;

Select the employees making the top 10 salaries

SELECT ename, sal FROM ( SELECT ename, sal, DENSE_RANK() OVER (ORDER BY sal DESC) sal_dense_rank FROM emp ) WHERE sal_dense_rank <= 10;

The difference between the two lies in how ties are handled: RANK() skips rank values after a tie, whereas DENSE_RANK() does not.
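
To illustrate how the two functions treat ties, here is a small sketch that ranks a made-up set of four salaries. The emp_sample name and its rows are invented for illustration and are not part of the original answer.

```sql
-- Inline sample data (made up for illustration): A and B tie on salary.
WITH emp_sample AS (
    SELECT 'A' AS ename, 5000 AS sal FROM dual UNION ALL
    SELECT 'B', 5000 FROM dual UNION ALL
    SELECT 'C', 4000 FROM dual UNION ALL
    SELECT 'D', 3000 FROM dual
)
SELECT ename,
       sal,
       RANK()       OVER (ORDER BY sal DESC) AS sal_rank,        -- 1, 1, 3, 4
       DENSE_RANK() OVER (ORDER BY sal DESC) AS sal_dense_rank   -- 1, 1, 2, 3
  FROM emp_sample
 ORDER BY sal DESC, ename;
```

With this data, filtering on sal_rank <= 2 would keep only A and B, while filtering on sal_dense_rank <= 2 would keep A, B, and C. Either filter can return more than N rows when ties occur.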

Answered by Lasse V. Karlsen

Add this:

 AND rownum <= 100

to your WHERE-clause.

However, this won't do what you're asking.

If you want to pick 100 random rows, sort those, and then return them, you'll have to formulate a query without the ORDER BY first, then limit that to 100 rows, then select from that and sort.

This could work, but unfortunately I don't have an Oracle server available to test:

SELECT *
FROM (
    SELECT *
    FROM myTable
    WHERE SIZE > 2000
      AND rownum <= 100
    ) x
ORDER BY NAME DESC

But note the "random" part there: you're saying "give me 100 rows with SIZE > 2000, I don't care which 100".

Is that really what you want?

And no, you won't actually get a random result, in the sense that it'll change each time you query the server, but you are at the mercy of the query optimizer. If the data load and index statistics for that table change over time, at some point you might get different data than you did on the previous query.

Answered by Jeffrey Kemp

Your problem is that the sort is being done every time the query is run. The optimiser can use an index to eliminate the sort operation, provided the sorted column is declared NOT NULL.

(If the column is nullable, it is still possible, by either (a) adding a NOT NULL predicate to the query, or (b) adding a function-based index and modifying the ORDER BY clause accordingly.)
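
As a rough sketch of option (a), assuming the table and column names from the question and a hypothetical index name, the index and the adjusted query might look like this. Note that the extra predicate excludes rows whose NAME is NULL, so it changes the result if such rows exist.

```sql
-- Hypothetical index on the sort column (the index name is an assumption).
CREATE INDEX mytable_name_idx ON myTable (NAME);

-- Option (a): NAME is nullable, so add an explicit NOT NULL predicate.
-- Rows with a NULL NAME have no entry in a single-column B-tree index,
-- so without this predicate the index alone cannot supply every row.
SELECT *
  FROM (SELECT *
          FROM myTable
         WHERE SIZE > 2000
           AND NAME IS NOT NULL
         ORDER BY NAME DESC)
 WHERE ROWNUM <= 100;

-- Alternative, if NULL names are never valid: declare the column NOT NULL.
-- ALTER TABLE myTable MODIFY (NAME NOT NULL);
```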

Answered by Muntasir

Just for reference, in Oracle 12c, this task can be done using the FETCH clause. You can see here for examples and additional reference links regarding this matter.
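
A minimal sketch of the FETCH clause applied to the query from the question, assuming Oracle 12c or later and the same placeholder table and column names:

```sql
-- Row-limiting clause (Oracle 12c and later): the limit is applied
-- after ORDER BY, so no wrapping subquery with ROWNUM is needed.
SELECT *
  FROM myTable
 WHERE SIZE > 2000
 ORDER BY NAME DESC
 FETCH FIRST 100 ROWS ONLY;
```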