SQL Distinct 关键字会降低性能?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/5341276/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 09:44:37  来源:igfitidea点击:

SQL Distinct keyword bogs down performance?

sqlplsql

提问by MxLDevs

I have received a SQL query that makes use of the distinct keyword. When I tried running the query it took at least a minute to join two tables with hundreds of thousands of records and actually return something.

我收到了一个使用不同关键字的 SQL 查询。当我尝试运行查询时,至少需要一分钟才能将包含数十万条记录的两个表连接起来并实际返回一些内容。

I then took out the distinct and it came back in 0.2 seconds. Does the distinct keyword really make things that bad?

然后我拿出了不同的东西,它在 0.2 秒后又回来了。distinct 关键字真的会让事情变得那么糟糕吗?

EDIT: here's the query

编辑:这是查询

SELECT Distinct
c.username, o.orderno, o.totalcredits, o.totalrefunds,
o.recstatus, o.reason 

from management.contacts c 
join management.orders o
on (c.custID = o.custID)
where o.recDate > to_date('2010-01-01', 'YYYY/MM/DD')

采纳答案by Benoit

Yes, as using DISTINCTwill (sometimes according to a comment) cause results to be ordered. Sorting hundreds of records takes time.

是的,因为使用DISTINCTwill(有时根据评论)会导致结果被排序。对数百条记录进行排序需要时间。

Try GROUP BYall your columns, it can sometimes lead the query optimiser to choose a more efficient algorithm (at least with Oracle I noticed significant performance gain).

尝试GROUP BY所有列,它有时会导致查询优化器选择更有效的算法(至少在 Oracle 中我注意到显着的性能提升)。

回答by TrojanName

Distinct always sets off alarm bells to me - it usually signifies a bad table design or a developer who's unsure of themselves. It is used to remove duplicate rows, but if the joins are correct, it should rarely be needed. And yes there is a large cost to using it.

Distinct 总是给我敲响警钟——它通常意味着糟糕的表设计或对自己不确定的开发人员。它用于删除重复行,但如果连接正确,则很少需要它。是的,使用它的成本很高。

What's the primary key of the orders table? Assuming it's orderno then that should be sufficient to guarantee no duplicates. If it's something else, then you may need to do a bit more with the query, but you should make it a goal to remove those distincts! ;-)

订单表的主键是什么?假设它是 orderno 那么应该足以保证没有重复。如果是别的东西,那么您可能需要对查询做更多的事情,但是您应该将删除这些差异作为目标!;-)

Also you mentioned the query was taking a while to run when you were checking the number of rows - it can often be quicker to wrap the entire query in "select count(*) from ( )" especially if you're getting large quantities of rows returned. Just while you're testing obviously. ;-)

您还提到,当您检查行数时,查询需要一段时间才能运行 - 将整个查询包装在“select count(*) from ()”中通常会更快,特别是如果您获得大量行返回。就在你显然在测试的时候。;-)

Finally, make sure you have indexed the custID on the orders table (and maybe recDate too).

最后,确保您已在订单表(也可能是 recDate)上对 custID 进行索引。

回答by Unreason

Purpose of DISTINCT is to prune duplicate records from the result set for all the selected columns.

DISTINCT 的目的是从所有选定列的结果集中修剪重复记录。

  • If any of the selected columns is unique after join you can drop DISTINCT.
  • If you don't know that, but you know that the combination of the values of selected column is unique, you can drop DISTINCT.
  • 如果加入后任何选定的列是唯一的,您可以删除 DISTINCT。
  • 如果您不知道这一点,但您知道所选列的值的组合是唯一的,则可以删除 DISTINCT。

Actually, normally, with properly designed databases you rarely need DISTINCT and in those cases that you do it is (?) obvious that you need it. RDBMS however can not leave it to chance and must actually build an indexing structure to establish it.

实际上,通常情况下,对于设计合理的数据库,您很少需要 DISTINCT,而在这些情况下,您显然 (?) 需要它。然而,RDBMS 不能让它碰巧,必须实际构建一个索引结构来建立它。

Normally you find DISTINCT all over the place when people are not sure about JOINs and relationships between tables.

通常,当人们不确定表之间的 JOIN 和关系时,您会发现 DISTINCT 无处不在。

Also, in classes when talking about pure relational databases where the result should be a proper set (with no repeating elements = records) you can find it quite common for people to stick DISTINCT in to guarantee this property for purposes of theoretical correctness. Sometimes this creeps in into production systems.

此外,在讨论结果应该是正确集合(没有重复元素 = 记录)的纯关系数据库的课程中,您会发现人们为了理论上的正确性而坚持 DISTINCT 以保证此属性是很常见的。有时这会渗透到生产系统中。

回答by Alex Peta

You can try to make a group by like this:

您可以尝试通过以下方式创建组:

  SELECT c.username, 
         o.orderno, 
         o.totalcredits, 
         o.totalrefunds,
         o.recstatus, 
         o.reason
    FROM management.contacts c,
         management.orders o
   WHERE c.custID = o.custID
     AND o.recDate > to_date('2010-01-01', 'YYYY-MM-DD')
GROUP BY c.username, 
         o.orderno, 
         o.totalcredits, 
         o.totalrefunds,
         o.recstatus, 
         o.reason 

Also verify if you have index on o.recDate

还要验证您是否在o.recDate上有索引