SQL: Counting DISTINCT over multiple columns

Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1471250/

Time: 2020-09-01 03:44:26  Source: igfitidea

Counting DISTINCT over multiple columns

Tags: sql, sql-server, performance, tsql, query-optimization

Asked by Novitzky

Is there a better way of doing a query like this:

SELECT COUNT(*) 
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
      FROM DocumentOutputItems) AS internalQuery

I need to count the number of distinct items from this table but the distinct is over two columns.

My query works fine, but I was wondering whether I can get the final result using just one query (without using a sub-query).

Accepted answer by Jason Horner

If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.

Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.

I believe a distinct count of the computed column would be equivalent to your query.

Answer by JayTee

Edit: Altered from the less-than-reliable checksum-only query. I've discovered a way to do this (in SQL Server 2005) that works pretty well for me, and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars to make the distinct more reliable.

SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )
FROM DocumentOutPutItems

Answer by APC

What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations, why not try it?

It certainly works as you might expect in Oracle.

SQL> select distinct deptno, job from emp
  2  order by deptno, job
  3  /

    DEPTNO JOB
---------- ---------
        10 CLERK
        10 MANAGER
        10 PRESIDENT
        20 ANALYST
        20 CLERK
        20 MANAGER
        30 CLERK
        30 MANAGER
        30 SALESMAN

9 rows selected.


SQL> select count(*) from (
  2  select distinct deptno, job from emp
  3  )
  4  /

  COUNT(*)
----------
         9

SQL>

edit

I went down a blind alley with analytics but the answer was depressingly obvious...

SQL> select count(distinct concat(deptno,job)) from emp
  2  /

COUNT(DISTINCTCONCAT(DEPTNO,JOB))
---------------------------------
                                9

SQL>

edit 2

Given the following data, the concatenating solution provided above will miscount:

col1  col2
----  ----
A     AA
AA    A

So we need to include a separator...

select col1 + '*' + col2 from t23
/

Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.
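
The miscount and the separator fix can be checked with a small sketch. It is only an illustration under some assumptions: it uses Python's built-in sqlite3 module rather than Oracle or SQL Server, so string concatenation is written with SQLite's `||` operator, and the table `t23` is recreated with just the two rows shown above.

```python
import sqlite3

# Assumption: an in-memory SQLite table standing in for t23,
# holding only the two example rows from the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t23 (col1 TEXT, col2 TEXT)")
conn.executemany("INSERT INTO t23 VALUES (?, ?)", [("A", "AA"), ("AA", "A")])

# Reference answer: count the distinct (col1, col2) pairs with a sub-query.
pairs = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT col1, col2 FROM t23)"
).fetchone()[0]

# Plain concatenation: both rows collapse to the same string 'AAA'.
naive = conn.execute(
    "SELECT COUNT(DISTINCT col1 || col2) FROM t23"
).fetchone()[0]

# Concatenation with a separator that appears in neither column.
separated = conn.execute(
    "SELECT COUNT(DISTINCT col1 || '*' || col2) FROM t23"
).fetchone()[0]

print(pairs, naive, separated)  # prints: 2 1 2
```

The separator only helps as long as `*` really never occurs in the data; otherwise the same kind of collision can reappear.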

Answer by spelunk1

To run as a single query, concatenate the columns, then get the distinct count of instances of the concatenated string.

SELECT count(DISTINCT concat(DocumentId, DocumentSessionId)) FROM DocumentOutputItems;

In MySQL you can do the same thing without the concatenation step as follows:

SELECT count(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems;

This feature is mentioned in the MySQL documentation:

http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count-distinct

Answer by Trevor Tippins

How about something like:

select count(*)
from
  (select count(*) cnt
   from DocumentOutputItems
   group by DocumentId, DocumentSessionId) t1

This probably does just the same as your existing query, but it avoids the DISTINCT.
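
A quick way to convince yourself that the GROUP BY form and the DISTINCT form agree is to run both against the same rows. The sketch below is an assumption-laden illustration: it uses Python's sqlite3 module instead of SQL Server, and the five sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
# Made-up sample data: 5 rows containing 3 distinct pairs.
rows = [(1, 10), (1, 10), (1, 11), (2, 10), (2, 10)]
conn.executemany("INSERT INTO DocumentOutputItems VALUES (?, ?)", rows)

# The query from the question: DISTINCT inside a sub-query.
distinct_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT DocumentId, DocumentSessionId"
    " FROM DocumentOutputItems) AS internalQuery"
).fetchone()[0]

# The GROUP BY variant: one inner row per group, counted by the outer query.
group_by_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT COUNT(*) AS cnt FROM DocumentOutputItems"
    " GROUP BY DocumentId, DocumentSessionId) AS t1"
).fetchone()[0]

print(distinct_count, group_by_count)  # prints: 3 3
```

Whether one form is faster than the other depends on the database and its optimizer; this sketch only checks that the results match.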

Answer by Alexander Kjäll

Here's a shorter version without the subselect:

SELECT COUNT(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems

It works fine in MySQL, and I think that the optimizer has an easier time understanding this one.

Edit: Apparently I mixed up MSSQL and MySQL. Sorry about that, but maybe it helps anyway.

Answer by karmakaze

Many (most?) SQL databases can work with tuples as if they were values, so you can just do:

SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId)) FROM DocumentOutputItems;

If your database doesn't support this, it can be simulated as per @oncel-umut-turer's suggestion of CHECKSUM, or with another scalar function providing good uniqueness, e.g. COUNT(DISTINCT CONCAT(DocumentId, ':', DocumentSessionId)).

A related use of tuples is performing IN queries such as:

SELECT * FROM DocumentOutputItems WHERE (DocumentId, DocumentSessionId) in (('a', '1'), ('b', '2'));
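
Tuple support differs between databases, so here is a small sketch of the IN-with-tuples idea in one engine that is easy to try. It assumes Python's sqlite3 module with a reasonably recent SQLite; SQLite wants the right-hand side of a row-value IN to be a sub-query, so the list of pairs is written with VALUES instead of the nested parentheses shown above. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId TEXT, DocumentSessionId TEXT)"
)
# Made-up sample rows.
conn.executemany(
    "INSERT INTO DocumentOutputItems VALUES (?, ?)",
    [("a", "1"), ("a", "2"), ("b", "2"), ("c", "3")],
)

# Row-value comparison: keep only rows whose (id, session) pair is listed.
matches = conn.execute(
    "SELECT DocumentId, DocumentSessionId FROM DocumentOutputItems"
    " WHERE (DocumentId, DocumentSessionId) IN (VALUES ('a', '1'), ('b', '2'))"
    " ORDER BY DocumentId, DocumentSessionId"
).fetchall()

print(matches)
```

Only the exact pairs match: the row ('a', '2') is excluded even though 'a' and '2' each appear somewhere in the list.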

Answer by Bliek

There's nothing wrong with your query, but you could also do it this way:

WITH internalQuery (Amount)
AS
(
    SELECT (0)
      FROM DocumentOutputItems
  GROUP BY DocumentId, DocumentSessionId
)
SELECT COUNT(*) AS NumberOfDistinctRows
  FROM internalQuery

Answer by IordanTanev

I hope this works; I am writing it a prima vista.

SELECT COUNT(*) 
FROM DocumentOutputItems 
GROUP BY DocumentId, DocumentSessionId
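
As far as I can tell, a GROUP BY query of this shape returns one row per (DocumentId, DocumentSessionId) group, each holding that group's row count, rather than a single total. The sketch below illustrates this under some assumptions: it runs on SQLite through Python's sqlite3 module, the sample rows are made up, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
# Made-up sample data: 5 rows containing 3 distinct pairs.
rows = [(1, 10), (1, 10), (1, 11), (2, 10), (2, 10)]
conn.executemany("INSERT INTO DocumentOutputItems VALUES (?, ?)", rows)

# One result row per group; each value is the number of rows in that group.
per_group = conn.execute(
    "SELECT COUNT(*) FROM DocumentOutputItems"
    " GROUP BY DocumentId, DocumentSessionId"
    " ORDER BY DocumentId, DocumentSessionId"
).fetchall()

print(per_group)       # prints: [(2,), (1,), (2,)]
print(len(per_group))  # prints: 3
```

The number of distinct pairs is the number of rows returned, so getting it as a single value still needs an outer COUNT(*) (or counting the rows on the client side).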

Answer by Jaanis Veinberg

I have used this approach and it has worked for me.

SELECT COUNT(DISTINCT DocumentID || DocumentSessionId) 
FROM  DocumentOutputItems

In my case, it gives the correct result.