Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/13480880/

Time: 2020-09-01 12:13:15  Source: igfitidea

Window functions to count distinct records

sql sql-server-2008 tsql

Asked by wootscootinboogie

The query below is based on a complicated view and the view works as I want it to (I'm not going to include the view because I don't think it will help with the question at hand). What I can't get right is the drugCountsInFamilies column. I need it to show me the number of distinct drugNames for each drug family. You can see from the first screencap that there are three different H3A rows. The drugCountsInFamilies for H3A should be 3 (there are three different H3A drugs).

enter image description here

You can see from the second screencap what is happening: the drugCountsInFamilies value in the first screencap is counting the number of rows on which each drug name is listed.
enter image description here

Below is my query, with a comment on the part that is incorrect:

select distinct
     rx.patid
    ,d2.fillDate
    ,d2.scriptEndDate
    ,rx.drugName
    ,rx.drugClass
    --the line directly below is the one that I can't figure out why it's wrong
    ,COUNT(rx.drugClass) over(partition by rx.patid,rx.drugclass,rx.drugname) as drugCountsInFamilies
from 
(
select 
    ROW_NUMBER() over(partition by d.patid order by d.patid,d.uniquedrugsintimeframe desc) as rn
    ,d.patid
    ,d.fillDate
    ,d.scriptEndDate
    ,d.uniqueDrugsInTimeFrame
    from DrugsPerTimeFrame as d
)d2
inner join rx on rx.patid = d2.patid
inner join DrugTable as dt on dt.drugClass=rx.drugClass
where d2.rn=1 and rx.fillDate between d2.fillDate and d2.scriptEndDate
and dt.drugClass in ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
order by rx.patid

SSMS gets mad if I try to add a distinct to the count(rx.drugClass) clause. Can it be done using window functions?

Answered by JoeFletch

I came across this question while searching for a solution to my own problem of counting distinct values. In searching for an answer I came across this post (see the last comment). I've tested it and used the SQL. It works really well for me, so I figured that I would provide another solution here.

In summary, use DENSE_RANK() with PARTITION BY on the grouped columns and ORDER BY, both ASC and DESC, on the columns to count:

DENSE_RANK() OVER (PARTITION BY drugClass ORDER BY drugName ASC) +
DENSE_RANK() OVER (PARTITION BY drugClass ORDER BY drugName DESC) - 1 AS drugCountsInFamilies

I use this as a template for myself.

DENSE_RANK() OVER (PARTITION BY PartitionByFields ORDER BY OrderByFields ASC ) +
DENSE_RANK() OVER (PARTITION BY PartitionByFields ORDER BY OrderByFields DESC) - 1 AS DistinctCount
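
To see why the template works, here is a minimal sketch on made-up rows. The inline table, its values, and the choice to partition by patid as well as drugClass are my assumptions for illustration, not the asker's data. Within a partition, the ascending dense rank of a value plus its descending dense rank always equals the number of distinct values plus one, so subtracting 1 leaves the distinct count on every row.

```sql
-- Hypothetical sample rows standing in for the asker's joined result.
SELECT v.patid, v.drugClass, v.drugName,
       DENSE_RANK() OVER (PARTITION BY v.patid, v.drugClass ORDER BY v.drugName ASC)
     + DENSE_RANK() OVER (PARTITION BY v.patid, v.drugClass ORDER BY v.drugName DESC)
     - 1 AS drugCountsInFamilies
FROM (VALUES
        (1, 'h3a', 'drugA'),
        (1, 'h3a', 'drugA'),   -- repeated fill of the same drug
        (1, 'h3a', 'drugB'),
        (1, 'h3a', 'drugC'),
        (1, 'h6h', 'drugD')
     ) AS v (patid, drugClass, drugName)
ORDER BY v.patid, v.drugClass, v.drugName;
-- Every h3a row gets 3 (drugA, drugB, drugC); the h6h row gets 1.
```

One caveat: DENSE_RANK() treats NULL as a value of its own, so a NULL in the counted column adds one to the result, whereas COUNT(DISTINCT ...) ignores NULLs.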

I hope this helps!

Answered by Gordon Linoff

Doing a count(distinct) as a window function requires a trick. Several levels of tricks, actually.

Because your request is actually truly simple -- the value is always 1 because rx.drugClass is in the partitioning clause -- I will make an assumption. Let's say you want to count the number of unique drug classes per patid.

If so, do a row_number() partitioned by patid and drugClass. When this is 1 within a patid, a new drugClass is starting. Create a flag that is 1 in this case and 0 in all other cases.

Then, you can simply do a sum with a partitioning clause to get the number of distinct values.

The query (after formatting it so I can read it) looks like:

select t.patid, t.fillDate, t.scriptEndDate, t.drugName, t.drugClass,
       SUM(t.IsFirstRowInGroup) over (partition by t.patid) as NumDrugCount
from (select distinct rx.patid, d2.fillDate, d2.scriptEndDate, rx.drugName, rx.drugClass,
             (case when 1 = ROW_NUMBER() over (partition by rx.drugClass, rx.patid order by (select NULL))
                   then 1 else 0
              end) as IsFirstRowInGroup
      from (select ROW_NUMBER() over(partition by d.patid order by d.patid,d.uniquedrugsintimeframe desc) as rn, 
                   d.patid, d.fillDate, d.scriptEndDate, d.uniqueDrugsInTimeFrame
            from DrugsPerTimeFrame as d
           ) d2 inner join
           rx
           on rx.patid = d2.patid inner join
           DrugTable dt
           on dt.drugClass = rx.drugClass
      where d2.rn=1 and rx.fillDate between d2.fillDate and d2.scriptEndDate and
            dt.drugClass in ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
     ) t
order by patid
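
The flag-and-sum steps can also be tried in isolation. This is a minimal sketch on a hypothetical inline table (the rows and the alias v are assumptions, not the asker's data). It keeps the assumption made above, counting distinct drug classes per patid.

```sql
-- Step 1 (inner query): flag the first row of each (patid, drugClass) group.
-- Step 2 (outer query): sum the flags per patid to get the distinct class count.
SELECT t.patid, t.drugClass, t.drugName,
       SUM(t.IsFirstRowInGroup) OVER (PARTITION BY t.patid) AS NumDrugCount
FROM (SELECT v.patid, v.drugClass, v.drugName,
             CASE WHEN ROW_NUMBER() OVER (PARTITION BY v.patid, v.drugClass
                                          ORDER BY (SELECT NULL)) = 1
                  THEN 1 ELSE 0
             END AS IsFirstRowInGroup
      FROM (VALUES
              (1, 'h3a', 'drugA'),
              (1, 'h3a', 'drugB'),
              (1, 'h6h', 'drugD'),
              (2, 'h3a', 'drugA'),
              (2, 'h3a', 'drugA')   -- duplicate row, still one class
           ) AS v (patid, drugClass, drugName)
     ) AS t
ORDER BY t.patid, t.drugClass, t.drugName;
-- patid 1 rows show 2 (h3a, h6h); patid 2 rows show 1 (h3a).
```

To count distinct drug names per family instead, the same pattern should work with drugName added to the ROW_NUMBER() partition and drugClass added to the SUM() partition.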

Answered by Dr. BCH

Why would something like this not work?

SELECT 
   IDCol_1
  ,IDCol_2
  ,Count(*) Over(Partition By IDCol_1, IDCol_2 order by IDCol_1) as numDistinct
FROM Table_1
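
One way to check this idea is to run it on a few made-up rows (the inline table is an assumption for illustration). COUNT(*) OVER (PARTITION BY ...) returns the number of rows in each partition, duplicates included, so it matches a distinct count only when the partition has no repeated values. Also, as far as I know, an ORDER BY inside OVER for an aggregate such as COUNT is only accepted from SQL Server 2012 onward, and the question is tagged sql-server-2008, so the sketch leaves it out.

```sql
-- rowsInPartition counts rows, duplicates included.
-- distinctCol2PerCol1 uses the DENSE_RANK() trick from the earlier answer.
SELECT v.IDCol_1, v.IDCol_2,
       COUNT(*) OVER (PARTITION BY v.IDCol_1, v.IDCol_2) AS rowsInPartition,
       DENSE_RANK() OVER (PARTITION BY v.IDCol_1 ORDER BY v.IDCol_2 ASC)
     + DENSE_RANK() OVER (PARTITION BY v.IDCol_1 ORDER BY v.IDCol_2 DESC)
     - 1 AS distinctCol2PerCol1
FROM (VALUES
        (1, 'a'),
        (1, 'a'),   -- duplicate of the row above
        (1, 'b')
     ) AS v (IDCol_1, IDCol_2);
-- The two (1, 'a') rows show rowsInPartition = 2 and the (1, 'b') row shows 1,
-- while distinctCol2PerCol1 is 2 on all three rows.
```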