Using a DISTINCT clause to filter data but still pull other fields that are not DISTINCT

Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/3868140/

Date: 2020-09-01 07:49:22  Source: igfitidea

Tags: sql, ruby-on-rails, postgresql, distinct

Asked by mindtonic

I am trying to write a query in PostgreSQL that pulls a set of ordered data and filters it by a distinct field. I also need to pull several other fields from the same table row, but they need to be left out of the distinct evaluation. Example:

  SELECT DISTINCT(user_id) user_id, 
         created_at 
    FROM creations 
ORDER BY created_at   
   LIMIT 20

I need the user_id to be DISTINCT, but don't care whether the created_at date is unique or not. Because the created_at date is being included in the evaluation, I am getting duplicate user_id values in my result set.

Also, the data must be ordered by the date, so using DISTINCT ON is not an option here. It requires that the DISTINCT ON field be the first field in the ORDER BY clause, and that does not deliver the results that I seek.

How do I properly use the DISTINCT clause but limit its scope to only one field while still selecting other fields?

Accepted answer by Bill Karwin

As you've discovered, standard SQL treats DISTINCT as applying to the whole select-list, not just one column or a few columns. The reason for this is that it's ambiguous what value to put in the columns you exclude from the DISTINCT. For the same reason, standard SQL doesn't allow you to have ambiguous columns in a query with GROUP BY.
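
To make the "whole select-list" behaviour concrete, here is a minimal sketch that reproduces the asker's symptom. It assumes a hypothetical creations table with created_at stored as ISO-8601 text, and it uses Python's built-in sqlite3 module only so it runs without a PostgreSQL server; plain DISTINCT behaves the same way in both databases.

```python
import sqlite3

# Hypothetical sample rows: users 1 and 2 each appear with two timestamps.
ROWS = [
    (1, "2010-10-01 09:00"),
    (2, "2010-10-01 10:00"),
    (1, "2010-10-02 09:00"),
    (2, "2010-10-04 12:00"),
]


def distinct_pairs():
    """Run DISTINCT over (user_id, created_at) against an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE creations (user_id INTEGER, created_at TEXT)")
    conn.executemany("INSERT INTO creations VALUES (?, ?)", ROWS)
    # DISTINCT compares entire result rows, so (1, t1) and (1, t2) both survive.
    rows = conn.execute(
        "SELECT DISTINCT user_id, created_at FROM creations ORDER BY created_at"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in distinct_pairs():
        print(row)  # user_id 1 and 2 each show up twice
```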

But PostgreSQL has a nonstandard extension to SQL to allow for what you're asking: DISTINCT ON (expr).

SELECT DISTINCT ON (user_id) user_id, created_at 
FROM creations 
ORDER BY user_id, created_at   
LIMIT 20

You have to include the distinct expression(s) as the leftmost part of your ORDER BY clause.
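
SQLite has no DISTINCT ON, so the sketch below swaps in a different, portable technique: a ROW_NUMBER() window function that numbers each user's rows by created_at and keeps only row 1. It is meant to mirror what the DISTINCT ON query above returns (the earliest row per user, sorted by user_id), not to be PostgreSQL's implementation. The table, the title column standing in for "other fields", and the data are hypothetical; window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data: (user_id, created_at, title).
ROWS = [
    (1, "2010-10-01 09:00", "a"),
    (2, "2010-10-01 10:00", "b"),
    (1, "2010-10-02 09:00", "c"),
    (3, "2010-10-03 08:00", "d"),
    (2, "2010-10-04 12:00", "e"),
]

# Portable stand-in for:
#   SELECT DISTINCT ON (user_id) user_id, created_at, title
#   FROM creations ORDER BY user_id, created_at LIMIT 20
# rn = 1 marks the earliest row within each user_id partition.
FIRST_ROW_PER_USER = """
SELECT user_id, created_at, title
FROM (
    SELECT user_id, created_at, title,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS rn
    FROM creations
) ranked
WHERE rn = 1
ORDER BY user_id
LIMIT 20
"""


def first_row_per_user():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE creations (user_id INTEGER, created_at TEXT, title TEXT)"
    )
    conn.executemany("INSERT INTO creations VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(FIRST_ROW_PER_USER).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in first_row_per_user():
        print(row)
```

Because the result is sorted by user_id, this has the same limitation the asker points out; the subquery answer further down shows how to re-sort by date afterwards.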

See the manual on the DISTINCT Clause for more information.

Answer by Matthew

If you want the most recent created_at for each user, then I suggest you aggregate like this:

SELECT user_id, MAX(created_at) AS created_at
FROM creations
WHERE ....
GROUP BY user_id
ORDER BY MAX(created_at) DESC

This will return the most recent created_at for each user_id. If you only want the top 20, then append

LIMIT 20
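
A minimal runnable sketch of the aggregate approach, assuming the same kind of hypothetical creations table with ISO-8601 text timestamps (so string MAX equals latest time). It runs on Python's built-in sqlite3 rather than PostgreSQL, and the ORDER BY uses the aggregate expression, which both databases accept.

```python
import sqlite3

# Hypothetical sample data: (user_id, created_at).
ROWS = [
    (1, "2010-10-01 09:00"),
    (2, "2010-10-01 10:00"),
    (1, "2010-10-02 09:00"),
    (3, "2010-10-03 08:00"),
    (2, "2010-10-04 12:00"),
]

# One row per user_id, carrying that user's latest created_at,
# newest users first, capped by a bound LIMIT parameter.
LATEST_PER_USER = """
SELECT user_id, MAX(created_at) AS latest
FROM creations
GROUP BY user_id
ORDER BY MAX(created_at) DESC
LIMIT ?
"""


def latest_per_user(limit=20):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE creations (user_id INTEGER, created_at TEXT)")
    conn.executemany("INSERT INTO creations VALUES (?, ?)", ROWS)
    rows = conn.execute(LATEST_PER_USER, (limit,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(latest_per_user())
    print(latest_per_user(2))  # only the two most recently active users
```

Note that only the grouped column and the aggregate come back; pulling other columns from the same row needs one of the techniques in the later answers.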

EDIT: This is basically the same thing Unreason's answer says: use aggregation to define which row you want the data from.

Answer by davur

The GROUP BY should ensure distinct values of the grouped columns; this might give you what you are after.

(Note: I'm putting in my 2 cents even though I'm familiar with MySQL and Oracle rather than PostgreSQL.)

In MySQL

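-- created_at is not in the GROUP BY, so MySQL may return any created_at from each group,
-- and it rejects the query when ONLY_FULL_GROUP_BY is enabled (the default since 5.7.5)
SELECT user_id, created_at
FROM creations
GROUP BY user_id
ORDER BY user_id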

In Oracle SQL*Plus

SELECT user_id, MIN(created_at) KEEP (DENSE_RANK FIRST ORDER BY created_at) AS created_at
FROM creations
GROUP BY user_id
ORDER BY user_id

The Oracle query gives you the user_id followed by the first created_at associated with that user_id; the MySQL query returns one created_at per user_id, but which one is not guaranteed. If you want a different created_at, you have the option to substitute FIRST with other functions like AVG, MIN, MAX, or LAST in Oracle. You can also try adding ORDER BY on other columns (including ones that are not returned) to give you a different created_at.

Answer by Unreason

Your question is not well defined: when you say you also need other data from the same row, you are not defining which row.

You do say you need to order the results by created_at, so I will assume that you want values from the row with the minimum (earliest) created_at.

This now becomes one of the most common SQL questions on SO: retrieving rows containing some aggregate value (MIN, MAX).

For example

SELECT user_id, MIN(created_at) AS created_at
FROM creations
GROUP BY user_id
ORDER BY MIN(created_at)
LIMIT 20

This approach will not let you (easily) pick other values from the same row.

One approach that will let you pick other values is

SELECT c.user_id, c.created_at, c.other_columns
FROM creations c LEFT JOIN creations c_help
     ON c.user_id = c_help.user_id AND c.created_at > c_help.created_at
WHERE c_help.user_id IS NULL
ORDER BY c.created_at
LIMIT 20
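
The self-join above is an "anti-join": a row survives only if no row for the same user has a strictly earlier created_at. A minimal sketch on Python's built-in sqlite3, assuming a hypothetical title column in place of c.other_columns and hypothetical sample data. One consequence of the strict ">" comparison: if two rows tie for a user's earliest timestamp, neither has an earlier row, so both are returned.

```python
import sqlite3

# Hypothetical sample data: (user_id, created_at, title).
ROWS = [
    (1, "2010-10-01 09:00", "a"),
    (2, "2010-10-01 10:00", "b"),
    (1, "2010-10-02 09:00", "c"),
    (3, "2010-10-03 08:00", "d"),
    (2, "2010-10-04 12:00", "e"),
]

# Join each row to every earlier row of the same user; rows with no such
# partner (c_help.user_id IS NULL after the LEFT JOIN) are the earliest ones.
EARLIEST_ROW_PER_USER = """
SELECT c.user_id, c.created_at, c.title
FROM creations c
LEFT JOIN creations c_help
       ON c.user_id = c_help.user_id
      AND c.created_at > c_help.created_at
WHERE c_help.user_id IS NULL
ORDER BY c.created_at
LIMIT 20
"""


def earliest_row_per_user():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE creations (user_id INTEGER, created_at TEXT, title TEXT)"
    )
    conn.executemany("INSERT INTO creations VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(EARLIEST_ROW_PER_USER).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in earliest_row_per_user():
        print(row)  # other columns (title) come from the same row as created_at
```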

Answer by mindtonic

Using a sub-query was suggested by someone on the #postgresql IRC channel. It worked:

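-- the inner ORDER BY decides which row DISTINCT ON keeps per user (here the most recent)
SELECT user_id
FROM (SELECT DISTINCT ON (user_id) *
      FROM creations
      ORDER BY user_id, created_at DESC) ss
ORDER BY created_at DESC
LIMIT 20;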
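
The key idea is two stages: the inner query picks one row per user, and the outer query is then free to sort by date. SQLite has no DISTINCT ON, so this minimal sketch swaps in a ROW_NUMBER() window function for the inner stage (SQLite 3.25 or newer); it assumes a hypothetical creations table with a title column to show that other fields from the chosen row come along.

```python
import sqlite3

# Hypothetical sample data: (user_id, created_at, title).
ROWS = [
    (1, "2010-10-01 09:00", "a"),
    (2, "2010-10-01 10:00", "b"),
    (1, "2010-10-02 09:00", "c"),
    (3, "2010-10-03 08:00", "d"),
    (2, "2010-10-04 12:00", "e"),
]

# Inner query: keep each user's most recent row (stand-in for DISTINCT ON
# with ORDER BY user_id, created_at DESC). Outer query: re-sort by date.
LATEST_ROW_PER_USER_BY_DATE = """
SELECT user_id, created_at, title
FROM (
    SELECT user_id, created_at, title,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
    FROM creations
) ss
WHERE rn = 1
ORDER BY created_at DESC
LIMIT 20
"""


def latest_row_per_user_by_date():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE creations (user_id INTEGER, created_at TEXT, title TEXT)"
    )
    conn.executemany("INSERT INTO creations VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(LATEST_ROW_PER_USER_BY_DATE).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_row_per_user_by_date():
        print(row)
```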