MySQL GROUP BY does not remove duplicates
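Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license: cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow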
Original URL: http://stackoverflow.com/questions/7152333/
GROUP BY does not remove duplicates
Question by MacMac
I have a watchlist system that I've coded. In the overview of a user's watchlist, they see a list of records; however, the list shows duplicates, even though the database contains exactly the correct number of entries.
I've tried GROUP BY watch.watch_id and GROUP BY rec.record_id, but none of the groupings I've tried removes the duplicates. I'm not sure what I'm doing wrong.
SELECT watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
FROM
(
watchlist watch
LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id
LEFT OUTER JOIN members usr ON rec.user_id = usr.user_id
)
WHERE watch.user_id = 1
GROUP BY watch.watch_id
LIMIT 0, 25
The watchlist table looks like this:
+----------+---------+-----------+------------+
| watch_id | user_id | record_id | watch_date |
+----------+---------+-----------+------------+
| 13 | 1 | 22 | 1314038274 |
| 14 | 1 | 25 | 1314038995 |
+----------+---------+-----------+------------+
Answer by Malvolio
GROUP BY does not "remove duplicates". GROUP BY allows for aggregation. If all you want is to combine duplicated rows, use SELECT DISTINCT.
If you need to combine rows that are duplicates in some columns, use GROUP BY, but you need to specify what to do with the other columns. You can either omit them (by not listing them in the SELECT clause) or aggregate them (using functions like SUM, MIN, and AVG). For example:
SELECT watch.watch_id, COUNT(rec.street_number), MAX(watch.watch_date)
... GROUP BY watch.watch_id
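To make the aggregate version concrete, here is a small runnable sketch. It uses Python's built-in sqlite3 module with an in-memory SQLite database rather than MySQL, and the table contents are hypothetical: record 22 is deliberately stored twice in records so that the join produces two rows for watch_id 13. Because every selected column other than watch_id is wrapped in an aggregate, each watch yields exactly one well-defined row.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE watchlist (watch_id INTEGER, user_id INTEGER,
                            record_id INTEGER, watch_date INTEGER);
    CREATE TABLE records (record_id INTEGER, street_number TEXT);
    INSERT INTO watchlist VALUES (13, 1, 22, 1314038274),
                                 (14, 1, 25, 1314038995);
    -- Hypothetical data: record 22 is stored twice, so the join fans out.
    INSERT INTO records VALUES (22, '10'), (22, '10'), (25, '7');
""")

# Every non-grouped column is aggregated, so each watch_id
# collapses to a single row with well-defined values.
rows = conn.execute("""
    SELECT watch.watch_id,
           COUNT(rec.street_number) AS n_records,
           MAX(watch.watch_date)    AS last_watched
    FROM watchlist watch
    LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id
    WHERE watch.user_id = 1
    GROUP BY watch.watch_id
    ORDER BY watch.watch_id
""").fetchall()
print(rows)  # [(13, 2, 1314038274), (14, 1, 1314038995)]
```

The COUNT column also exposes the fan-out: a value greater than 1 shows that the join matched more than one record for that watch.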
EDIT
The OP asked for some clarification.
Consider the "view" -- all the data put together by the FROMs and JOINs and the WHEREs -- call that V. There are two things you might want to do.
First, you might have completely duplicate rows that you wish to combine:
a b c
- - -
1 2 3
1 2 3
3 4 5
Then simply use DISTINCT
SELECT DISTINCT * FROM V;
a b c
- - -
1 2 3
3 4 5
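A runnable sketch of this first case, assuming Python's sqlite3 module and an in-memory SQLite table named V standing in for the view (SQLite is used only for convenience; DISTINCT behaves the same way here in MySQL). The ORDER BY is added solely to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A plain table standing in for the answer's "view" V.
conn.execute("CREATE TABLE V (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO V VALUES (?, ?, ?)",
                 [(1, 2, 3), (1, 2, 3), (3, 4, 5)])

# DISTINCT collapses rows that are identical in every selected column.
distinct_rows = conn.execute(
    "SELECT DISTINCT * FROM V ORDER BY a, b, c"
).fetchall()
print(distinct_rows)  # [(1, 2, 3), (3, 4, 5)]
```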
Or, you might have partially duplicate rows that you wish to combine:
a b c
- - -
1 2 3
1 2 6
3 4 5
Those first two rows are "the same" in some sense, but clearly different in another sense (in particular, they would not be combined by SELECT DISTINCT). You have to decide how to combine them. You could discard column c as unimportant:
SELECT DISTINCT a,b FROM V;
a b
- -
1 2
3 4
Or you could perform some kind of aggregation on them. You could add them up:
SELECT a,b, SUM(c) "tot" FROM V GROUP BY a,b;
a b tot
- - ---
1 2 9
3 4 5
You could pick the smallest value:
SELECT a,b, MIN(c) "first" FROM V GROUP BY a,b;
a b first
- - -----
1 2 3
3 4 5
Or you could take the mean (AVG), the standard deviation (STD), or any of a number of other functions that take a set of values for c and combine them into one.
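The partial-duplicate options can be checked side by side with a similar sketch, again using SQLite through Python's sqlite3 with V as a hypothetical plain table. STD is left out because it is a MySQL function that SQLite does not provide; in SQLite, AVG always returns a floating-point value, so the second group's mean comes back as 5.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE V (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO V VALUES (?, ?, ?)",
                 [(1, 2, 3), (1, 2, 6), (3, 4, 5)])

def query(sql):
    """Run a query against V and return all result rows as tuples."""
    return conn.execute(sql).fetchall()

# Option 1: drop column c entirely, then deduplicate.
dropped = query("SELECT DISTINCT a, b FROM V ORDER BY a, b")
# Option 2: collapse each (a, b) group by aggregating c.
totals = query("SELECT a, b, SUM(c) AS tot FROM V GROUP BY a, b ORDER BY a, b")
smallest = query("SELECT a, b, MIN(c) AS min_c FROM V GROUP BY a, b ORDER BY a, b")
means = query("SELECT a, b, AVG(c) AS mean_c FROM V GROUP BY a, b ORDER BY a, b")

print(dropped)   # [(1, 2), (3, 4)]
print(totals)    # [(1, 2, 9), (3, 4, 5)]
print(smallest)  # [(1, 2, 3), (3, 4, 5)]
print(means)     # [(1, 2, 4.5), (3, 4, 5.0)]
```

Every variant returns two rows; what differs is only which information about c survives.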
What isn't really an option is just doing nothing. If you just list the ungrouped columns, the DBMS will either throw an error (Oracle does that -- the right choice, imo) or pick one value more or less at random (MySQL). But as Dr. Peart said, "When you choose not to decide, you still have made a choice."
Answer by Jason McCreary
While SELECT DISTINCT may indeed work in your case, it's important to note why what you have is not working.
You're selecting fields that are not in the GROUP BY clause. Although MySQL allows this, the values it returns for the non-GROUP BY fields are undefined.
If you wanted to do this with a GROUP BY, try something more like the following:
SELECT watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
FROM
(
watchlist watch
LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id
LEFT OUTER JOIN members usr ON rec.user_id = usr.user_id
)
WHERE watch.watch_id IN (
SELECT watch_id FROM watchlist WHERE user_id = 1
GROUP BY watch_id)
LIMIT 0, 25
Answer by Dave Warner
I would never recommend using SELECT DISTINCT; it's really slow on big datasets.
Try using things like EXISTS instead.
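The answer does not show what EXISTS would look like here. A minimal sketch, assuming SQLite through Python's sqlite3 and hypothetical simplified tables in which record 22 is stored twice: a JOIN repeats a watchlist row once per matching record, whereas an EXISTS subquery only tests whether a match exists, so each watchlist row appears at most once without any DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE watchlist (watch_id INTEGER, user_id INTEGER, record_id INTEGER);
    CREATE TABLE records (record_id INTEGER, city TEXT);
    INSERT INTO watchlist VALUES (13, 1, 22), (14, 1, 25), (15, 1, 99);
    -- Hypothetical duplicate: record 22 appears twice.
    INSERT INTO records VALUES (22, 'Springfield'), (22, 'Springfield'), (25, 'Shelbyville');
""")

# A JOIN emits one row per matching record, so watch 13 shows up twice.
joined = conn.execute("""
    SELECT w.watch_id FROM watchlist w
    JOIN records r ON r.record_id = w.record_id
    WHERE w.user_id = 1
    ORDER BY w.watch_id
""").fetchall()

# EXISTS only asks whether at least one match exists (a semi-join),
# so each watchlist row is returned once; watch 15 has no match at all.
semi = conn.execute("""
    SELECT w.watch_id FROM watchlist w
    WHERE w.user_id = 1
      AND EXISTS (SELECT 1 FROM records r WHERE r.record_id = w.record_id)
    ORDER BY w.watch_id
""").fetchall()

print(joined)  # [(13,), (13,), (14,)]
print(semi)    # [(13,), (14,)]
```

This only helps when the selected columns come from watchlist itself; if you also need columns from records, the join (and its fan-out) is still required. Whether EXISTS is actually faster than DISTINCT depends on the database engine and its indexes.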
Answer by transistor1
If you say your watchlist table is unique, then one (or both) of the other tables either (a) has duplicates, or (b) is not unique by the key you are using.
To suppress duplicates in your results, either use DISTINCT as @Laykes says, or try
GROUP BY watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
It sort of sounds like you expect all 3 tables to be unique by their keys, though. If that is the case, you are simply masking some other problem with your SQL by trying to retrieve distinct values.
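One way to find that underlying problem is to look for join keys that occur more than once. The sketch below assumes SQLite through Python's sqlite3 and a hypothetical data problem (user_id 7 stored twice in members). The question's three-table join then returns three rows for only two watchlist entries, and a GROUP BY ... HAVING COUNT(*) > 1 query on each joined table points at the duplicated key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE watchlist (watch_id INTEGER, user_id INTEGER, record_id INTEGER);
    CREATE TABLE records (record_id INTEGER, user_id INTEGER);
    CREATE TABLE members (user_id INTEGER, username TEXT);
    INSERT INTO watchlist VALUES (13, 1, 22), (14, 1, 25);
    INSERT INTO records VALUES (22, 7), (25, 8);
    -- Hypothetical data problem: user 7 exists twice in members.
    INSERT INTO members VALUES (7, 'alice'), (7, 'alice'), (8, 'bob');
""")

# Check each joined table for keys that occur more than once; any hit
# multiplies the rows of the join. (Table and column names are hard-coded,
# so building the SQL with an f-string is safe here.)
dupes = {
    table: conn.execute(
        f"SELECT {key}, COUNT(*) FROM {table} GROUP BY {key} HAVING COUNT(*) > 1"
    ).fetchall()
    for table, key in [("records", "record_id"), ("members", "user_id")]
}

# Same join shape as the question's query.
joined = conn.execute("""
    SELECT COUNT(*) FROM watchlist watch
    LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id
    LEFT OUTER JOIN members usr ON rec.user_id = usr.user_id
    WHERE watch.user_id = 1
""").fetchone()[0]

print(dupes)   # {'records': [], 'members': [(7, 2)]}
print(joined)  # 3, for only 2 watchlist entries
```

Fixing the duplicated key (or adding a unique constraint) removes the extra rows at the source, instead of hiding them with DISTINCT or GROUP BY.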
Answer by Layke
You are grouping by watch.watch_id, and you have two results that have different watch IDs, so naturally they are not grouped together.
Also, judging from the results displayed, the two rows refer to different records. That looks like a perfectly valid, expected result. If you are trying to select only distinct values, then you don't want GROUP BY; you want SELECT DISTINCT.
SELECT DISTINCT ...