Eliminate duplicate rows in a PostgreSQL SELECT statement
Original question: http://stackoverflow.com/questions/8373087/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow (not to this page).
Asked by The Surrican
This is my query:
SELECT autor.entwickler,anwendung.name
FROM autor
left join anwendung
on anwendung.name = autor.anwendung;
entwickler | name
------------+-------------
Benutzer 1 | Anwendung 1
Benutzer 2 | Anwendung 1
Benutzer 2 | Anwendung 2
Benutzer 1 | Anwendung 3
Benutzer 1 | Anwendung 4
Benutzer 2 | Anwendung 4
(6 rows)
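The question does not show the table definitions, so the following is only a sketch of a setup that would reproduce the output above. The column types and the foreign key are assumptions; only the table names, column names, and row values come from the question.

```sql
-- Assumed schema: the original question does not include the DDL.
CREATE TABLE anwendung (
    name text PRIMARY KEY
);

CREATE TABLE autor (
    entwickler text NOT NULL,
    anwendung  text NOT NULL REFERENCES anwendung (name)
);

-- Sample data taken from the result set shown in the question.
INSERT INTO anwendung (name) VALUES
    ('Anwendung 1'),
    ('Anwendung 2'),
    ('Anwendung 3'),
    ('Anwendung 4');

INSERT INTO autor (entwickler, anwendung) VALUES
    ('Benutzer 1', 'Anwendung 1'),
    ('Benutzer 2', 'Anwendung 1'),
    ('Benutzer 2', 'Anwendung 2'),
    ('Benutzer 1', 'Anwendung 3'),
    ('Benutzer 1', 'Anwendung 4'),
    ('Benutzer 2', 'Anwendung 4');
```

Without an ORDER BY, the join may return these six rows in a different order than shown.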
I want to keep one row for each distinct value in the name field, and discard the others, like this:
entwickler | name
------------+-------------
Benutzer 1 | Anwendung 1
Benutzer 2 | Anwendung 2
Benutzer 1 | Anwendung 3
Benutzer 1 | Anwendung 4
In MySQL I would just do:
SELECT autor.entwickler,anwendung.name
FROM autor
left join anwendung
on anwendung.name = autor.anwendung
GROUP BY anwendung.name;
But PostgreSQL gives me this error:
ERROR: column "autor.entwickler" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT autor.entwickler FROM autor left join anwendung on an ...
I totally understand the error and assume that the MySQL implementation is less compliant with the SQL standard than the PostgreSQL implementation. But how can I get the desired result?
Answered by Craig Ringer
PostgreSQL doesn't currently allow ambiguous GROUP BY statements where the results depend on the order in which the table is scanned, the plan used, etc. That's how the standard says it should work AFAIK, but some databases (like MySQL versions prior to 5.7) permit looser queries that just pick the first value encountered for elements appearing in the SELECT list but not in GROUP BY.
In PostgreSQL, you should use DISTINCT ON for this kind of query.
You want to write something like:
SELECT DISTINCT ON (anwendung.name) anwendung.name, autor.entwickler
FROM autor
left join anwendung on anwendung.name = autor.anwendung;
(Syntax corrected based on follow-up comment)
This is a bit like MySQL 5.7's ANY_VALUE(...) pseudo-function for GROUP BY, but in reverse: it says that the values in the DISTINCT ON clause must be unique, and any value is acceptable for the columns not specified.
Unless there's an ORDER BY, there is no guarantee as to which values are selected. You should usually have an ORDER BY for predictability.
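As a sketch of what that might look like with the tables from the question: PostgreSQL requires the ORDER BY to start with the DISTINCT ON expression(s), and any further sort keys then decide which row is kept within each group.

```sql
-- Keep one row per application name; within each name, the row with the
-- alphabetically first developer sorts first and is therefore the one kept.
SELECT DISTINCT ON (anwendung.name) anwendung.name, autor.entwickler
FROM autor
LEFT JOIN anwendung ON anwendung.name = autor.anwendung
ORDER BY anwendung.name, autor.entwickler;
```

With the six sample rows from the question, this should pick the same developers as the desired result shown there. Appending DESC to the second sort key would keep the alphabetically last developer instead.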
It's also been noted that using an aggregate like min() or max() would work. While this is true, and will lead to reliable and predictable results (unlike using DISTINCT ON or an ambiguous GROUP BY), it has a performance cost due to the need for extra sorting or aggregation, and it only works for ordinal data types.
Answered by Erwin Brandstetter
Craig's answer and your resulting query in the comments share the same flaw: the table anwendung is on the right side of a LEFT JOIN, which contradicts your obvious intent. You care about anwendung.name and pick autor.entwickler arbitrarily. I'll come back to that further down.
It should be:
SELECT DISTINCT ON (1) an.name, au.entwickler
FROM anwendung an
LEFT JOIN autor au ON an.name = au.anwendung;
DISTINCT ON (1) is just a syntactical shorthand for DISTINCT ON (an.name). Positional references are allowed here.
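To make the shorthand concrete, the two statements below should be interchangeable; the second simply spells out the expression that position 1 of the SELECT list refers to.

```sql
-- Positional form: 1 refers to the first item in the SELECT list.
SELECT DISTINCT ON (1) an.name, au.entwickler
FROM anwendung an
LEFT JOIN autor au ON an.name = au.anwendung;

-- Explicit form: names the same expression directly.
SELECT DISTINCT ON (an.name) an.name, au.entwickler
FROM anwendung an
LEFT JOIN autor au ON an.name = au.anwendung;
```

The positional form is shorter, but it silently changes meaning if the SELECT list is reordered, so the explicit form may be the safer choice in queries that are likely to be edited later.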
If there are multiple developers (entwickler) for an app (anwendung), one developer is picked arbitrarily. You have to add an ORDER BY clause if you want the "first" (alphabetically according to your locale):
SELECT DISTINCT ON (1) an.name, au.entwickler
FROM anwendung an
LEFT JOIN autor au ON an.name = au.anwendung
ORDER BY 1, 2;
As @mdahlman implied, a more canonical way would be:
SELECT an.name, min(au.entwickler) AS entwickler
FROM anwendung an
LEFT JOIN autor au ON an.name = au.anwendung
GROUP BY an.name;
Or, better yet, clean up your data model: implement the n:m relationship between anwendung and autor properly, add surrogate primary keys (as anwendung and autor are hardly unique), enforce relational integrity with foreign key constraints, and adapt your resulting query:
The proper way
CREATE TABLE autor (
autor_id serial PRIMARY KEY -- surrogate primary key
, autor text NOT NULL);
INSERT INTO autor VALUES
(1, 'mike')
, (2, 'joe')
, (3, 'jane') -- worked on two apps
, (4, 'susi'); -- has no part in any apps (yet)
CREATE TABLE anwendung (
anwendung_id serial PRIMARY KEY -- surrogate primary key
, anwendung text UNIQUE); -- disallow duplicate names
INSERT INTO anwendung VALUES
(1, 'foo') -- has 3 authors linked to it
, (2, 'bar')
, (3, 'shark')
, (4, 'bait'); -- has no authors attached to it (yet).
CREATE TABLE autor_anwendung ( -- you might name this table "entwickler"
autor_id integer REFERENCES autor ON UPDATE CASCADE ON DELETE CASCADE
, anwendung_id integer REFERENCES anwendung ON UPDATE CASCADE ON DELETE CASCADE
, PRIMARY KEY (autor_id, anwendung_id)
);
INSERT INTO autor_anwendung VALUES
(1, 1)
,(2, 1)
,(3, 1)
,(2, 2)
,(3, 3);
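One side effect of the setup above: the INSERT statements supply explicit values for the serial columns, so the underlying sequences are not advanced. A later INSERT that relies on the default would then try to reuse ID 1 and fail with a duplicate key error. A possible way to bring the sequences back in line, assuming the table and column names defined above:

```sql
-- Set each sequence to the highest ID currently in its table,
-- so the next default-generated value is max + 1.
SELECT setval(pg_get_serial_sequence('autor', 'autor_id'), max(autor_id))
FROM autor;

SELECT setval(pg_get_serial_sequence('anwendung', 'anwendung_id'), max(anwendung_id))
FROM anwendung;
```

Alternatively, omitting the ID columns from the INSERT statements and letting the defaults assign them would avoid the problem, at the cost of not knowing the IDs in advance for the link table.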
This query retrieves one row per app with one associated author (the 1st one alphabetically) or NULL if there are none:
SELECT DISTINCT ON (1) an.anwendung, au.autor
FROM anwendung an
LEFT JOIN autor_anwendung au_au USING (anwendung_id)
LEFT JOIN autor au USING (autor_id)
ORDER BY 1, 2;
Result:
anwendung | autor
----------+-------
bait |
bar | joe
foo | jane
shark | jane
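For comparison, the aggregate approach mentioned earlier can be applied to the normalized schema as well. This is a sketch rather than part of the original answer; it should return the same four rows as the DISTINCT ON query, because min() ignores NULLs and yields NULL for an app with no authors.

```sql
-- One row per app with the alphabetically first author, or NULL if none.
SELECT an.anwendung, min(au.autor) AS autor
FROM anwendung an
LEFT JOIN autor_anwendung au_au USING (anwendung_id)
LEFT JOIN autor au USING (autor_id)
GROUP BY an.anwendung
ORDER BY an.anwendung;
```

The trade-off is that min() only works on a single column at a time: if more columns of the chosen author were needed, DISTINCT ON would return them from the same row, whereas separate aggregates could mix values from different rows.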