Eliminate duplicate rows in a PostgreSQL SELECT statement

Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/8373087/

Time: 2020-08-31 21:56:13  Source: igfitidea

Tags: mysql, sql, postgresql, select, duplicates

Asked by The Surrican

This is my query:

SELECT autor.entwickler,anwendung.name
  FROM autor 
  left join anwendung
    on anwendung.name = autor.anwendung;

 entwickler |    name     
------------+-------------
 Benutzer 1 | Anwendung 1
 Benutzer 2 | Anwendung 1
 Benutzer 2 | Anwendung 2
 Benutzer 1 | Anwendung 3
 Benutzer 1 | Anwendung 4
 Benutzer 2 | Anwendung 4
(6 rows)

I want to keep one row for each distinct value in the field name, and discard the others like this:

 entwickler |    name     
------------+-------------
 Benutzer 1 | Anwendung 1
 Benutzer 2 | Anwendung 2
 Benutzer 1 | Anwendung 3
 Benutzer 1 | Anwendung 4

In MySQL I would just do:

SELECT autor.entwickler,anwendung.name
  FROM autor
  left join anwendung
    on anwendung.name = autor.anwendung
 GROUP BY anwendung.name;

But PostgreSQL gives me this error:

ERROR: column "autor.entwickler" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT autor.entwickler FROM autor left join anwendung on an ...

I totally understand the error and assume that the MySQL implementation conforms less closely to the SQL standard than the PostgreSQL implementation. But how can I get the desired result?

Answered by Craig Ringer

PostgreSQL doesn't currently allow ambiguous GROUP BY statements where the results depend on the order the table is scanned, the plan used, etc. That's how the standard says it should work, AFAIK, but some databases (like MySQL versions prior to 5.7) permit looser queries that just pick the first value encountered for elements appearing in the SELECT list but not in GROUP BY.

In PostgreSQL, you should use DISTINCT ON for this kind of query.

You want to write something like:

SELECT DISTINCT ON (anwendung.name) anwendung.name, autor.entwickler
FROM autor
left join anwendung on anwendung.name = autor.anwendung;

(Syntax corrected based on follow-up comment)

This is a bit like MySQL 5.7's ANY_VALUE(...) pseudo-function for GROUP BY, but in reverse: it says that the values in the DISTINCT ON clause must be unique, and any value is acceptable for the columns not specified.

Unless there's an ORDER BY, there is no guarantee as to which values are selected. You should usually have an ORDER BY for predictability.
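
As a minimal sketch of that advice, assuming the tables and columns from the question, the DISTINCT ON query could be given an ORDER BY whose leading expression matches the DISTINCT ON expression. The second sort key (here the developer name, chosen only for illustration) then decides which row survives for each application:

```sql
-- One row per application. Among the developers of an application,
-- keep the one that sorts first.
-- The leading ORDER BY expression must match the DISTINCT ON expression.
SELECT DISTINCT ON (anwendung.name) anwendung.name, autor.entwickler
FROM   autor
LEFT   JOIN anwendung ON anwendung.name = autor.anwendung
ORDER  BY anwendung.name, autor.entwickler;
```

With the six rows shown in the question, this should pick Benutzer 1 for Anwendung 1, 3 and 4, and Benutzer 2 for Anwendung 2, regardless of scan order or plan.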

It's also been noted that using an aggregate like min() or max() would work. While this is true, and will lead to reliable and predictable results (unlike using DISTINCT ON or an ambiguous GROUP BY), it has a performance cost due to the need for extra sorting or aggregation, and it only works for ordinal data types.

Answered by Erwin Brandstetter

Craig's answer and your resulting query in the comments share the same flaw: the table anwendung is on the right side of a LEFT JOIN, which contradicts your obvious intent. You care about anwendung.name and pick autor.entwickler arbitrarily. I'll come back to that further down.

It should be:

SELECT DISTINCT ON (1) an.name, au.entwickler
FROM   anwendung an
LEFT   JOIN autor au ON an.name = au.anwendung;

DISTINCT ON (1) is just a syntactical shorthand for DISTINCT ON (an.name). Positional references are allowed here.

If there are multiple developers (entwickler) for an app (anwendung), one developer is picked arbitrarily. You have to add an ORDER BY clause if you want the "first" (alphabetically according to your locale):

SELECT DISTINCT ON (1) an.name, au.entwickler
FROM   anwendung an
LEFT   JOIN autor au ON an.name = au.anwendung
ORDER  BY 1, 2;
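
The same pattern can select a different row per app just by changing the second sort key. As a sketch under the same table assumptions, this variant keeps the alphabetically last developer. NULLS LAST is added because PostgreSQL sorts NULLs first under DESC by default, which would otherwise prefer a NULL entwickler over a real name if such rows exist:

```sql
-- One row per app, keeping the developer that sorts last.
-- The leading ORDER BY expression still has to match DISTINCT ON.
SELECT DISTINCT ON (an.name) an.name, au.entwickler
FROM   anwendung an
LEFT   JOIN autor au ON an.name = au.anwendung
ORDER  BY an.name, au.entwickler DESC NULLS LAST;
```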

As @mdahlman implied, a more canonical way would be:

SELECT an.name, min(au.entwickler) AS entwickler
FROM   autor au
LEFT   JOIN anwendung an ON an.name = au.anwendung
GROUP  BY an.name;


Or, better yet, clean up your data model: implement the n:m relationship between anwendung and autor properly, add surrogate primary keys (as anwendung and autor are hardly unique), enforce relational integrity with foreign key constraints, and adapt your resulting query:

The proper way

CREATE TABLE autor (
   autor_id serial PRIMARY KEY -- surrogate primary key
 , autor    text NOT NULL);

INSERT INTO autor  VALUES
   (1, 'mike')
 , (2, 'joe')
 , (3, 'jane')   -- worked on two apps
 , (4, 'susi');  -- has no part in any apps (yet)

CREATE TABLE anwendung (
   anwendung_id serial PRIMARY KEY -- surrogate primary key
 , anwendung    text  UNIQUE);     -- disallow duplicate names

INSERT INTO anwendung  VALUES
   (1, 'foo')    -- has 3 authors linked to it
 , (2, 'bar')
 , (3, 'shark')
 , (4, 'bait');  -- has no authors attached to it (yet).

CREATE TABLE autor_anwendung (  -- you might name this table "entwickler"
   autor_id     integer REFERENCES autor     ON UPDATE CASCADE ON DELETE CASCADE
 , anwendung_id integer REFERENCES anwendung ON UPDATE CASCADE ON DELETE CASCADE
 , PRIMARY KEY (autor_id, anwendung_id)
);

INSERT INTO autor_anwendung VALUES
 (1, 1)
,(2, 1)
,(3, 1)
,(2, 2)
,(3, 3);
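
To see what the foreign key constraints buy you, here is a small sketch that could be run against the tables above. The id 99 is a made-up value that exists in neither table, and the DELETE is wrapped in a transaction that is rolled back so the sample data stays intact:

```sql
-- Rejected with a foreign key violation: there is no autor with autor_id = 99.
INSERT INTO autor_anwendung VALUES (99, 1);

-- ON DELETE CASCADE: deleting an author also removes its link rows.
BEGIN;
DELETE FROM autor WHERE autor_id = 3;                      -- 'jane'
SELECT count(*) FROM autor_anwendung WHERE autor_id = 3;   -- 0
ROLLBACK;                                                  -- undo the experiment
```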

This query retrieves one row per app with one associated author (the 1st one alphabetically) or NULL if there are none:

SELECT DISTINCT ON (1) an.anwendung, au.autor
FROM   anwendung an
LEFT   JOIN autor_anwendung au_au USING (anwendung_id)
LEFT   JOIN autor au USING (autor_id)
ORDER  BY 1, 2;

Result:

 anwendung | autor
-----------+-------
 bait      |
 bar       | joe
 foo       | jane
 shark     | jane
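
For comparison, the aggregate approach mentioned earlier can be applied to the cleaned-up schema as well. This is only a sketch of that alternative, not part of the original answer. With the sample data above it should return the same four rows, since min() picks the alphabetically first author and yields NULL for an app with no authors:

```sql
-- Same result via GROUP BY: one row per app with its "smallest" author name.
SELECT an.anwendung, min(au.autor) AS autor
FROM   anwendung an
LEFT   JOIN autor_anwendung au_an USING (anwendung_id)
LEFT   JOIN autor au USING (autor_id)
GROUP  BY an.anwendung
ORDER  BY an.anwendung;
```

DISTINCT ON becomes the more convenient choice once you need several columns from the same chosen row, because separate min() calls on different columns may each come from a different row.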