SQL: How to join only one row in a joined table with Postgres?

Notice: This page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/24042359/

Time: 2020-09-01 02:00:12  Source: igfitidea

How to join only one row in joined table with postgres?

Tags: sql, postgresql, join

Asked by Benjamin Crouzier

I have the following schema:

CREATE TABLE author (
    id   integer
  , name varchar(255)
);
CREATE TABLE book (
    id        integer
  , author_id integer
  , title     varchar(255)
  , rating    integer
);

And I want each author with its last book:

SELECT book.id, author.id, author.name, book.title as last_book
FROM author
JOIN book book ON book.author_id = author.id

GROUP BY author.id
ORDER BY book.id ASC

Apparently you can do that in MySQL: "Join two tables in MySQL, returning just one row from the second table".

But postgres gives this error:

ERROR: column "book.id" must appear in the GROUP BY clause or be used in an aggregate function: SELECT book.id, author.id, author.name, book.title as last_book FROM author JOIN book book ON book.author_id = author.id GROUP BY author.id ORDER BY book.id ASC

It's because:

When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.
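
To illustrate the rule quoted above: the query becomes valid once every column that is not in GROUP BY is wrapped in an aggregate. The following is only a minimal sketch against the schema in the question; the alias last_book_id is made up for illustration. It returns the id of each author's last book but not its title, which is why a different technique is needed to fetch the whole row.

```sql
-- author.name is listed in GROUP BY because the question's schema
-- declares no primary key on author; book.id is used only inside
-- an aggregate, so the grouping rule is satisfied.
SELECT author.id, author.name, max(book.id) AS last_book_id
FROM author
JOIN book ON book.author_id = author.id
GROUP BY author.id, author.name
ORDER BY author.id;
```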

How can I tell Postgres: "Give me only the last row of the joined table, when ordered by joined_table.id"?

Edit: With this data:

INSERT INTO author (id, name) VALUES
  (1, 'Bob')
, (2, 'David')
, (3, 'John');

INSERT INTO book (id, author_id, title, rating) VALUES
  (1, 1, '1st book from bob', 5)
, (2, 1, '2nd book from bob', 6)
, (3, 1, '3rd book from bob', 7)
, (4, 2, '1st book from David', 6)
, (5, 2, '2nd book from David', 6);

I should see:

book_id author_id name    last_book
3       1         "Bob"   "3rd book from bob"
5       2         "David" "2nd book from David"

Answer by Clodoaldo Neto

select distinct on (author.id)
    book.id, author.id, author.name, book.title as last_book
from
    author
    inner join
    book on book.author_id = author.id
order by author.id, book.id desc

Check distinct on

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

With distinct on, the "distinct" columns must come first in the order by. If that is not the order you want, then you need to wrap the query and reorder:

select 
    *
from (
    select distinct on (author.id)
        book.id, author.id, author.name, book.title as last_book
    from
        author
        inner join
        book on book.author_id = author.id
    order by author.id, book.id desc
) authors_with_last_book
order by authors_with_last_book.name

Another solution is to use a window function, as in Lennart's answer. And another very generic one is this:

select 
    book.id, author.id, author.name, book.title as last_book
from
    book
    inner join
    (
        select author.id as author_id, max(book.id) as book_id
        from
            author
            inner join
            book on author.id = book.author_id
        group by author.id
    ) s
    on s.book_id = book.id
    inner join
    author on book.author_id = author.id

Answer by wildplasser

This may look archaic and overly simple, but it does not depend on window functions, CTEs, or aggregating subqueries. In most cases it is also the fastest.

SELECT bk.id, au.id, au.name, bk.title as last_book
FROM author au
JOIN book bk ON bk.author_id = au.id
-- keep a book only if the same author has no book with a higher id
WHERE NOT EXISTS (
    SELECT *
    FROM book nx
    WHERE nx.author_id = bk.author_id
    AND nx.id > bk.id
    )
ORDER BY bk.id ASC
    ;
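
Whether the NOT EXISTS form is actually the fastest depends on the data and the available indexes. One way to check, sketched here under the assumption that an index on (author_id, id) is acceptable for your table (the index name book_author_id_id_idx is hypothetical), is to create the index and look at the plan:

```sql
-- Hypothetical index name; lets the NOT EXISTS probe look up
-- "a later book by the same author" through the index.
CREATE INDEX book_author_id_id_idx ON book (author_id, id);

-- EXPLAIN ANALYZE runs the query and reports the real plan and timings.
EXPLAIN ANALYZE
SELECT bk.id, au.id, au.name, bk.title AS last_book
FROM author au
JOIN book bk ON bk.author_id = au.id
WHERE NOT EXISTS (
    SELECT 1
    FROM book nx
    WHERE nx.author_id = bk.author_id
    AND nx.id > bk.id
);
```

Running the same EXPLAIN ANALYZE on the other variants in this thread gives a like-for-like comparison on your own data.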

Answer by Lennart

Here is one way:

SELECT book_id, author_id, author_name, last_book
FROM (
    SELECT b.id as book_id
         , a.id as author_id
         , a.name as author_name
         , b.title as last_book
         , row_number() over (partition by a.id
                              order by b.id desc) as rn
    FROM author a
    JOIN book b 
        ON b.author_id = a.id
) last_books
WHERE rn = 1;

Answer by Mirthe

I've done something similar for a chat system, where the room table holds the metadata and the list table contains the messages. I ended up using the PostgreSQL LATERAL JOIN, which worked like a charm.

SELECT MR.id AS room_id, MR.created_at AS room_created, 
    lastmess.content as lastmessage_content, lastmess.datetime as lastmessage_when
FROM message.room MR
    LEFT JOIN LATERAL (
        SELECT content, datetime
        FROM message.list
        WHERE room_id = MR.id
        ORDER BY datetime DESC 
        LIMIT 1) lastmess ON true
ORDER BY lastmessage_when DESC NULLS LAST, MR.created_at DESC

For more info see https://heap.io/blog/engineering/postgresqls-powerful-new-join-type-lateral
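
Applied to the author/book schema from the question, the same idea might look like the sketch below. The aliases a, b, and lb are made up for illustration, and LATERAL requires PostgreSQL 9.3 or later.

```sql
-- For each author, the lateral subquery picks that author's single
-- highest-id book. LEFT JOIN ... ON true keeps authors without books
-- (their book columns are NULL).
SELECT lb.id AS book_id, a.id AS author_id, a.name, lb.title AS last_book
FROM author a
LEFT JOIN LATERAL (
    SELECT b.id, b.title
    FROM book b
    WHERE b.author_id = a.id
    ORDER BY b.id DESC
    LIMIT 1
) lb ON true
ORDER BY a.id;
```

With the sample data from the question, this also returns John with NULL book columns; replacing LEFT JOIN LATERAL with JOIN LATERAL drops authors who have no books.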

Answer by Taochok

You could add a condition to the join that selects only one row. It worked for me.

Like this:

SELECT 
    book.id, 
    auth1.id, 
    auth1.name, 
    book.title as last_book
FROM author auth1
JOIN book book ON (book.author_id = auth1.id AND book.id = (select max(b.id) from book b where b.author_id = auth1.id))
ORDER BY book.id ASC

This way you get the data from the book with the highest ID. You could add a "date" column and do the same with max(date).
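
A sketch of the date-based variant follows. It assumes a hypothetical column published_on (type date) on book, which is not part of the schema in the question.

```sql
-- Assumption: book has a hypothetical column published_on (date),
-- e.g. added with: ALTER TABLE book ADD COLUMN published_on date;
SELECT b.id AS book_id, a.id AS author_id, a.name, b.title AS last_book
FROM author a
JOIN book b
  ON b.author_id = a.id
 AND b.published_on = (SELECT max(b2.published_on)
                       FROM book b2
                       WHERE b2.author_id = a.id)
ORDER BY b.id;
```

Unlike ids, dates are not necessarily unique: if an author has two books sharing the latest date, this returns both rows, so a tie-breaker may be needed.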

Answer by jobermark

As a slight variation on @wildplasser's suggestion, which still works across implementations, you can use max rather than not exists. This reads better if you prefer short joins to long where clauses:

select * 
  from author au
  join (
    select max(id) as max_id, author_id
      from book bk
     group by author_id) as lb 
    on lb.author_id = au.id
  join book bk 
    on bk.id = lb.max_id;

Or, to give the subquery a name, which clarifies things, go with WITH:

with last_book as 
   (select max(id) as max_id, author_id
      from book bk
     group by author_id)

select * 
  from author au
  join last_book lb
    on au.id = lb.author_id
  join book bk 
    on bk.id = lb.max_id;

Answer by Bobburi Madhu

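-- Note: the original ended with "distributed by ( id )", which is
-- Greenplum syntax and is not accepted by plain PostgreSQL.
create temp table book_1 as (
SELECT
id
,title
,author_id
-- number each author's books from highest id to lowest
,row_number() OVER (PARTITION BY author_id ORDER BY id DESC) as rownum 
FROM
book);

select b.id, author.id, author.name, b.title as last_book
from
    author

    left  join
   (select * from  book_1 where rownum = 1 ) b on b.author_id = author.id
order by author.id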