Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original address. StackOverflow original: http://stackoverflow.com/questions/3004887/

Time: 2020-09-01 06:29:27  Source: igfitidea

How to do a PostgreSQL subquery in the select clause with a join in the from clause, like SQL Server?

Tags: sql, sql-server, postgresql, subquery

Asked by Ricardo

I am trying to write the following query in PostgreSQL:

select name, author_id, count(1), 
    (select count(1)
    from names as n2
    where n2.id = n1.id
        and n2.author_id = n1.author_id
    )               
from names as n1
group by name, author_id

This would certainly work on Microsoft SQL Server, but it does not work at all on PostgreSQL. I read the documentation a bit, and it seems I could rewrite it as:

select name, author_id, count(1), total                     
from names as n1, (select count(1) as total
    from names as n2
    where n2.id = n1.id
        and n2.author_id = n1.author_id
    ) as total
group by name, author_id

But that returns the following error on PostgreSQL: "subquery in FROM cannot refer to other relations of same query level". So I'm stuck. Does anyone know how I can achieve that?

Thanks

Answered by Bob Jarvis - Reinstate Monica

I'm not sure I understand your intent perfectly, but perhaps the following would be close to what you want:

select n1.name, n1.author_id, count_1, total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select id, author_id, count(1) as total_count
              from names
              group by id, author_id) n2
  on (n2.id = n1.id and n2.author_id = n1.author_id)

Unfortunately this adds the requirement of grouping the first subquery by id as well as name and author_id, which I don't think was wanted. I'm not sure how to work around that, though, as you need to have id available to join in the second subquery. Perhaps someone else will come up with a better solution.
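
One possible way around the extra grouping by id, offered here as an alternative sketch and not as part of the answer above, is a window function applied to the aggregate. It assumes the goal is to show each (name, author_id) count next to that author's overall row count; the aliases count_1 and total_count are simply reused from the query above.

```sql
-- Sketch only: assumes "names" has the name and author_id columns used above.
-- count(*) is computed per (name, author_id) group first; the window
-- function then sums those group counts across each author.
select name,
       author_id,
       count(*) as count_1,
       sum(count(*)) over (partition by author_id) as total_count
  from names
 group by name, author_id;
```

Because the window function runs after grouping, no second derived table and no join on id is needed in this variant.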

Share and enjoy.

Answered by Ricardo

I am just answering here with the formatted version of the final SQL I needed, based on Bob Jarvis's answer, as posted in my comment above:

select n1.name, n1.author_id, cast(count_1 as numeric)/total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select author_id, count(1) as total_count
              from names
              group by author_id) n2
  on (n2.author_id = n1.author_id)
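
To try the query above, a small test table helps. The post never shows the definition of names, so the schema and rows below are hypothetical, chosen only to make the ratio easy to check by hand.

```sql
-- Hypothetical schema and data: not taken from the original post.
create temp table names (
    id        integer,
    name      text,
    author_id integer
);

insert into names (id, name, author_id) values
    (1, 'alpha', 10),
    (1, 'alpha', 10),
    (2, 'beta',  10),
    (3, 'gamma', 20);

-- Same query as above, with an alias and an ORDER BY added for readability.
select n1.name, n1.author_id,
       cast(n1.count_1 as numeric) / n2.total_count as share
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select author_id, count(1) as total_count
              from names
              group by author_id) n2
  on (n2.author_id = n1.author_id)
 order by n1.author_id, n1.name;
```

With these rows, author 10 has three rows in total, so alpha should come out at about 0.67 and beta at about 0.33, while gamma, the only row of author 20, should come out at 1. The cast to numeric matters because dividing two integer counts would otherwise truncate the result.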

Answered by deFreitas

Complementing the answers from @Bob Jarvis and @dmikam: Postgres does not produce a good plan when you don't use LATERAL. In the simulation below, both queries return the same data, but their estimated costs are very different.

Table structure

CREATE TABLE ITEMS (
    N INTEGER NOT NULL,
    S TEXT NOT NULL
);

INSERT INTO ITEMS
  SELECT
    (random()*1000000)::integer AS n,
    md5(random()::text) AS s
  FROM
    generate_series(1,1000000);

CREATE INDEX N_INDEX ON ITEMS(N);

Performing JOIN with GROUP BY in a subquery, without LATERAL

EXPLAIN 
SELECT 
    I.*
FROM ITEMS I
INNER JOIN (
    SELECT 
        COUNT(1), n
    FROM ITEMS
    GROUP BY N
) I2 ON I2.N = I.N
WHERE I.N IN (243477, 997947);

The results

Merge Join  (cost=0.87..637500.40 rows=23 width=37)
  Merge Cond: (i.n = items.n)
  ->  Index Scan using n_index on items i  (cost=0.43..101.28 rows=23 width=37)
        Index Cond: (n = ANY ('{243477,997947}'::integer[]))
  ->  GroupAggregate  (cost=0.43..626631.11 rows=861418 width=12)
        Group Key: items.n
        ->  Index Only Scan using n_index on items  (cost=0.43..593016.93 rows=10000000 width=4)

Using LATERAL

EXPLAIN 
SELECT 
    I.*
FROM ITEMS I
INNER JOIN LATERAL (
    SELECT 
        COUNT(1), n
    FROM ITEMS
    WHERE N = I.N
    GROUP BY N
) I2 ON 1=1 --I2.N = I.N
WHERE I.N IN (243477, 997947);

Results

Nested Loop  (cost=9.49..1319.97 rows=276 width=37)
  ->  Bitmap Heap Scan on items i  (cost=9.06..100.20 rows=23 width=37)
        Recheck Cond: (n = ANY ('{243477,997947}'::integer[]))
        ->  Bitmap Index Scan on n_index  (cost=0.00..9.05 rows=23 width=0)
              Index Cond: (n = ANY ('{243477,997947}'::integer[]))
  ->  GroupAggregate  (cost=0.43..52.79 rows=12 width=12)
        Group Key: items.n
        ->  Index Only Scan using n_index on items  (cost=0.43..52.64 rows=12 width=4)
              Index Cond: (n = i.n)

My Postgres version is PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1)
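
The plans above show planner estimates only. To compare what actually happens at run time, one option is to refresh the statistics and run the same statement with EXPLAIN (ANALYZE, BUFFERS). The sketch below reuses the ITEMS table and the LATERAL query from this answer; no timings are shown because they depend on the machine and the randomly generated data.

```sql
-- Refresh planner statistics so row estimates reflect the loaded data.
ANALYZE ITEMS;

-- EXPLAIN (ANALYZE, BUFFERS) executes the statement and reports actual
-- row counts, timings and buffer usage alongside the estimates.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    I.*
FROM ITEMS I
INNER JOIN LATERAL (
    SELECT
        COUNT(1), N
    FROM ITEMS
    WHERE N = I.N
    GROUP BY N
) I2 ON TRUE
WHERE I.N IN (243477, 997947);
```

Running the non-LATERAL version the same way gives a like-for-like comparison of actual execution time, not just estimated cost.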

Answered by dmikam

I know this is old, but since PostgreSQL 9.3 you can use the LATERAL keyword to write correlated subqueries inside joins, so the query from the question would look like:

SELECT 
    name, author_id, count(*), t.total
FROM
    names as n1
    INNER JOIN LATERAL (
        SELECT 
            count(*) as total
        FROM 
            names as n2
        WHERE 
            n2.id = n1.id
            AND n2.author_id = n1.author_id
    ) as t ON 1=1
GROUP BY 
    n1.name, n1.author_id
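
One caveat: the query above selects t.total without grouping or aggregating it, which PostgreSQL normally rejects with a "must appear in the GROUP BY clause" error. A minimal sketch of one way to make it run is to add t.total to the GROUP BY. This is an assumption about the intended result, not something stated in the answer, and it yields one row per distinct (name, author_id, total) combination.

```sql
-- Sketch only: grouping by t.total is an assumed fix, not the author's query.
SELECT
    n1.name, n1.author_id, count(*) AS row_count, t.total
FROM
    names AS n1
    CROSS JOIN LATERAL (
        -- Correlated on the current n1 row, which LATERAL makes visible here.
        SELECT
            count(*) AS total
        FROM
            names AS n2
        WHERE
            n2.id = n1.id
            AND n2.author_id = n1.author_id
    ) AS t
GROUP BY
    n1.name, n1.author_id, t.total;
```

CROSS JOIN LATERAL is equivalent here to INNER JOIN LATERAL ... ON 1=1, since the subquery always returns exactly one row.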

Answered by Zahid Gani

select n1.name, n1.author_id, cast(count_1 as numeric)/total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select distinct(author_id), count(1) as total_count
              from names) n2
  on (n2.author_id = n1.author_id)
Where true

I used distinct here; consider it when there are more inner joins, because performance is slow when each additional join has its own GROUP BY.
