postgresql 如果引用存在则连接表
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/16748167/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Joining tables if the reference exists
提问by wiizzard
I got a PostgreSQL database with 4 tables:
我有一个带有 4 个表的 PostgreSQL 数据库:
Table A
表A
---------------------------
| ID | B_ID | C_ID | D_ID |
---------------------------
| 1 | 1 | NULL | NULL |
---------------------------
| 2 | NULL | 1 | NULL |
---------------------------
| 3 | 2 | 2 | 1 |
---------------------------
| 4 | NULL | NULL | 2 |
---------------------------
Table B
表B
-------------
| ID | DATA |
-------------
| 1 | 123 |
-------------
| 2 | 456 |
-------------
Table C
表C
-------------
| ID | DATA |
-------------
| 1 | 789 |
-------------
| 2 | 102 |
-------------
Table D
表 D
-------------
| ID | DATA |
-------------
| 1 | 654 |
-------------
| 2 | 321 |
-------------
I'm trying to retrieve a result set which has joined the data from table B and the data from table C, only if one of booth IDs is not null.
我正在尝试检索一个结果集,该结果集已连接表 B 中的数据和表 C 中的数据,仅当一个展位 ID 不为空时。
SELECT "Table_A"."ID", "Table_A"."ID_B", "Table_A"."ID_C", "Table_A"."ID_D", "Table_B"."DATA", "Table_C"."DATA"
FROM "Table_A"
LEFT JOIN "Table_B" on "Table_A"."ID_B" = "Table_B"."ID"
LEFT JOIN "Table_C" on "Table_A"."ID_C" = "Table_C"."ID"
WHERE "Table_A"."ID_B" IS NOT NULL OR "Table_A"."ID_C" IS NOT NULL;
Is this recommended or should I better split this in multiple queries?
这是推荐的还是我应该更好地将其拆分为多个查询?
Is there a way to do an inner join between these tables?
有没有办法在这些表之间进行内部连接?
The result I expect is:
我期望的结果是:
-------------------------------------------------
| ID | ID_B | ID_C | ID_D | DATA (B) | DATA (C) |
-------------------------------------------------
| 1 | 1 | NULL | NULL | 123 | NULL |
-------------------------------------------------
| 2 | NULL | 1 | NULL | NULL | 789 |
-------------------------------------------------
| 3 | 2 | 2 | NULL | 456 | 102 |
-------------------------------------------------
EDIT:ID_B
, ID_C
, ID_D
are foreign keys to the tables table_b
, table_c
, table_d
编辑:ID_B
, ID_C
,ID_D
是表的外键table_b
, table_c
,table_d
采纳答案by wildplasser
The WHERE "Table_A"."ID_B" IS NOT NULL OR "Table_A"."ID_C" IS NOT NULL;
can be replaced by the corresponding clause on the B and C tables : WHERE "Table_B"."ID" IS NOT NULL OR "Table_C"."ID" IS NOT NULL;
. This would also work if table_a.id_b and table_a.id_c are not FKs to the B and C tables. Otherwise, a table_a row with { 5, 5,5,5} would retrieve two NULL rows from the B and C tables.
在WHERE "Table_A"."ID_B" IS NOT NULL OR "Table_A"."ID_C" IS NOT NULL;
可以通过在B和C表对应的条款所取代:WHERE "Table_B"."ID" IS NOT NULL OR "Table_C"."ID" IS NOT NULL;
。如果 table_a.id_b 和 table_a.id_c 不是 B 和 C 表的 FK,这也将起作用。否则,带有 { 5, 5,5,5} 的 table_a 行将从 B 和 C 表中检索两个 NULL 行。
SELECT ta."ID" AS a_id
, ta."ID_B" AS b_id
, ta."ID_C" AS c_id
, ta."ID_D" AS d_id
, tb."DATA" AS bdata
, tc."DATA" AS cdata
FROM "Table_a" ta
LEFT JOIN "Table_B" tb on ta."ID_B" = tb."ID"
LEFT JOIN "Table_C" tc on ta."ID_C" = tc."ID"
WHERE tb."ID" IS NOT NULL OR tc."ID" IS NOT NULL
;
回答by Lukas Eder
Given your requirements, your query seems good to me.
鉴于您的要求,您的查询对我来说似乎很好。
An alternative would be to use nested selects in the projection, but depending on your data, indexes and constraints, that might be slower, as nested selects usually result in nested loops, whereas joins can be performed as merge joins or nested loops:
另一种方法是在投影中使用嵌套选择,但根据您的数据、索引和约束,这可能会更慢,因为嵌套选择通常会导致嵌套循环,而连接可以作为合并连接或嵌套循环执行:
SELECT
"Table_A"."ID",
"Table_A"."ID_B",
"Table_A"."ID_C",
"Table_A"."ID_D",
(SELECT "DATA" FROM "Table_B" WHERE "Table_A"."ID_B" = "Table_B"."ID"),
(SELECT "DATA" FROM "Table_C" WHERE "Table_A"."ID_C" = "Table_C"."ID")
FROM "Table_A"
WHERE "Table_A"."ID_B" IS NOT NULL OR "Table_A"."ID_C" IS NOT NULL;
If Postgres does scalar subquery caching(as Oracle does), then nested selects might help in case you have a lot of data repetition in Table_A
如果 Postgres 使用标量子查询缓存(如 Oracle 那样),那么嵌套选择可能会在您有大量数据重复的情况下有所帮助Table_A
回答by Erwin Brandstetter
Since you have foreign key constraints in place, referential integrity is guaranteed and the query in your Q is already the best answer.
由于您有外键约束,参照完整性得到保证,您的 Q 中的查询已经是最佳答案。
Also indexes on Table_B.ID
and Table_C.ID
are given.
还给出了Table_B.ID
和 的索引Table_C.ID
。
Ifmatching cases in Table_A
are rare(less than ~ 5 %, depending on row with and data distribution) a partial multi-column indexwould help performance:
如果匹配情况Table_A
是罕见的(小于约5%,这取决于行与和数据分布)的局部多列索引将有助于性能:
CREATE INDEX table_a_special_idx ON "Table_A" ("ID_B", "ID_C")
WHERE "ID_B" IS NOT NULL OR "ID_C" IS NOT NULL;
In PostgreSQL 9.2 a covering index (index-only scanin Postgres parlance) might help even more - in which case you would include all columns of interest in the index (not in my example). Depends on several factors like row width and frequency of updates in your table.
在 PostgreSQL 9.2 中,覆盖索引(Postgres 说法中的仅索引扫描)可能会提供更多帮助 - 在这种情况下,您将在索引中包含所有感兴趣的列(不在我的示例中)。取决于几个因素,例如表中的行宽和更新频率。
回答by Beryllium
Generally spealking the recommended way is to do it in one query only, and let the database do as much work as possible, especially if you add other operations like sorting (order by) or pagination later (limit ... offset ...) later. We have done some measurements, and there is no way to sort/paginate faster in Java/Scala, if you use any of the higher level collections like lists etc.
一般说来推荐的方法是只在一个查询中做,让数据库做尽可能多的工作,特别是如果你添加其他操作,比如排序(order by)或分页(limit ... offset ...)之后。我们已经进行了一些测量,如果您使用任何更高级别的集合(如列表等),则无法在 Java/Scala 中更快地排序/分页。
RDBMS deal very well with single complex statements, but they have difficulties in handling many small queries. For example, if you query the "one" and the "many relation" in one query, it will be faster than doing this in 1 + n select statements.
RDBMS 可以很好地处理单个复杂的语句,但它们在处理许多小查询时存在困难。例如,如果您在一个查询中查询“一个”和“多个关系”,这将比在 1 + n 个 select 语句中执行此操作要快。
As for the outer join, we have done measurements, and there is no real performance penalty compared with inner joins. So if your data model and/or your query require an outer join, just do it. If it was a performance problem, you can tune it later.
至于外连接,我们已经做了测量,与内连接相比没有真正的性能损失。因此,如果您的数据模型和/或您的查询需要外连接,那就去做吧。如果是性能问题,您可以稍后对其进行调整。
As for your null comparisons, it mightindicate that your data model could be optimized, but that is just a guess. Chances are that you can improve the design so that null is not allowed in these columns.
至于您的空值比较,它可能表明您的数据模型可以优化,但这只是一个猜测。您可能可以改进设计,以便在这些列中不允许使用 null。