PostgreSQL: speed up a query, simple inner join with one large table and one small table

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/3007149/

Date: 2020-09-20 00:09:39  Source: igfitidea

Speed up a query, simple inner join with one large table and one small table

Tags: sql, postgresql, join, performance

Asked by Claudiu

I have a table T1 with 60 rows and 5 columns: ID1, ID2, info1, info2, info3.

I have a table T2 with 1.2 million rows and another 5 columns: ID3, ID2, info4, info5, info6.

I want to get (ID1, ID2, info4, info5, info6) from all the rows where the ID2s match up. Currently my query looks like this:

SELECT T1.ID1, T2.ID2,
       T2.info4, T2.info5, T2.info6
  FROM T1, T2
 WHERE T1.ID2 = T2.ID2;

This takes about 15 seconds to run. My question is - should it take that long, and if not, how can I speed it up? I figure it shouldn't since T1 is so small.

I asked PostgreSQL to EXPLAIN the query, and it says that it hashes T2, then hash joins that hash with T1. It seems hashing T2 is what takes so long. Is there any way to write the query so it doesn't have to hash T2? Or, is there a way to have it cache the hash of T2 so it doesn't re-do it? The tables will only be updated every few days.

If it makes a difference, T1 is a temporary table created earlier in the session.

Answered by Peter Lang

It should not take that long :)

Creating an index on T2(ID2) should improve the performance of your query:

CREATE INDEX idx_t2_id2 ON t2 (id2);
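
To check whether the index actually helps, one could refresh the planner statistics and look at the plan again. This is a minimal sketch, assuming the table names from the question and the index created above. Note that EXPLAIN ANALYZE really executes the query, so the timings it reports are measured, not estimated.

```sql
-- Refresh planner statistics. Autovacuum does not process temporary
-- tables, so the temporary table T1 has to be analyzed by hand.
ANALYZE t1;
ANALYZE t2;

-- Show the plan the planner actually chooses, with real run times.
EXPLAIN ANALYZE
SELECT t1.id1, t2.id2,
       t2.info4, t2.info5, t2.info6
  FROM t1
  JOIN t2 ON t2.id2 = t1.id2;
```

If the plan still hashes all of T2, the planner may be estimating that a large share of T2 matches the 60 rows in T1; in that case the index may not be used, and the run time depends mostly on how many rows are returned.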

Answered by Pavel Belousov

Using an explicit JOIN may speed up the query:

SELECT T1.ID1, T2.ID2,
    T2.info4, T2.info5, T2.info6
FROM T1
JOIN T2 ON T2.ID2 = T1.ID2;

I'm not sure, but your query may first join all rows of both tables and only then apply the WHERE condition, which would be the problem.
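
One way to test this guess is to compare the plans of the two forms. The sketch below uses only the tables from the question. For plain inner joins PostgreSQL normally plans the comma-separated FROM list with a WHERE condition and the explicit JOIN the same way, so the two plans are expected to match; if they do, the join syntax is not the cause of the slowness.

```sql
-- Plan for the original form: comma-separated FROM list plus WHERE.
EXPLAIN
SELECT t1.id1, t2.id2,
       t2.info4, t2.info5, t2.info6
  FROM t1, t2
 WHERE t1.id2 = t2.id2;

-- Plan for the explicit JOIN form.
EXPLAIN
SELECT t1.id1, t2.id2,
       t2.info4, t2.info5, t2.info6
  FROM t1
  JOIN t2 ON t2.id2 = t1.id2;
```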

And of course, as Peter Lang said, you should create an index.

Answered by user347594

First, write it as a join.

SELECT T1.ID1, T2.ID2,
       T2.info4, T2.info5, T2.info6
  FROM T1
  JOIN T2 ON T1.ID2 = T2.ID2;

Then try creating an index on T2.ID2.

If that does not help and your schema allows it, you can add an ID1 column to T2 and update it every few days, whenever the tables change, as you describe. Then it is just a simple query on T2 with no joins.

SELECT T2.ID1, T2.ID2,
       T2.info4, T2.info5, T2.info6
  FROM T2 
  WHERE T2.ID2 = A_VALUE;

Again, an index on T2.ID2 is recommended.
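
The extra ID1 column suggested above could be added and filled roughly like this. It is only a sketch: the integer type is an assumption (use the real type of T1.ID1), and it assumes each ID2 value appears at most once in T1. If several T1 rows share an ID2, UPDATE ... FROM uses just one of them, and which one is not predictable.

```sql
-- Assumption: ID1 is an integer; replace with the actual type of T1.ID1.
ALTER TABLE t2 ADD COLUMN id1 integer;

-- Copy ID1 from T1 into the matching T2 rows. Rerun this after each
-- refresh of the tables, in a session where the temporary T1 exists.
UPDATE t2
   SET id1 = t1.id1
  FROM t1
 WHERE t1.id2 = t2.id2;
```

Rows of T2 with no match in T1 keep a NULL ID1, so a query on T2 alone may need `WHERE id1 IS NOT NULL` to return the same rows as the inner join.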
