SQL: Indexing Null Values in PostgreSQL

Disclaimer: this page is based on a Chinese-English parallel translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/3467982/

Time: 2020-09-01 07:11:35  Source: igfitidea

Indexing Null Values in PostgreSQL

Tags: sql, database, postgresql, indexing

Asked by Cerin

I have a query of the form:

select m.id from mytable m
left outer join othertable o on o.m_id = m.id
    and o.col1 is not null and o.col2 is not null and o.col3 is not null
where o.id is null

The query returns a few hundred records, although the tables have millions of rows, and it takes forever to run (around an hour).

When I check my index statistics using:

select * from pg_stat_all_indexes
where schemaname <> 'pg_catalog' and (indexrelname like 'othertable_%' or indexrelname like 'mytable_%')

I see that only the index for othertable.m_id is being used, and that the indexes for col1..3 are not being used at all. Why is this?

I've read in a few places that PG has traditionally not been able to index NULL values. However, I've read this has supposedly changed since PG 8.3? I'm currently using PostgreSQL 8.4 on Ubuntu 10.04. Do I need to make a "partial" or "functional" index specifically to speed up IS NOT NULL queries, or is it already indexing NULLs and I'm just misunderstanding the problem?

Answered by Matthew Wood

You could try a partial index:

CREATE INDEX idx_partial ON othertable (m_id)
WHERE (col1 is not null and col2 is not null and col3 is not null);

From the docs: http://www.postgresql.org/docs/current/interactive/indexes-partial.html

Answered by Scott Bailey

Partial indexes aren't going to help you here as they'll only find the records you don't want. You want to create an index that contains the records you do want.

CREATE INDEX findDaNulls ON othertable ((COALESCE(col1,col2,col3,'Empty')))
WHERE col1 IS NULL AND col2 IS NULL AND col3 IS NULL;

SELECT *
FROM mytable m
JOIN othertable o ON m.id = o.m_id
WHERE COALESCE(o.col1,o.col2,o.col3,'Empty') = 'Empty';

BTW searching for null left joins generally isn't as fast as using EXISTS or NOT EXISTS in Postgres.
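
As a rough sketch of that suggestion, the anti-join from the question could be written with NOT EXISTS as below. It reuses the table and column names from the question and assumes othertable.id is never NULL (for example, a primary key), so that the two forms return the same rows; whether it is actually faster here would need to be measured.

```sql
-- Return rows of mytable that have no matching othertable row
-- in which col1, col2 and col3 are all non-NULL.
SELECT m.id
FROM mytable m
WHERE NOT EXISTS (
    SELECT 1
    FROM othertable o
    WHERE o.m_id = m.id
      AND o.col1 IS NOT NULL
      AND o.col2 IS NOT NULL
      AND o.col3 IS NOT NULL
);
```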

Answered by Frank Heikens

A single index on m_id, col1, col2 and col3 of othertable would be my first thought for this query.
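
A minimal sketch of such an index, using the column names from the question; the index name is made up for illustration, and whether the planner actually uses the index depends on the data and its statistics.

```sql
-- Hypothetical index name; m_id comes first because it is the join column.
CREATE INDEX othertable_m_id_col1_col2_col3_idx
    ON othertable (m_id, col1, col2, col3);
```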

And use EXPLAIN on this query to see how it is executed and what takes so much time. You could show us the results so we can help you out.
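
For illustration, the two usual forms applied to the query from the question might look like this. Note that EXPLAIN ANALYZE really executes the statement, so on this query it would take as long as the query itself.

```sql
-- Plain EXPLAIN: shows the plan the planner would choose, without running the query.
EXPLAIN
SELECT m.id FROM mytable m
LEFT OUTER JOIN othertable o ON o.m_id = m.id
    AND o.col1 IS NOT NULL AND o.col2 IS NOT NULL AND o.col3 IS NOT NULL
WHERE o.id IS NULL;

-- EXPLAIN ANALYZE: runs the query and adds actual row counts and timings per plan node.
EXPLAIN ANALYZE
SELECT m.id FROM mytable m
LEFT OUTER JOIN othertable o ON o.m_id = m.id
    AND o.col1 IS NOT NULL AND o.col2 IS NOT NULL AND o.col3 IS NOT NULL
WHERE o.id IS NULL;
```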

Answered by leonbloy

A partial index seems the right way here:

If you have a table that contains both billed and unbilled orders, where the unbilled orders take up a small fraction of the total table and yet those are the most-accessed rows, you can improve performance by creating an index on just the unbilled rows.

Perhaps those nullable columns (col1, col2, col3) act in your scenario as some kind of flag to distinguish some subclass of records in your table (for example, some sort of "logical deletion")? In that case, besides the partial index solution, you might prefer to rethink your design and put them in different physical tables (perhaps using inheritance), one for the "live records" and another for the "historical records", and access the full set (only when needed) through a view.
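
One possible shape of that redesign is sketched below, using a plain second table plus a view rather than inheritance. The names othertable_history and othertable_all are hypothetical, and the predicate that decides which rows count as "historical" is only an assumption; the real rule depends on what the nullable columns mean in your data.

```sql
BEGIN;

-- Hypothetical table for "historical" rows, with the same columns as othertable.
CREATE TABLE othertable_history (LIKE othertable INCLUDING DEFAULTS);

-- Move the rows assumed to be historical (placeholder predicate).
INSERT INTO othertable_history
SELECT * FROM othertable
WHERE col1 IS NOT NULL AND col2 IS NOT NULL AND col3 IS NOT NULL;

DELETE FROM othertable
WHERE col1 IS NOT NULL AND col2 IS NOT NULL AND col3 IS NOT NULL;

-- View exposing the full set for the cases that still need it.
CREATE VIEW othertable_all AS
SELECT * FROM othertable
UNION ALL
SELECT * FROM othertable_history;

COMMIT;
```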

Answered by a_horse_with_no_name

Did you try to create a combined index on othertable(m_id, col1, col2, col3)?

You should also check the execution plan (using EXPLAIN) rather than checking the system tables for the index usage.

PostgreSQL 9.0 (currently in beta) will be able to use an index for an IS NULL condition. That feature got postponed.
