Original question: http://stackoverflow.com/questions/40320877/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Recommended way to index a date field in postgres?
Asked by massphoenix
I have a few tables with about 17M rows each, all of which have a date column that I would like to use frequently in searches. I am considering either just putting an index on the column and seeing how things go, or sorting the rows by date as a one-time operation and inserting everything into a new table so that the primary key ascends as the date ascends.
Since both options are pretty time-consuming, I thought it would be worth asking here first for input.
The end goal is to load the results of SQL queries into pandas for some analysis, if that is relevant here.
Accepted answer by klin
An index on a date column makes sense when you are going to search the table for a given date or range of dates, e.g.:
select * from test
where the_date = '2016-01-01';
-- or
select * from test
where the_date between '2016-01-01' and '2016-01-31';
-- etc
In these queries it does not matter whether the sort order of the primary key and the date column are the same. Hence rewriting the data into a new table would be useless. Just create an index.
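As a minimal sketch of "just create an index", assuming the table and column names used in the queries above (test, the_date); the index name test_the_date_idx is made up for illustration:

```sql
-- Plain B-tree index on the date column (index name is hypothetical).
CREATE INDEX test_the_date_idx ON test (the_date);

-- Alternative for a table that must stay writable during the build:
-- CREATE INDEX CONCURRENTLY test_the_date_idx ON test (the_date);

-- Refresh planner statistics so the new index is costed sensibly.
ANALYZE test;
```

A B-tree index like this can serve both the equality and the BETWEEN query shown above. Whether the planner actually picks it depends on how many rows the date range matches.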
However, if you are going to use the index only in ORDER BY:
select * from test
order by the_date;
then a primary key integer index may be significantly (2-4 times) faster than an index on a date column.
Answer by dmg
Postgres supports clustered indexes to some extent, which is what you are suggesting by removing and reinserting the data.
In fact, removing and reinserting the data in the order you want will not change the time the query takes. Postgres does not know the order of the data.
If you know that the table's data does not change, then cluster the data based on the index you create.
This operation reorders the table based on the order in the index. The benefit lasts until the table is modified: CLUSTER is a one-time operation, and rows inserted or updated afterwards are not kept in index order. The syntax is:
CLUSTER tableName USING IndexName;
See the manual for details.
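For a concrete version of the syntax above, here is a rough sketch. The names test, the_date and test_the_date_idx are assumptions carried over from the other answer's examples, not anything taken from the asker's schema:

```sql
-- Hypothetical names: table "test", column "the_date", index "test_the_date_idx".
CREATE INDEX IF NOT EXISTS test_the_date_idx ON test (the_date);

-- Rewrites the table in index order. This takes an ACCESS EXCLUSIVE lock,
-- so reads and writes on the table are blocked while it runs.
CLUSTER test USING test_the_date_idx;

-- Refresh planner statistics after the rewrite.
ANALYZE test;

-- Later runs can reuse the index remembered from the previous CLUSTER:
-- CLUSTER test;
```

On tables of around 17M rows the rewrite can take a while and needs enough free disk space for a copy of the table, so it is probably best tried on a copy first.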
I also recommend you use
explain <query>;
to compare the plans of a query before and after creating an index, or before and after clustering.
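As an illustration of that comparison, assuming the same test table and the_date column as in the earlier examples, the range query could be inspected like this:

```sql
-- Plan only: shows the estimated plan, the query is not executed.
EXPLAIN
SELECT * FROM test
WHERE the_date BETWEEN '2016-01-01' AND '2016-01-31';

-- Plan plus real timings and buffer usage: the query IS executed.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test
WHERE the_date BETWEEN '2016-01-01' AND '2016-01-31';
```

Run the same statement before and after each change and compare the node type (for example a sequential scan versus an index or bitmap scan) and the reported execution time.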