SQL: Efficient latest record query with PostgreSQL

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/1684244/

Time: 2020-09-01 04:17:27  Source: igfitidea

Efficient latest record query with Postgresql

Tags: sql, postgresql, subquery, distinct

Asked by Sheldon Ross

I need to do a big query, but I only want the latest records.

For a single entry I would probably do something like:

SELECT * FROM table WHERE id = ? ORDER BY date DESC LIMIT 1;

But I need to pull the latest record for each of a large number of ids (thousands of entries), and only the latest entry for each.

Here's what I have. It's not very efficient. I was wondering if there's a better way.

SELECT * FROM table a WHERE ID IN $LIST AND date = (SELECT max(date) FROM table b WHERE b.id = a.id);

Accepted answer by intgr

If you don't want to change your data model, you can use DISTINCT ON to fetch the newest record from table "b" for each entry in "a":

SELECT DISTINCT ON (a.id) *
FROM a
INNER JOIN b ON a.id=b.id
ORDER BY a.id, b.date DESC

If you want to avoid a "sort" in the query, adding an index like this might help you, but I am not sure:

CREATE INDEX b_id_date ON b (id, date DESC);

SELECT DISTINCT ON (b.id) *
FROM a
INNER JOIN b ON a.id=b.id
ORDER BY b.id, b.date DESC
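
Whether the index actually removes the sort can be checked rather than guessed. A minimal sketch, assuming the tables a and b and the index above exist: EXPLAIN prints the chosen plan, and the thing to look for is whether a Sort node still appears above the scan of b.

```sql
-- Plain EXPLAIN only plans the query; it does not run it.
EXPLAIN
SELECT DISTINCT ON (b.id) *
FROM a
INNER JOIN b ON a.id = b.id
ORDER BY b.id, b.date DESC;
```

Using EXPLAIN ANALYZE instead runs the query and reports actual timings, so the plans with and without the index can be compared.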

Alternatively, if you want to sort records from table "a" in some way:

SELECT DISTINCT ON (sort_column, a.id) *
FROM a
INNER JOIN b ON a.id=b.id
ORDER BY sort_column, a.id, b.date DESC

Alternative approaches

However, all of the above queries still need to read all referenced rows from table "b", so if you have lots of data, they might still be too slow.

You could create a new table that holds only the newest "b" record for each a.id, or even move those columns into the "a" table itself.
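
A minimal sketch of the summary-table idea, assuming the tables are literally named a and b with the id and date columns used above. The names b_latest and b_latest_id are hypothetical. The table is filled with the same DISTINCT ON query, so it is only a snapshot and has to be rebuilt or maintained whenever b changes.

```sql
-- Hypothetical summary table: one row per id, holding the newest row of b.
CREATE TABLE b_latest AS
SELECT DISTINCT ON (id) *
FROM b
ORDER BY id, date DESC;

-- Enforce one row per id and make lookups by id cheap.
CREATE UNIQUE INDEX b_latest_id ON b_latest (id);

-- Fetching the latest records is now a plain join, with no sort over b.
SELECT *
FROM a
INNER JOIN b_latest ON a.id = b_latest.id;
```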

Answer by manji

This could be more efficient. The difference: the query on table b is executed only once, whereas your correlated subquery is executed for every row:

SELECT *
FROM table a
JOIN (SELECT ID, max(date) AS maxDate
        FROM table
      GROUP BY ID) b
ON a.ID = b.ID AND a.date = b.maxDate
-- ID must be qualified here: both a and b expose a column named ID
WHERE a.ID IN $LIST

Answer by unknown

What do you think about this?

SELECT * FROM (
   SELECT a.*, row_number() OVER (PARTITION BY a.id ORDER BY a.date DESC) AS r
   FROM table a WHERE a.id IN $LIST
) ranked  -- a subquery in FROM needs an alias before PostgreSQL 16
WHERE r = 1

I have used it a lot in the past.

Answer by youngthing

One method: create a small derived table containing the most recent update/insertion times on table a, and call this table a_latest. Table a_latest will need sufficient granularity to meet your specific query requirements. In your case it should be sufficient to use:

CREATE TABLE
a_latest
( id INTEGER NOT NULL,
  date TIMESTAMP NOT NULL,
  PRIMARY KEY (id, date) );

Then use a query similar to that suggested by najmeddine:

SELECT a.*
FROM a
JOIN a_latest USING ( id, date );

The trick then is keeping a_latest up to date. Do this using a trigger on insertions and updates. A trigger written in PL/pgSQL is fairly easy to write. I am happy to provide an example if you wish.

The point here is that computation of the latest update time is taken care of during the updates themselves. This shifts more of the load away from the query.
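
A rough sketch of what such a trigger might look like, not the answerer's own code. It assumes table a has the id and date columns used above and that a_latest exists as defined earlier. The names a_latest_sync and a_latest_sync_trg are hypothetical. It only moves the stored time forward, so deletes, updates that lower a date, and concurrent first inserts for the same id are not handled.

```sql
CREATE OR REPLACE FUNCTION a_latest_sync() RETURNS trigger AS $$
BEGIN
    -- Move the stored time forward if the written row is newer.
    UPDATE a_latest
       SET date = NEW.date
     WHERE id = NEW.id
       AND date < NEW.date;

    -- Nothing updated: either the id is new or the stored time is
    -- already at least as recent. Insert only in the first case.
    IF NOT FOUND THEN
        INSERT INTO a_latest (id, date)
        SELECT NEW.id, NEW.date
        WHERE NOT EXISTS (SELECT 1 FROM a_latest WHERE id = NEW.id);
    END IF;

    RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER a_latest_sync_trg
AFTER INSERT OR UPDATE ON a
FOR EACH ROW EXECUTE PROCEDURE a_latest_sync();
```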

Answer by peufeu

If you have many rows per id, you definitely want a correlated subquery. It will make one index lookup per id, but this is faster than sorting the whole table.

Something like:

SELECT a.id,
(SELECT max(t.date) FROM table t WHERE t.id = a.id) AS lastdate
FROM table2 a;

The 'table2' you use here is not the table you mention in your query above, because you need a list of distinct ids for good performance. Since your ids are probably foreign keys into another table, use that one.
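
If the whole latest row is needed rather than just its date, the same one-lookup-per-id idea can be written as a LATERAL join (available since PostgreSQL 9.3). This is a different construct from the scalar subquery above, shown only as a sketch. It assumes table2 holds the distinct ids, that records is a stand-in name for the real data table, and that an index on (id, date DESC) exists.

```sql
SELECT latest.*
FROM table2 a
CROSS JOIN LATERAL (
    -- Runs once per row of table2; with the index this is a single lookup.
    SELECT t.*
    FROM records t
    WHERE t.id = a.id
    ORDER BY t.date DESC
    LIMIT 1
) AS latest;
```

Ids in table2 that have no matching rows drop out of the result. Using LEFT JOIN LATERAL ... ON true instead would keep them with NULLs.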
