PostgreSQL slow on a large table with arrays and lots of updates
Original question: http://stackoverflow.com/questions/3100072/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
PostgreSQL slow on a large table with arrays and lots of updates
Asked by ibz
I have a pretty large table (20M records) which has a 3-column index and an array column. The array column is updated daily (by appending new values) for all rows. There are also inserts, but not as many as there are updates.
The data in the array represents daily measurements corresponding to the three keys, something like this: [[date_id_1, my_value_for_date_1], [date_id_2, my_value_for_date_2]]. It is used to draw a graph of those daily values. Say I want to visualize the value for the key (a, b, c) over time; I do SELECT values FROM t WHERE a = my_a AND b = my_b AND c = my_c. Then I use the values array to draw the graph.
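For reference, a minimal sketch of the layout described above. The table and column names (t, a, b, c, values) follow the question; the types and sample literals are assumptions, and "values" has to be quoted as a column name because VALUES is a reserved word:

-- Hypothetical table: a 3-column key plus a 2-D array of [date_id, value] pairs.
CREATE TABLE t (
    a        integer NOT NULL,
    b        integer NOT NULL,
    c        integer NOT NULL,
    "values" integer[],
    PRIMARY KEY (a, b, c)
);

-- Daily bulk job: append one [date_id, value] pair to every row
-- (assumes each row already holds an array of such pairs).
UPDATE t SET "values" = "values" || ARRAY[[20100623, 42]];

-- Read path used for graphing one key over time.
SELECT "values" FROM t WHERE a = 1 AND b = 2 AND c = 3;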
Performance of the updates (which happen in bulk once a day) has worsened considerably over time.
Using PostgreSQL 8.3.8.
Can you give me any hints of where to look for a solution? It could be anything from tweaking some parameters in postgres to even moving to another database (I guess a non-relational database would be better suited for this particular table, but I don't have much experience with those).
Answered by Frank Heikens
I would take a look at the FILLFACTOR for the table. By default it's set to 100; you could lower it to 70 (to start with). After this, you have to do a VACUUM FULL to rebuild the table.
ALTER TABLE tablename SET (FILLFACTOR = 70);  -- leave ~30% free space per page for updated row versions
VACUUM FULL tablename;                        -- rewrite the table so the new fillfactor takes effect
REINDEX TABLE tablename;                      -- rebuild the indexes, which VACUUM FULL tends to bloat on 8.3
This gives UPDATE a chance to place the updated copy of a row on the same page as the original, which is more efficient than placing it on a different page. Or if your database is already somewhat fragmented from lots of previous updates, it might already be sparse enough. Now your database also has the option to do HOT updates, assuming the column you are updating is not one involved in any index.
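A quick way to check whether the daily bulk job is actually getting HOT updates after lowering the fillfactor is to compare the counters in pg_stat_user_tables (tablename is a placeholder):

-- n_tup_hot_upd close to n_tup_upd means most updates stayed on the same page.
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'tablename';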
Answered by James Anderson
Not sure if arrays are the way to go here.
Why not store these in a separate table (one value plus keys per row)? Then your bulk update will be pure insert activity.
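A sketch of what that separate table could look like; the names measurement, date_id and value are illustrative, not from the question:

-- One measurement per row; the daily load becomes append-only inserts (or COPY).
CREATE TABLE measurement (
    a        integer NOT NULL,
    b        integer NOT NULL,
    c        integer NOT NULL,
    date_id  integer NOT NULL,
    value    integer NOT NULL,
    PRIMARY KEY (a, b, c, date_id)
);

-- Daily bulk load: plain inserts, no existing rows are rewritten.
INSERT INTO measurement (a, b, c, date_id, value)
VALUES (1, 2, 3, 20100623, 42);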
Answered by James Anderson
The problem is in the updates. Change the schema from array-based to multiple rows per day, and the performance problem will go away.
You can add rollups to arrays later on with some kind of cron job, but avoid updates.
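If an array representation is still wanted for reads, it can be rebuilt periodically from the per-row table instead of being appended to in place. A simplified sketch, reusing the illustrative measurement table from above and producing a flat array of values ordered by date (rebuilding the original [date_id, value] pair layout would need a multidimensional array constructor, which is not straightforward on this Postgres version):

-- Nightly rollup job: regenerate the array column from the detail rows.
UPDATE t SET "values" = ARRAY(
    SELECT m.value
    FROM measurement m
    WHERE m.a = t.a AND m.b = t.b AND m.c = t.c
    ORDER BY m.date_id
);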
Answered by pyrocumulus
Well, a 3-column index is nothing to worry about; that doesn't necessarily make it that much slower. But that array column might indeed be the problem. You say you are appending values to that array column daily. By appending, do you mean appending values to all 20 million records in the table, or just some records?
The situation isn't completely clear to me, but I would suggest looking into ways of getting rid of that array column, for example by making it a separate table. But this depends on your situation and might not be an option. It might be just me, but I always feel 'dirty' having such a column in one of my tables. Most of the time there is a better solution for the problem you are trying to solve with an array column. That being said, there are certainly situations in which such a column is valid, but at the moment I can think of none, and certainly not in a table with a 20 million record count.
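With the array column replaced by a per-measurement table (as sketched in the earlier answer), the graphing query becomes a simple index scan over the key; the table and column names are the illustrative ones used above:

-- Fetch the time series for one (a, b, c) key, ordered for plotting.
SELECT date_id, value
FROM measurement
WHERE a = 1 AND b = 2 AND c = 3
ORDER BY date_id;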