PostgreSQL slow on a large table with arrays and lots of updates
Original question: http://stackoverflow.com/questions/3100072/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
PostgreSQL slow on a large table with arrays and lots of updates
Asked by ibz
I have a pretty large table (20M records) which has a 3-column index and an array column. The array column is updated daily (by appending new values) for all rows. There are also inserts, but not as many as there are updates.
The data in the array represents daily measurements corresponding to the three keys, something like this: [[date_id_1, my_value_for_date_1], [date_id_2, my_value_for_date_2]]. It is used to draw a graph of those daily values. Say I want to visualize the value for the key (a, b, c) over time; I do SELECT values FROM t WHERE a = my_a AND b = my_b AND c = my_c. Then I use the values array to draw the graph.
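For reference, a minimal sketch of the layout described above. The table and column names (t, a, b, c, values) follow the question; the types and sample literals are assumptions, and "values" has to be quoted as a column name because VALUES is a reserved word:

-- Hypothetical table: a 3-column key plus a 2-D array of [date_id, value] pairs.
CREATE TABLE t (
    a        integer NOT NULL,
    b        integer NOT NULL,
    c        integer NOT NULL,
    "values" integer[],
    PRIMARY KEY (a, b, c)
);

-- Daily bulk job: append one [date_id, value] pair to every row
-- (assumes each row already holds an array of such pairs).
UPDATE t SET "values" = "values" || ARRAY[[20100623, 42]];

-- Read path used for graphing one key over time.
SELECT "values" FROM t WHERE a = 1 AND b = 2 AND c = 3;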
Performance of the updates (which happen in bulk once a day) has worsened considerably over time.
Using PostgreSQL 8.3.8.
Can you give me any hints of where to look for a solution? It could be anything from tweaking some parameters in postgres to even moving to another database (I guess a non-relational database would be better suited for this particular table, but I don't have much experience with those).
Answered by Frank Heikens
I would take a look at the FILLFACTOR for the table. By default it's set to 100; you could lower it to 70 (to start with). After this, you have to do a VACUUM FULL to rebuild the table.
ALTER TABLE tablename SET (FILLFACTOR = 70);  -- leave ~30% free space per page for updated row versions
VACUUM FULL tablename;                        -- rewrite the table so the new fillfactor takes effect
REINDEX TABLE tablename;                      -- rebuild the indexes, which VACUUM FULL tends to bloat on 8.3
This gives UPDATE a chance to place the updated copy of a row on the same page as the original, which is more efficient than placing it on a different page. Or if your database is already somewhat fragmented from lots of previous updates, it might already be sparse enough. Now your database also has the option to do HOT updates, assuming the column you are updating is not one involved in any index.
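A quick way to check whether the daily bulk job is actually getting HOT updates after lowering the fillfactor is to compare the counters in pg_stat_user_tables (tablename is a placeholder):

-- n_tup_hot_upd close to n_tup_upd means most updates stayed on the same page.
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'tablename';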
Answered by James Anderson
Not sure if arrays are the way to go here.
Why not store these in a separate table (one value plus keys per row)? Then your bulk update will be pure insert activity.
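A sketch of what that separate table could look like; the names measurement, date_id and value are illustrative, not from the question:

-- One measurement per row; the daily load becomes append-only inserts (or COPY).
CREATE TABLE measurement (
    a        integer NOT NULL,
    b        integer NOT NULL,
    c        integer NOT NULL,
    date_id  integer NOT NULL,
    value    integer NOT NULL,
    PRIMARY KEY (a, b, c, date_id)
);

-- Daily bulk load: plain inserts, no existing rows are rewritten.
INSERT INTO measurement (a, b, c, date_id, value)
VALUES (1, 2, 3, 20100623, 42);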
Answered by James Anderson
The problem is in the updates. Change the schema from array-based to multiple rows per day, and the performance problem will go away.
You can add rollups to arrays later on with some kind of cron job, but avoid updates.
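If an array representation is still wanted for reads, it can be rebuilt periodically from the per-row table instead of being appended to in place. A simplified sketch, reusing the illustrative measurement table from above and producing a flat array of values ordered by date (rebuilding the original [date_id, value] pair layout would need a multidimensional array constructor, which is not straightforward on this Postgres version):

-- Nightly rollup job: regenerate the array column from the detail rows.
UPDATE t SET "values" = ARRAY(
    SELECT m.value
    FROM measurement m
    WHERE m.a = t.a AND m.b = t.b AND m.c = t.c
    ORDER BY m.date_id
);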
Answered by pyrocumulus
Well, a 3-column index is nothing to worry about; that doesn't necessarily make it that much slower. But that array column might indeed be the problem. You say you are appending values to that array column daily. By appending, do you mean appending values to all 20 million records in the table, or just some records?
The situation isn't completely clear to me, but I would suggest looking into ways of getting rid of that array column, for example by making it a separate table. But this depends on your situation and might not be an option. It might be just me, but I always feel 'dirty' having such a column in one of my tables. Most of the time there is a better solution for the problem you are trying to solve with an array column. That being said, there are certainly situations in which such a column is valid, but at the moment I can think of none, and certainly not in a table with a 20 million record count.
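With the array column replaced by a per-measurement table (as sketched in the earlier answer), the graphing query becomes a simple index scan over the key; the table and column names are the illustrative ones used above:

-- Fetch the time series for one (a, b, c) key, ordered for plotting.
SELECT date_id, value
FROM measurement
WHERE a = 1 AND b = 2 AND c = 3
ORDER BY date_id;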