Can anyone explain how the oracle "hash group" works?

Note: this content is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute the original authors (not this site). Original question: http://stackoverflow.com/questions/154722/

Tags: performance, oracle

Asked by Nick Randell

I've recently come across some surprising behaviour when running a large query in Oracle, where changing one thing turned a query that used to take 10 minutes into one that takes 3 hours.

To briefly summarise, I store a lot of coordinates in the database, with each coordinate having a probability. I then want to 'bin' these coordinates into 50 metre bins (basically round the coordinate down to the nearest 50 metres) and sum the probability.

To do this, part of the query is 'select x,y,sum(probability) from .... group by x,y'
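
For context, a minimal sketch of what such a query might look like. The table name (coordinates) and the exact binning expression are illustrative assumptions, not the actual schema; FLOOR is used here to round each coordinate down to the nearest 50 metres.

-- hypothetical COORDINATES table with X, Y and PROBABILITY columns
select floor(x / 50) * 50 as bin_x,            -- round x down to the nearest 50 metres
       floor(y / 50) * 50 as bin_y,            -- round y down to the nearest 50 metres
       sum(probability)   as total_probability -- sum the probabilities per bin
  from coordinates
 group by floor(x / 50) * 50, floor(y / 50) * 50
/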

Initially I was storing a large number of points with a probability of 0.1 and queries were running reasonably ok, taking about 10 minutes for each one.

Then I had a request to change how the probabilities were calculated to adjust the distribution, so rather than all of them being 0.1, they were different values (e.g. 0.03, 0.06, 0.12, 0.3, 0.12, 0.06, 0.03). Running exactly the same query then took about 3 hours.

Changing back to all 0.1 brought the queries back to 10 minutes.

Looking at the query plan and the performance of the system, the problem appeared to be with the 'hash group' functionality designed to speed up grouping in Oracle. I'm guessing that it was creating hash entries for each unique x,y,probability value and then summing probability for each unique x,y value.

Can anyone explain this behaviour any better?

Additional Info

Thanks for the answers; they allowed me to verify what was going on. I'm currently running a query, and tempseg_size in v$sql_workarea_active is currently at 7502561280 and growing rapidly.
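
For reference, a query along these lines can be used to watch the active work areas while the statement runs (a sketch only; you would probably want to filter it down to your own session's SID):

select sid, operation_type, work_area_size, expected_size,
       actual_mem_used, number_passes, tempseg_size
  from v$sql_workarea_active
 order by tempseg_size desc nulls last
/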

Given that the development server I'm running on only has 8 GB of RAM, it looks like the query needs to spill to temporary space on disk.

I've managed to workaround this for now by changing the types of queries and precalculating some of the information.

Answer by CaptainPicard

Hash group (and hash joins, as well as other operations such as sorts) can use either optimal (i.e. in-memory), one-pass or multi-pass methods. The last two methods use TEMP storage and are thus much slower.

By increasing the number of possible items you might have exceeded the number of items that will fit in the memory reserved for this type of operation.

Try looking at v$sql_workarea_active whilst the query is running, to see if this is the case. Or look at v$sql_workarea for historical information. It will also give you an indication of how much memory and/or temp space is needed for the operation.
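
As a sketch of the historical view, something like this shows how often a cached work area ran in optimal, one-pass or multi-pass mode, and how much memory and temp space it last used (the &sql_id substitution variable is just a placeholder for the SQL_ID of the statement in question):

select operation_type, policy, last_memory_used, last_tempseg_size,
       optimal_executions, onepass_executions, multipasses_executions
  from v$sql_workarea
 where sql_id = '&sql_id'
/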

If this turns out to be the actual problem, try increasing the pga_aggregate_target initialization parameter, if possible. The amount of memory available for optimal hash/sort operations is usually around 5% of pga_aggregate_target.
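
A sketch of what that change might look like; the 4G figure is purely illustrative and should be sized for your own system, and scope=both assumes an spfile is in use:

show parameter pga_aggregate_target

alter system set pga_aggregate_target = 4G scope = both;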

See the Performance Tuning Guide for more detail.

Answer by David Aldridge

"'m guessing that it was creating hash entries for each unique x,y,probability value and then summing probability for each unique x,y value" -- almost certainly so, since that is what the query requires.

You can check for the likelihood of a query requiring temporary disk space to complete a sort or group-by (etc.) by using the explain plan.

explain plan for
select x,y,sum(probability) from .... group by x,y
/

select * from table(dbms_xplan.display)
/

If the optimizer can correctly deduce from the statistics the approximate number of unique combinations of x and y, then there's a pretty good chance that the TempSpc column in the output of the second query will show you just how much disk space (if any) will be required to complete the query (no column = no disk space requirement).
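
Purely as an illustration of where to look (the shape of the dbms_xplan output with made-up figures, and the hypothetical COORDINATES table from earlier), the relevant part of the plan might look something like:

---------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | TempSpc | Cost (%CPU)|
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |   ... |   ... |         |        ... |
|   1 |  HASH GROUP BY     |             |   ... |   ... |   7200M |        ... |
|   2 |   TABLE ACCESS FULL| COORDINATES |   ... |   ... |         |        ... |
---------------------------------------------------------------------------------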

Way too much information here: http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_xplan.htm#i999234

If the temp space usage is high then as CaptP says, it may be time for some memory tweakage. On databases that perform a lot of sorts and aggregations it is common to specify a higher PGA target than an SGA target.
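
If you want a data point before changing anything, the PGA advice view estimates how the cache hit percentage would change at different target sizes (a sketch, assuming automatic PGA memory management is in use):

select round(pga_target_for_estimate / 1024 / 1024) as target_mb,
       estd_pga_cache_hit_percentage,
       estd_overalloc_count
  from v$pga_target_advice
 order by pga_target_for_estimate
/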

Answer by Andrew not the Saint

Is your PGA_AGGREGATE_TARGET set to zero by any chance? It's unlikely that it's the HASH GROUP BY on its own that caused the issue; it's probably something before it or after it. Downgrade your OPTIMIZER_FEATURES_ENABLE to 10.1.0.4 and rerun the query - you'll see that now you'll get a SORT GROUP BY, which should pretty much always be outperformed by a HASH GROUP BY, unless your PGA sizing is set to MANUAL and your hash work area is undersized.
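
A sketch of how you could test that for a single session, re-using the hypothetical binned-coordinates query from above (the table name is illustrative):

-- force the pre-10.2 optimizer behaviour for this session only, then re-run the query
alter session set optimizer_features_enable = '10.1.0.4';

select floor(x / 50) * 50, floor(y / 50) * 50, sum(probability)
  from coordinates
 group by floor(x / 50) * 50, floor(y / 50) * 50
/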
