Oracle count(*) taking too much time

Question

Asked by Brabin

I was trying to fetch count(*) from a table that has almost 7 million records, and it is taking more than an hour to return the result.

Also, the table has 153 columns, and an index has been created on column 123, so I tried to run the following query in parallel, but it didn't help.

select /*+ parallel (5) */ count(123) from <table_name>

Please suggest an alternative if there is one.

When I ran desc on the table in Toad, the index tab showed a value for the number of rows. Any idea how that value gets updated there?

Answer 1

Accepted answer by ntalbs

Counting the rows of a large table takes a long time; that is natural. Some DBMSs store the number of records, but that design limits concurrency: such a DBMS has to lock the entire table before any DML operation on it. (The table-level lock is necessary to update the count properly.)

The value in ALL_TABLES.NUM_ROWS (or USER_TABLES.NUM_ROWS) is just statistical information generated by analyze table ... or the dbms_stats.gather_table_stats procedure. It is neither accurate nor real-time.

If you don't need the exact number of rows, you can use the statistical information. However, you shouldn't depend on it. It is meant for the Oracle optimizer, not for application programs.
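
As a rough sketch of reading that statistic, the queries below look up the stored row count and then refresh it. MY_TABLE is a placeholder name, not something from the question, and the example assumes the table lives in the current schema. The number is only as fresh as the LAST_ANALYZED timestamp.

```sql
-- MY_TABLE is a placeholder; substitute the real table name (upper case in the dictionary).
-- Row count as recorded by the most recent statistics gathering.
SELECT table_name, num_rows, last_analyzed
FROM   user_tables
WHERE  table_name = 'MY_TABLE';

-- Refresh the statistics for that table in the current schema.
-- This scans the table (or a sample of it), so it is not free on large tables.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'MY_TABLE');
END;
/
```

This is presumably where a tool like Toad gets the row count it displays, which would explain why the value changes only when statistics are gathered.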

I'm not sure why you have to count the number of rows of the table. ~~If you need it in a batch program that runs infrequently, you can partition the table to increase the parallelism.~~ If you need the count in an online program, you should find a way not to use the count.

Answer 2

Answered by David Aldridge

A few issues to mention:

For "select count(*) from table" to use an index, the indexed column must be non-nullable, or the index must be a bitmap type.
If there are known to be no nulls in the column but there is no not null constraint on it, then use "select count(*) from table where column_name is not null".
It does of course have to be more efficient to scan the index than the table, but with so many table columns you're probably fine there.
If you really want a parallel index scan, use the parallel_index hint, not parallel. But with only 7 million rows you might not find any need for parallelism.
You need to check the execution plan to see if an index and/or parallel query is in use.
If you can use an estimated number of rows then consider using the sample clause: for example "select 1000*count(*) from table sample(0.1)"
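
The points above could be tried out roughly as follows. This is only a sketch: my_table, indexed_col and my_idx are placeholder names, and the degree of 4 in the hint is an arbitrary choice, none of which come from the question.

```sql
-- my_table, indexed_col and my_idx are placeholder names.
-- Let the count use the index when the column has no NOT NULL constraint
-- but is known to contain no nulls.
SELECT COUNT(*) FROM my_table WHERE indexed_col IS NOT NULL;

-- Request a parallel fast full scan of the index (degree 4 is an assumption).
SELECT /*+ INDEX_FFS(t my_idx) PARALLEL_INDEX(t my_idx 4) */ COUNT(*)
FROM   my_table t
WHERE  indexed_col IS NOT NULL;

-- Check whether the index and/or parallel query is actually used.
EXPLAIN PLAN FOR
SELECT COUNT(*) FROM my_table WHERE indexed_col IS NOT NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Estimate from a 0.1 percent sample; multiplying by 1000 scales it to 100 percent.
SELECT 1000 * COUNT(*) AS estimated_rows FROM my_table SAMPLE (0.1);
```

Hints are only requests, so the plan output is what shows whether the optimizer used the index or the parallel scan.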

Answer 3

Answered by APC

select /*+ parallel (5) */

That seems like an odd number for the degree of parallelism. Well, obviously 5 is an odd number, and that is strange. The DOP ought to be a ~~power~~ multiple of two (see below for more).

Anyway, do you have a reason for using parallel query? Do you have at least five spare processors? If not, there is a good chance that the overhead of managing the PQ slaves is at least contributing to the poor performance.

Why should DOP = n*2? There is an established heuristic based on queuing theory that running more than two batch jobs simultaneously leads to degraded performance. (I think queuing theory actually recommends a figure of 1.8, but as database jobs are often bound by I/O or disk we can usually get away with 2.)

I originally said "power of 2", mainly because multi-core servers tend to have a number of CPUs that is a power of 2. "Multiple of 2" is more accurate, because some boxes have 12 CPUs or some other number.

Now, if we have a 64-core box, a DOP of 5 or 37 is fine, because we have enough CPUs to run that many threads simultaneously. But if we have a small quad-core box, only 2, 4 or 8 make sense, because those are the only values that ensure an even distribution of work across all four processors. Running five threads on a quad-core box means one CPU will be doing a lot more work than the other three; it may take longer to finish, leaving the other three slaves waiting. So DOP=5 can actually lead to a greater elapsed time than DOP=4.

DOP=n*2 is only a rule of thumb, not set in stone. However, it is based on sound reasoning, and we should know why we are doing something different. Obviously, we should run some experiments to confirm that we have chosen the right DOP (whatever value we settle on).
