SQL: Fastest way to count the exact number of rows in a very large table?

Question

Asked by Swaranga Sarma

I have come across articles that state that SELECT COUNT(*) FROM TABLE_NAME will be slow when the table has lots of rows and lots of columns.

I have a table that might contain even billions of rows [it has approximately 15 columns]. Is there a better way to get the EXACT count of the number of rows of a table?

Please consider the following before your answer:

I am looking for a database-vendor-independent solution. It is OK if it covers MySQL, Oracle, and MS SQL Server. But if there is really no vendor-independent solution, then I will settle for different solutions for different database vendors.
I cannot use any other external tool to do this. I am mainly looking for a SQL-based solution.
I cannot normalize my database design any further. It is already in 3NF, and moreover a lot of code has already been written around it.

Answer 1

Answered by gbn

Simple answer:

Database-vendor-independent solution = use the standard = COUNT(*)
There are approximate SQL Server solutions, but they don't use COUNT(*) = out of scope

Notes:

COUNT(1) = COUNT(*) = COUNT(PrimaryKey), just in case
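The equivalence above holds because a primary key is never NULL; COUNT(column) on a nullable column skips NULLs and can return a smaller number. The following is a minimal sketch of that behaviour, assuming an in-memory SQLite database as a stand-in engine (the question is about MySQL, Oracle, and SQL Server) and a hypothetical table named demo:

```python
import sqlite3

# SQLite in-memory database, used only as a convenient stand-in engine.
conn = sqlite3.connect(":memory:")
# Hypothetical table: a primary key plus one nullable column.
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, note TEXT)")
conn.executemany(
    "INSERT INTO demo (id, note) VALUES (?, ?)",
    [(1, "a"), (2, None), (3, "c"), (4, None)],
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


count_star = scalar("SELECT COUNT(*) FROM demo")
count_one = scalar("SELECT COUNT(1) FROM demo")
count_pk = scalar("SELECT COUNT(id) FROM demo")
# COUNT over a nullable column ignores the two NULL notes.
count_nullable = scalar("SELECT COUNT(note) FROM demo")

print(count_star, count_one, count_pk, count_nullable)
```

The first three forms agree; only the count over the nullable column differs, which is why counting the primary key is safe and counting an arbitrary column is not.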

Edit:

SQL Server example (1.4 billion rows, 12 columns)

SELECT COUNT(*) FROM MyBigtable WITH (NOLOCK)
-- NOLOCK here is for me only to let me test for this answer: no more, no less

1 run, 5:46 minutes, count = 1,401,659,700

--Note, sp_spaceused uses this DMV
SELECT
   Total_Rows= SUM(st.row_count)
FROM
   sys.dm_db_partition_stats st
WHERE
    object_name(object_id) = 'MyBigtable' AND (index_id < 2)

2 runs, both under 1 second, count = 1,401,659,670

The second one has fewer rows = wrong. It would be the same or more depending on writes (deletes are done out of hours here).

Answer 2

Answered by salbahra

The fastest way by far on MySQL is:

SHOW TABLE STATUS;

You will instantly get all your tables with their row counts, along with plenty of extra information if you want it. Note that for InnoDB tables the Rows value is an estimate rather than an exact total; it is exact for MyISAM.

Answer 3

Answered by Denis de Bernardy

I have come across articles that state that SELECT COUNT(*) FROM TABLE_NAME will be slow when the table has lots of rows and lots of columns.

That depends on the database. Some databases speed up counts, for instance by keeping track of whether rows are live or dead in the index, which allows an index-only scan to extract the number of rows. Others do not, and consequently require visiting the whole table and counting live rows one by one. Either will be slow for a huge table.

Note that you can generally extract a good estimate by using query optimization tools, table statistics, etc. In the case of PostgreSQL, for instance, you could parse the output of EXPLAIN SELECT count(*) FROM yourtable and get a reasonably good estimate of the number of rows. Which brings me to your second question.

I have a table that might contain even billions of rows [it has approximately 15 columns]. Is there a better way to get the EXACT count of the number of rows of a table?

Seriously? :-) You really mean the exact count from a table with billions of rows? Are you really sure? :-)

If you really do, you could keep track of the total using triggers, but mind concurrency and deadlocks if you do.
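A trigger-maintained counter could look roughly like the sketch below. It assumes an in-memory SQLite database as a stand-in engine, and the names big_table, row_counter, big_table_ins, and big_table_del are hypothetical. SQLite triggers fire once per row; on engines with statement-level triggers (SQL Server, for example) the trigger body would instead add or subtract the number of affected rows.

```python
import sqlite3

# SQLite in-memory database, used only as a stand-in engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT);

-- One counter row per tracked table (hypothetical side table).
CREATE TABLE row_counter (table_name TEXT PRIMARY KEY, total INTEGER NOT NULL);
INSERT INTO row_counter VALUES ('big_table', 0);

-- Row-level triggers keep the counter in step with inserts and deletes.
CREATE TRIGGER big_table_ins AFTER INSERT ON big_table
BEGIN
    UPDATE row_counter SET total = total + 1 WHERE table_name = 'big_table';
END;

CREATE TRIGGER big_table_del AFTER DELETE ON big_table
BEGIN
    UPDATE row_counter SET total = total - 1 WHERE table_name = 'big_table';
END;
""")

# Insert 1000 rows (ids 1..1000), then delete every tenth one.
conn.executemany("INSERT INTO big_table (payload) VALUES (?)", [("x",)] * 1000)
conn.execute("DELETE FROM big_table WHERE id % 10 = 0")
conn.commit()

tracked = conn.execute(
    "SELECT total FROM row_counter WHERE table_name = 'big_table'"
).fetchone()[0]
actual = conn.execute("SELECT COUNT(*) FROM big_table").fetchone()[0]
print(tracked, actual)
```

Reading the counter is a single-row lookup regardless of table size. The cost is that every insert and delete now updates the same counter row, which is exactly where the concurrency and deadlock concerns come from.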

Answer 4

Answered by Jesse Webb

Is there a better way to get the EXACT count of the number of rows of a table?

To answer your question simply: no.

If you need a DBMS-independent way of doing this, the fastest way will always be:

SELECT COUNT(*) FROM TableName

Some DBMS vendors may have quicker ways which will work for their systems only. Some of these options are already posted in other answers.

COUNT(*) should be optimized by the DBMS (at least any PROD-worthy DB) anyway, so don't try to bypass its optimizations.

On a side note:
I am sure many of your other queries also take a long time to finish because of your table size. Any performance concerns should probably be addressed by thinking about your schema design with speed in mind. I realize you said that changing it is not an option, but it might turn out that 10+ minute queries aren't an option either. 3NF is not always the best approach when you need speed, and sometimes data can be partitioned into several tables if the records don't have to be stored together. Something to think about...

Answer 5

Answered by jams

You can try sp_spaceused (Transact-SQL):

Displays the number of rows, disk space reserved, and disk space used by a table, indexed view, or Service Broker queue in the current database, or displays the disk space reserved and used by the whole database.

Answer 6

Answered by JakeJ

I got this script from another StackOverflow question/answer:

SELECT SUM(p.rows) FROM sys.partitions AS p
  INNER JOIN sys.tables AS t
  ON p.[object_id] = t.[object_id]
  INNER JOIN sys.schemas AS s
  ON s.[schema_id] = t.[schema_id]
  WHERE t.name = N'YourTableNameHere'
  AND s.name = N'dbo'
  AND p.index_id IN (0,1);

My table has 500 million records and the above returns in less than 1ms. Meanwhile,

SELECT COUNT(id) FROM MyTable

takes a full 39 minutes, 52 seconds!

They yield the exact same number of rows (in my case, exactly 519326012).

I do not know if that would always be the case.

Answer 7

Answered by Alireza Maddah

If the SQL Server version is 2005/2008, you can use DMVs to calculate the row count in a table:

-- Shows all user tables and row counts for the current database 
-- Remove is_ms_shipped = 0 check to include system objects 
-- i.index_id < 2 selects the heap (0) or the clustered index (1)
SELECT o.name, 
 ddps.row_count 
FROM sys.indexes AS i 
 INNER JOIN sys.objects AS o ON i.OBJECT_ID = o.OBJECT_ID 
 INNER JOIN sys.dm_db_partition_stats AS ddps ON i.OBJECT_ID = ddps.OBJECT_ID 
 AND i.index_id = ddps.index_id 
WHERE i.index_id < 2 
 AND o.is_ms_shipped = 0 
ORDER BY o.NAME

For the SQL Server 2000 database engine, sysindexes will work, but avoid relying on it in later versions of SQL Server, as it may be removed in a future release.

Sample code taken from: How To Get Table Row Counts Quickly And Painlessly

Answer 8

Answered by Mainsh S

I use the following (the parallel hint is Oracle syntax):

select /*+ parallel(a) */  count(1) from table_name a;

Answer 9

Answered by john rains

I'm nowhere near as expert as the others who have answered, but I had an issue with a procedure that selects a random row from a table (not overly relevant): I needed to know the number of rows in my reference table to calculate the random index. The traditional Count(*) or Count(1) works, but my query occasionally took up to 2 seconds to run. So instead (for my table named 'tbl_HighOrder') I am using:

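Declare @max int

-- Sum over partitions; Index_Id < 2 keeps only the heap (0) or clustered index (1),
-- so rows are not picked up once per nonclustered index
Select @max = Sum(Row_Count)
From sys.dm_db_partition_stats
Where Object_Name(Object_Id) = 'tbl_HighOrder'
  And Index_Id < 2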

It works great and query times in Management Studio are zero.

Answer 10

Answered by Kaliyug Antagonist

Well, I'm 5 years late and unsure if this helps:

I was trying to count the number of rows in a SQL Server table using MS SQL Server Management Studio and ran into an overflow error, so I used the following:

select count_big(1) FROM [dbname].[dbo].[FactSampleValue];

The result:

24296650578 rows
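The overflow is consistent with how SQL Server types these functions: COUNT returns an int (32-bit) while COUNT_BIG returns a bigint (64-bit). A quick arithmetic check, written as a small Python sketch with illustrative variable names, shows that the reported count does not fit in a 32-bit integer:

```python
INT_MAX = 2**31 - 1      # largest value of a signed 32-bit integer
BIGINT_MAX = 2**63 - 1   # largest value of a signed 64-bit integer

row_count = 24_296_650_578  # the result reported above

fits_in_int = row_count <= INT_MAX
fits_in_bigint = row_count <= BIGINT_MAX
print(fits_in_int, fits_in_bigint)
```

Any table that may exceed roughly 2.1 billion rows therefore needs COUNT_BIG on SQL Server.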
