Original URL: http://stackoverflow.com/questions/6069237/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Fastest way to count exact number of rows in a very large table?
Question by Swaranga Sarma
I have come across articles that state that SELECT COUNT(*) FROM TABLE_NAME will be slow when the table has lots of rows and lots of columns.
I have a table that might contain even billions of rows [it has approximately 15 columns]. Is there a better way to get the EXACT count of the number of rows of a table?
Please consider the following before your answer:
I am looking for a database vendor independent solution. It is OK if it covers MySQL, Oracle, and MS SQL Server. But if there is really no database vendor independent solution, then I will settle for different solutions for different database vendors.
I cannot use any other external tool to do this. I am mainly looking for a SQL based solution.
I cannot normalize my database design any further. It is already in 3NF and moreover a lot of code has already been written around it.
Answer by gbn
Simple answer:
- Database vendor independent solution = use the standard = COUNT(*)
- There are approximate SQL Server solutions, but they don't use COUNT(*) = out of scope
Notes:
COUNT(1) = COUNT(*) = COUNT(PrimaryKey), just in case
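The equivalence in that note holds because a primary key column can never be NULL: COUNT(column) skips NULLs, so it only matches COUNT(*) for a NOT NULL column. A minimal sketch, using Python's built-in sqlite3 module as a stand-in engine; the table people and the column nickname are hypothetical.

```python
import sqlite3


def count_variants(conn: sqlite3.Connection) -> tuple:
    """Return COUNT(*), COUNT(1), COUNT(primary key) and COUNT(nullable column)."""
    return conn.execute(
        "SELECT COUNT(*), COUNT(1), COUNT(id), COUNT(nickname) FROM people"
    ).fetchone()


conn = sqlite3.connect(":memory:")
# id is an INTEGER PRIMARY KEY (never NULL); nickname is nullable.
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, nickname TEXT)")
conn.executemany(
    "INSERT INTO people (nickname) VALUES (?)", [("a",), (None,), ("c",)]
)
print(count_variants(conn))  # (3, 3, 3, 2)
```

Only the last value differs, because the row whose nickname is NULL is not counted by COUNT(nickname).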
Edit:
SQL Server example (1.4 billion rows, 12 columns)
SELECT COUNT(*) FROM MyBigtable WITH (NOLOCK)
-- NOLOCK here is for me only to let me test for this answer: no more, no less
1 run, 5:46 minutes, count = 1,401,659,700
--Note, sp_spaceused uses this DMV
SELECT
Total_Rows= SUM(st.row_count)
FROM
sys.dm_db_partition_stats st
WHERE
object_name(object_id) = 'MyBigtable' AND (index_id < 2)
2 runs, both under 1 second, count = 1,401,659,670
The second one has fewer rows = wrong. It would be the same or more depending on writes (deletes are done out of hours here).
Answer by salbahra
The fastest way by far on MySQL is:
SHOW TABLE STATUS;
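You will instantly get all your tables with their row counts, along with plenty of extra information if you want. Note that the Rows value is exact only for storage engines that store the count, such as MyISAM; for InnoDB it is an approximation.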
Answer by Denis de Bernardy
I have come across articles that state that SELECT COUNT(*) FROM TABLE_NAME will be slow when the table has lots of rows and lots of columns.
That depends on the database. Some speed up counts, for instance by keeping track in the index of whether rows are live or dead, which allows an index-only scan to extract the number of rows. Others do not, and consequently require visiting the whole table and counting live rows one by one. Either will be slow for a huge table.
Note that you can generally extract a good estimate by using query optimization tools, table statistics, etc. In the case of PostgreSQL, for instance, you could parse the output of explain select count(*) from yourtable and get a reasonably good estimate of the number of rows. Which brings me to your second question.
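For the PostgreSQL suggestion, a minimal parsing sketch, assuming plain-text EXPLAIN output (not VERBOSE or JSON format). In a plan for select count(*), the table-level estimate sits on the scan node, because the Aggregate node above it reports rows=1. The helper name estimated_rows is hypothetical, and the plan text in the usage lines is made up in the shape of EXPLAIN output (its costs are illustrative, not from a real run). As far as I know, a Parallel Seq Scan node reports a per-worker estimate, so the sketch refuses to use it.

```python
import re
from typing import Optional


def estimated_rows(explain_text: str, table: str) -> Optional[int]:
    """Return the planner's row estimate for the scan of `table`, if one is found."""
    scan = re.compile(
        rf"(?P<kind>[\w ]*Scan)\b[^(]*\bon {re.escape(table)}\b.*?\brows=(?P<rows>\d+)"
    )
    for line in explain_text.splitlines():
        match = scan.search(line)
        if match is None:
            continue  # e.g. the Aggregate node, which estimates rows=1
        if match.group("kind").strip().startswith("Parallel"):
            # Parallel scan nodes estimate rows per worker, not for the whole table.
            return None
        return int(match.group("rows"))
    return None


# Made-up plan text in the shape of EXPLAIN output (costs are illustrative).
sample_plan = """Aggregate  (cost=1693.00..1693.01 rows=1 width=8)
  ->  Seq Scan on yourtable  (cost=0.00..1443.00 rows=100000 width=0)"""
print(estimated_rows(sample_plan, "yourtable"))  # 100000
```

The estimate is only as fresh as the table statistics, so it is an approximation, not the exact count the question asks for.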
I have a table that might contain even billions of rows [it has approximately 15 columns]. Is there a better way to get the EXACT count of the number of rows of a table?
Seriously? :-) You really mean the exact count from a table with billions of rows? Are you really sure? :-)
If you really do, you could keep track of the total using triggers, but mind concurrency and deadlocks if you do.
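A minimal sketch of the trigger idea, using Python's sqlite3 module as a stand-in engine; big_table, row_counts and the trigger names are hypothetical. Every insert or delete also updates one counter row, so reading the total becomes a single-row lookup. That shared counter row is also where the concurrency and deadlock risk comes from, since every writer has to update it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT);
-- Side table holding the running total for each tracked table.
CREATE TABLE row_counts (table_name TEXT PRIMARY KEY, n INTEGER NOT NULL);
-- Seed the counter once with a real count (0 here, since big_table is empty).
INSERT INTO row_counts SELECT 'big_table', COUNT(*) FROM big_table;
-- SQLite triggers fire once per affected row.
CREATE TRIGGER big_table_count_ins AFTER INSERT ON big_table
BEGIN
    UPDATE row_counts SET n = n + 1 WHERE table_name = 'big_table';
END;
CREATE TRIGGER big_table_count_del AFTER DELETE ON big_table
BEGIN
    UPDATE row_counts SET n = n - 1 WHERE table_name = 'big_table';
END;
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def cached_count(conn: sqlite3.Connection) -> int:
    """Read the trigger-maintained total instead of scanning big_table."""
    return conn.execute(
        "SELECT n FROM row_counts WHERE table_name = 'big_table'"
    ).fetchone()[0]


conn = make_db()
conn.executemany("INSERT INTO big_table (payload) VALUES (?)", [("x",)] * 1000)
conn.execute("DELETE FROM big_table WHERE id <= 10")
print(cached_count(conn))  # 990
```

SQLite only has row-level triggers, hence the +1/-1. On SQL Server a trigger fires once per statement, so it would instead adjust the counter by the number of rows in the inserted or deleted pseudo-tables.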
Answer by Jesse Webb
Is there a better way to get the EXACT count of the number of rows of a table?
To answer your question simply, No.
If you need a DBMS independent way of doing this, the fastest way will always be:
SELECT COUNT(*) FROM TableName
Some DBMS vendors may have quicker ways which will work for their systems only. Some of these options are already posted in other answers.
COUNT(*) should be optimized by the DBMS anyway (at least by any production-worthy DB), so don't try to bypass its optimizations.
On a side note:
I am sure many of your other queries also take a long time to finish because of your table size. Any performance concerns should probably be addressed by designing your schema with speed in mind. I realize you said that changing it is not an option, but it might turn out that 10+ minute queries aren't an option either. 3NF is not always the best approach when you need speed, and sometimes data can be partitioned across several tables if the records don't have to be stored together. Something to think about...
Answer by jams
You can try sp_spaceused (Transact-SQL):
Displays the number of rows, disk space reserved, and disk space used by a table, indexed view, or Service Broker queue in the current database, or displays the disk space reserved and used by the whole database.
Answer by JakeJ
I got this script from another StackOverflow question/answer:
SELECT SUM(p.rows) FROM sys.partitions AS p
INNER JOIN sys.tables AS t
ON p.[object_id] = t.[object_id]
INNER JOIN sys.schemas AS s
ON s.[schema_id] = t.[schema_id]
WHERE t.name = N'YourTableNameHere'
AND s.name = N'dbo'
AND p.index_id IN (0,1);
My table has 500 million records and the above returns in less than 1ms. Meanwhile,
SELECT COUNT(id) FROM MyTable
takes a full 39 minutes, 52 seconds!
They yield the exact same number of rows (in my case, exactly 519326012).
I do not know if that would always be the case.
Answer by Alireza Maddah
If your SQL Server version is 2005/2008, you can use DMVs to calculate the row count in a table:
-- Shows all user tables and row counts for the current database
-- Remove is_ms_shipped = 0 check to include system objects
-- i.index_id < 2 indicates clustered index (1) or heap (0)
-- SUM/GROUP BY combines the per-partition rows of partitioned tables
SELECT o.name,
SUM(ddps.row_count) AS row_count
FROM sys.indexes AS i
INNER JOIN sys.objects AS o ON i.OBJECT_ID = o.OBJECT_ID
INNER JOIN sys.dm_db_partition_stats AS ddps ON i.OBJECT_ID = ddps.OBJECT_ID
AND i.index_id = ddps.index_id
WHERE i.index_id < 2
AND o.is_ms_shipped = 0
GROUP BY o.OBJECT_ID, o.name
ORDER BY o.name
For the SQL Server 2000 database engine, sysindexes will work, but you are strongly advised to avoid it in later versions of SQL Server, as it may be removed in a future release.
Sample code taken from: How To Get Table Row Counts Quickly And Painlessly
Answer by Mainsh S
I use
select /*+ parallel(a) */ count(1) from table_name a;
Answer by john rains
I'm nowhere near as expert as others who have answered, but I was having an issue with a procedure I was using to select a random row from a table (not overly relevant): I needed to know the number of rows in my reference table to calculate the random index. The traditional Count(*) or Count(1) works, but my query occasionally took up to 2 seconds to run. So instead (for my table named 'tbl_HighOrder') I am using:
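Declare @max int
-- Sum the heap (0) or clustered index (1) rows across all partitions;
-- without this filter, @max would take the count from an arbitrary index/partition row
Select @max = SUM(Row_Count)
From sys.dm_db_partition_stats
Where Object_Name(Object_Id) = 'tbl_HighOrder'
And Index_Id < 2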
It works great and query times in Management Studio are zero.
Answer by Kaliyug Antagonist
Well, I'm 5 years late and unsure if it helps:
I was trying to count the number of rows in a SQL Server table using MS SQL Server Management Studio and ran into an overflow error (COUNT returns int, which cannot hold values above 2,147,483,647), so I used the following:
select count_big(1) FROM [dbname].[dbo].[FactSampleValue];
The result:
24296650578 rows
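The overflow comes down to plain arithmetic: in SQL Server, COUNT returns a 32-bit int while COUNT_BIG returns a 64-bit bigint. A small Python sketch checking the row counts quoted on this page against both ranges; the helper name fits is hypothetical.

```python
INT_MAX = 2**31 - 1     # SQL Server int upper bound: 2,147,483,647
BIGINT_MAX = 2**63 - 1  # SQL Server bigint upper bound


def fits(count: int, upper: int) -> bool:
    """True if a non-negative row count can be returned in a type with this upper bound."""
    return 0 <= count <= upper


# Row counts reported in the answers above.
for rows in (1_401_659_700, 519_326_012, 24_296_650_578):
    print(f"{rows:,}: int={fits(rows, INT_MAX)}, bigint={fits(rows, BIGINT_MAX)}")
```

Only the 24,296,650,578-row table exceeds the int range, which is consistent with COUNT overflowing there while COUNT_BIG succeeded.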