SQL: How long should a query that returns 5 million records take?

Note: this page is a repost of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/9993761/

Date: 2020-09-01 15:08:35  Source: igfitidea

How long should a query that returns 5 million records take?

Tags: sql, sql-server, database, sql-server-2008

Asked by alimac83

I realise the answer should probably be 'as little time as possible' but I'm trying to learn how to optimise databases and I have no idea what an acceptable time is for my hardware.

For a start I'm using my local machine with a copy of SQL Server 2008 Express. I have a dual-core processor, 2 GB of RAM and a 64-bit OS (if that makes a difference). I'm only using a simple table with about 6 varchar fields.

At first I queried the data without any indexing. This took a ridiculously long time, so I cancelled it and added a clustered index (using the PK) to the table. That cut the time down to 1 minute 14 seconds. I have no idea whether this is the best I can get or whether I can cut it down even further.

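(For reference, adding a clustered primary key to a heap like this might look like the sketch below; dbo.Articles and Id are made-up names, not the real schema.)

    -- Hypothetical table and key column; adjust to the real schema.
    ALTER TABLE dbo.Articles
        ADD CONSTRAINT PK_Articles PRIMARY KEY CLUSTERED (Id);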

Am I limited by my hardware or is there anything else I can do to my table/database/queries to get results faster?

FYI I'm only using a standard SELECT * FROM to retrieve my results.

Thanks!

EDIT: Just to clarify, I'm only doing this for testing purposes. I don't NEED to pull out all the data, I'm just using that as a consistent test to see if I can cut down the query times.

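(For a repeatable timing test, something along these lines can help. This is only a sketch: dbo.Articles is a made-up name, and DBCC DROPCLEANBUFFERS requires sysadmin rights and should only be run on a test server.)

    -- Start each run from a cold buffer cache (test servers only).
    CHECKPOINT;
    DBCC DROPCLEANBUFFERS;

    -- Report compile/execution times and logical/physical reads.
    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    SELECT * FROM dbo.Articles;   -- the query under test

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;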

I suppose what I'm asking is: Is there anything I can do to speed up the performance of my queries other than a) upgrading hardware and b) adding indexes (assuming the schema is already good)?

Answered by VMAtm

I think you are asking the wrong question.

First of all - why do you need so many articles at one time on the local machine? What do you want to do with them? I'm asking because I think you want to transfer this data somewhere, so you should be measuring how long it takes to transfer the data.

Some advice:

Your applications should not select 5 million records at a time. Try to split your query and get the data in smaller sets.

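(A minimal sketch of one way to do that on SQL Server 2008, which does not yet have OFFSET/FETCH, is keyset paging on the clustered key; the table and column names here are assumptions.)

    DECLARE @LastId INT = 0;          -- highest key already fetched
    DECLARE @BatchSize INT = 50000;   -- tune to the workload

    -- Fetch the next batch; the application repeats this, feeding the last
    -- Id back in as @LastId, until no more rows are returned.
    SELECT TOP (@BatchSize) Id, Title, Author
    FROM dbo.Articles
    WHERE Id > @LastId
    ORDER BY Id;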

UPDATE:

Because you are doing this for testing, I suggest that you

  1. Remove * from your query - it takes SQL Server some time to resolve this.
  2. Put your data in temporary storage; try using a VIEW or a temporary table for this (points 1 and 2 are sketched at the end of this answer).
  3. Use plan caching on your server

to improve performance. But even if you're just testing, I still don't understand why you would need such tests if your application would never use such a query. Testing just for the sake of testing is a bad use of time.

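(As a rough illustration of points 1 and 2 above, with made-up column names, an explicit column list staged into a temporary table could look like this; for point 3, parameterised queries via sp_executesql are one common way to get plan reuse.)

    -- 1. Name only the columns you need instead of *.
    -- 2. Stage the result in a temporary table for repeated use.
    SELECT Id, Title, Author
    INTO #ArticleStage
    FROM dbo.Articles;

    -- Subsequent work runs against the staged copy.
    SELECT COUNT(*) FROM #ArticleStage;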

Answered by rvphx

Look at the query execution plan. If your query is doing a table scan, it will obviously take a long time. The query execution plan can help you decide what kind of indexing you would need on the table. Also, creating table partitions can help sometimes in cases where the data is partitioned by a condition (usually date and time).

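(One way to check for a table scan from T-SQL, as a sketch with an assumed table name, is to ask for the estimated plan instead of running the query.)

    -- Return the estimated plan as rows instead of executing the query.
    SET SHOWPLAN_ALL ON;
    GO
    SELECT * FROM dbo.Articles;   -- look for Table Scan vs. Clustered Index Seek/Scan
    GO
    SET SHOWPLAN_ALL OFF;
    GO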

Answered by user824910

The best optimization depends on the indexing strategy you choose. Like several of the answers above, I would also say that partitioning the table can sometimes help. It is not best practice to query billions of records in a single time frame; you will get much better results if you query the data partially, in iterations. You may also want to check the minimum hardware and software requirements for SQL Server 2008 to clear up any doubts on that side.

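(A sketch of what date-based partitioning might look like. Note that the question's table may not have a date column, and table partitioning is not available in the Express edition the asker is using, so this is illustrative only.)

    -- Hypothetical partition function and scheme on a DATETIME column.
    CREATE PARTITION FUNCTION pfByYear (DATETIME)
        AS RANGE RIGHT FOR VALUES ('2010-01-01', '2011-01-01', '2012-01-01');

    CREATE PARTITION SCHEME psByYear
        AS PARTITION pfByYear ALL TO ([PRIMARY]);

    -- A new table or rebuilt clustered index can then be created
    -- ON psByYear(CreatedDate) to spread rows across the partitions.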

Answered by Andy

I did 5.5 million in 20 seconds. That's taking over 100k schedules with different frequencies and forecasting them out for the next 25 years. It was just maximum-scenario testing, but as an example it shows the kind of speed you can achieve in a scheduling system.

Answered by Jayanth Kurup

When fetching 5 million rows you are almost certainly going to spool to tempdb. You should try to optimize your tempdb by adding additional files. If you have multiple drives on separate disks, you should split the table data into different NDF files located on those separate disks. Partitioning won't help when you are querying all the data on the disk. You can also use a query hint (MAXDOP) to force parallelism; this will increase CPU utilization. Finally, ensure that the columns contain as few NULLs as possible, and rebuild your indexes and statistics.

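(A rough sketch of those suggestions; the file path, size, DOP value, and table name are placeholders and would need to be checked against the actual server.)

    -- Add an extra data file to tempdb (example path and size).
    ALTER DATABASE tempdb
        ADD FILE (NAME = tempdev2, FILENAME = 'D:\SQLData\tempdev2.ndf', SIZE = 512MB);

    -- Force a degree of parallelism on the large query.
    SELECT * FROM dbo.Articles
    OPTION (MAXDOP 4);

    -- Rebuild the table's indexes and refresh statistics.
    ALTER INDEX ALL ON dbo.Articles REBUILD;
    UPDATE STATISTICS dbo.Articles WITH FULLSCAN;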