How many records can I store in a SQL Server table before it gets ugly?
Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise comply with the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/2788300/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
How many records can I store in a SQL Server table before it gets ugly?
Asked by Michel
I've been asked to do some performance tests for a new system. It is currently running with only a few clients, but they expect to grow, so these are the numbers I'm working with for my test:
200 clients, 4 years of data, and the data changes every 5 minutes. So for every client there is 1 record every 5 minutes. That means 365*24*12 = 105,120 records per client per year, which means roughly 80 million records for my test. The table has one FK to another table, one PK (uniqueidentifier), and one index on the clientID.
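To make the setup concrete, the table described might look like the sketch below. Every name, type width, and the Value column are assumptions for illustration; only the shapes (a uniqueidentifier PK, one FK to another table, an index on the clientID) come from the question.

```sql
-- Hypothetical shape of the table under test; names are invented.
CREATE TABLE dbo.Measurement (
    MeasurementID uniqueidentifier NOT NULL
        CONSTRAINT PK_Measurement PRIMARY KEY,          -- the GUID PK from the question
    ClientID int NOT NULL
        CONSTRAINT FK_Measurement_Client
        REFERENCES dbo.Client (ClientID),               -- the one FK
    MeasuredAt datetime2(0) NOT NULL,
    Value decimal(18, 4) NOT NULL                       -- stand-in for the 5-minute reading
);

CREATE INDEX IX_Measurement_ClientID
    ON dbo.Measurement (ClientID);                      -- the index on clientID
```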
Is this something SQL Server laughs at because it doesn't scare it, is this getting to be too much for one quad-core 8 GB machine, is this on the edge, or.....
Has anybody had any experience with these kinds of numbers?
Answered by TomTom
The PK field should be as small as possible and not random: a GUID is a poor choice here. The main problems are:
- The PK is used in all foreign keys to reference the row, so a large PK uses more space = more IO.
- A random PK means inserts happen all over the place = many page splits = inefficient index usage.
How bad is that? I know that in some scenarios you lose 80% of your speed there.
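One common way to act on this advice (an illustration, not something the answer spells out) is to keep a GUID for external references if you must, but cluster on a small, ever-increasing key so new rows append to the end of the index instead of splitting pages all over it:

```sql
-- Sketch only; table and constraint names are invented.
CREATE TABLE dbo.Measurement (
    MeasurementID int IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_Measurement PRIMARY KEY CLUSTERED,   -- small, sequential: no page splits
    PublicID uniqueidentifier NOT NULL
        CONSTRAINT DF_Measurement_PublicID
        DEFAULT NEWSEQUENTIALID(),                         -- sequential GUID if callers need one
    ClientID int NOT NULL
);
```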
Otherwise, no problem. I have a table in excess of 800 million rows, and things are super fast there ;) Naturally you need decent queries and decent indices, and obviously it cannot run on a single 5400 RPM green hard disc and be efficient, but given proper IO, sensible queries, and some decent indices, SQL Server does not balk at a couple of billion rows.
So, while "it depends", the generic answer is that large tables are not a problem... ...unless you do MASS deletes. Deleting half the table would be a HUGE transaction, which is why partitioning is nice for stuff like accounting: one partition per year means I can get rid of a year's data without a DELETE statement ;)
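The "get rid of a year without a DELETE" trick relies on partition switching being a metadata-only operation. A hedged sketch, with invented names (and assuming the archive table is empty, has an identical structure, and lives on the same filegroup, which SWITCH requires):

```sql
-- Move the oldest year's partition out of the live table instantly,
-- then discard it with a cheap TRUNCATE instead of a row-by-row DELETE.
ALTER TABLE dbo.Accounting
    SWITCH PARTITION 1 TO dbo.Accounting_Archive;  -- metadata change, effectively instant

TRUNCATE TABLE dbo.Accounting_Archive;             -- deallocates pages, no huge transaction
```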
Answered by Nick Craver
The software can handle it; can your server? Well, it depends.
Are you just archiving it? Sure, store a hundred billion rows, it doesn't care; the performance problems come in when you're querying the data. The larger it gets, the more space you need in terms of total storage and for operations (deletes, rollback segments, etc.) on that data, preferably in memory, but on a fast tempdb drive if not.
What matters more to a SQL server than processor on large data sets (though processor certainly affects the time it takes, not the threshold of query/data it can handle) is memory and space (both HD and RAM, since it'll overflow to TempDB for large operations); this is speaking in terms of capacity. For performance you need disk IO, memory, and processor power all together.
The short answer to "can it handle it" is yes, provided you have enough space. Does it handle it fast enough? That depends on what kind of queries you're running and how much performance matters.
One last thing: don't forget to take a look at other questions here on optimizing large tables.
Answered by Dave Markle
SQL Server will have no trouble storing this many records.
If you have designed your indexes properly and your database is properly normalized, you will have absolutely no problem accessing an arbitrary number of records. Oftentimes, people make poor design decisions early on, when their database has no information in it, and they never know about it, because everything is fast for small "n".
So while I will say that SQL Server can handle what you're doing, I would also say that now would be a good time to sit back and take a look at how your queries are performing with SQL Server Profiler. Is everything still fast? Do you see excessive scanning or hashing in your frequent queries that is leading to degraded performance? If so, now's the time to analyze and fix those issues.
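Besides Profiler, the plan cache DMVs offer a quick way to spot the heaviest readers (an aside this answer doesn't mention; the query below is a sketch):

```sql
-- Top 10 cached statements by total logical reads.
SELECT TOP (10)
    qs.total_logical_reads,
    qs.execution_count,
    SUBSTRING(st.text, qs.statement_start_offset / 2 + 1, 200) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;
```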
As an aside, people really like to think of size limitations based on numbers of rows and columns. Try to go a step further and talk about bytes, because ultimately, bytes are what get scanned in a reporting query, and bytes are what get stored to disk.
Answered by hakan
Too many to count, really. I'm responsible for a web site which has 2 million registered users.
Some of our tables have more than 100 million records, and we can achieve great performance with 4 million daily page views, but I must admit that caching with a good architecture is the main reason things aren't getting ugly.
Answered by KM.
If you are after ultimate high performance, I'd design the PK to not be a uniqueidentifier. If you need to merge data sets, I'd go with an INT IDENTITY + SMALLINT (or even a tinyint) to identify the originating location. You don't say much about your design, but there are issues with trying to use a uniqueidentifier as a clustered index.
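The INT IDENTITY + SMALLINT idea might look like the following sketch; the table and column names are assumptions, only the key shape comes from the answer:

```sql
-- Composite key: a small per-site discriminator plus a sequential identity,
-- so rows from different originating locations can be merged without collisions.
CREATE TABLE dbo.Measurement (
    MeasurementID int IDENTITY(1, 1) NOT NULL,
    SourceSiteID smallint NOT NULL,      -- which originating location produced the row
    ClientID int NOT NULL,
    CONSTRAINT PK_Measurement
        PRIMARY KEY CLUSTERED (SourceSiteID, MeasurementID)
);
```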
Given the proper server hardware, most decent designs will do just fine. Don't plan on running anything except the OS and SQL Server on the server. The main concern is RAM: for best performance, you'll need enough RAM for the entire database, indices, etc., and that is beyond what the OS will use up. I've even seen massive servers help bad designs run very well.
Answered by HLGEM
SQL Server can handle terabytes worth of data. The kicker is that you have to design correctly and have the right equipment mix. You might need partitioning, for instance. You definitely do need to think about every millisecond of performance on every query, and avoid poorly performing designs and querying techniques like EAV tables, correlated subqueries, cursors, and "like '%sometext%'".
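To illustrate why "like '%sometext%'" belongs on that list (my example, with an invented table): a leading wildcard cannot be satisfied by an index seek, so the whole column must be scanned, whereas a trailing-only wildcard can seek on an index over the column (assuming the column is short enough to be indexed).

```sql
-- Leading wildcard: no seek possible, every row's Body is examined.
SELECT NoteID FROM dbo.Notes WHERE Body LIKE '%sometext%';

-- Trailing wildcard only: an index on Body can be seeked to the 'sometext' prefix.
SELECT NoteID FROM dbo.Notes WHERE Body LIKE 'sometext%';
```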
If you expect your database to be that large, then buy and read, cover to cover, a book on performance tuning before starting the design. Bad design kills database performance, and it is extremely hard to correct once you have 80,000,000 records.
I also suggest you find a DBA with experience with high-performance, high-volume databases. This is a whole new game design-wise, and it needs to be thought out from the beginning.
Good for you for doing this kind of testing now, before the system has that number of records.
Answered by Amy B
Even MS Access can laugh at a half-million-row table (depending on row size).
If you don't have any queries to profile, think of the table as a file. The row count isn't the important number compared to the output of sp_spaceused.
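For example, sp_spaceused reports the sizes that actually matter here (the table name below is an assumption):

```sql
-- Reserved, data, and index sizes in KB for one table,
-- which tell you far more than the raw row count does.
EXEC sp_spaceused N'dbo.Measurement';
```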
If you do have some queries, think of the table as a data structure. How can the query be fulfilled with the minimum amount of IO? Use the query plan, and SET STATISTICS IO ON.
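A minimal usage sketch of that last suggestion (the query and table name are assumptions):

```sql
-- With STATISTICS IO on, the Messages tab reports logical reads,
-- physical reads, and read-ahead reads per table touched by the query.
SET STATISTICS IO ON;

SELECT COUNT(*)
FROM dbo.Measurement
WHERE ClientID = 42;

SET STATISTICS IO OFF;
```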