SQL Server Int or BigInt database table Ids

Notice: This page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2124631/

Retrieved: 2020-09-01 05:11:34. Source: igfitidea

Tags: sql, sql-server

Asked by Rob Packwood

I am writing a new program and it will require a database (SQL Server 2008). Everything I am running now for the system is 64-bit, which brings me to this question. For all of the Id columns in various tables, should I make them all INT or BIGINT? I doubt the system will ever surpass the INT range but it is a possibility within some of the larger financial tables I suppose. It seems like INT is the standard though...

Answered by marc_s

OK, let's do a quick math recap:

  • INT is 32-bit and gives you basically 4 billion values - if you only count the values larger than zero, it's still 2 billion. Do you have this many employees? Customers? Products in stock? Orders in the lifetime of your company? REALLY?

  • BIGINT goes way way way beyond that. Do you REALLY need that?? REALLY?? If you're an astronomer, or into particle physics - maybe. An average Line of Business user? I strongly doubt it.

Imagine you have an Orders table with - say - 10 million rows (orders for your company), and that the OrderID, which you made a BIGINT, is referenced by 5 other tables and used in 5 non-clustered indices on your Orders table - not overdone, I think, right?

10 million rows, times 5 tables plus 5 non-clustered indices, makes 100 million instances where you are using 8 bytes each instead of 4 bytes - 400 million extra bytes = 400 MB. A total waste... you'll need more data and index pages, and your SQL Server will have to read more pages from disk and cache more pages... that's not beneficial for your performance - plain and simple.
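
The estimate above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the constant and function names are made up for illustration, it assumes (as the answer does) that each referencing table and index carries one copy of the key per order, and it ignores row and page overhead.

```python
# Back-of-the-envelope version of the estimate above (illustrative names).
ROWS = 10_000_000            # orders in the table (figure from the answer)
REFERENCING_TABLES = 5       # tables holding OrderID as a foreign key
NONCLUSTERED_INDEXES = 5     # indexes on Orders that carry the key
INT_BYTES, BIGINT_BYTES = 4, 8


def extra_bytes(rows, copies_per_row, narrow=INT_BYTES, wide=BIGINT_BYTES):
    """Extra bytes spent by storing the key as `wide` instead of `narrow`."""
    return rows * copies_per_row * (wide - narrow)


copies = REFERENCING_TABLES + NONCLUSTERED_INDEXES
wasted = extra_bytes(ROWS, copies)
print(f"{ROWS * copies:,} key copies, {wasted:,} extra bytes (~{wasted / 1e6:.0f} MB)")
```

With the figures from the answer this comes out at 100 million key copies and 400 million extra bytes.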

PLUS: What most programmers don't think about: yes, disk space is dirt cheap. But that wasted space is also relevant in your SQL Server RAM and your database cache - and that space is not dirt cheap!

So to make a very long post short: use the smallest integer type that really suits your need; if you have 10-20 distinct values to handle - use TINYINT. If you need an order table, I believe INT should be PLENTY - BIGINT is only a waste of space.
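
As a rough illustration of "smallest type that suits your need", the sketch below picks the narrowest SQL Server integer type for an expected value range. The ranges are the documented ones for tinyint, smallint, int and bigint; the helper name `smallest_integer_type` is hypothetical and not part of any library.

```python
# Documented SQL Server integer ranges: (name, storage bytes, min, max).
INTEGER_TYPES = [
    ("tinyint", 1, 0, 255),
    ("smallint", 2, -2**15, 2**15 - 1),
    ("int", 4, -2**31, 2**31 - 1),
    ("bigint", 8, -2**63, 2**63 - 1),
]


def smallest_integer_type(min_value, max_value):
    """Return the narrowest type whose range covers [min_value, max_value]."""
    for name, _size, lo, hi in INTEGER_TYPES:
        if lo <= min_value and max_value <= hi:
            return name
    raise ValueError("range does not fit in any SQL Server integer type")


print(smallest_integer_type(1, 20))             # tinyint
print(smallest_integer_type(1, 50_000))         # int (exceeds smallint's 32,767)
print(smallest_integer_type(1, 3_000_000_000))  # bigint
```

The expected maximum should be the highest key value the column will ever hold, not the expected row count, since deleted rows still consume key values.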

Plus: should any of your tables really ever get close to reaching 2 or 4 billion rows, you'll still have plenty of time to upgrade your table to a BIGINT ID, if that's really needed.......

Answered by Aaronaught

You should use the smallest data type that makes sense for the table in question. That includes using smallint or even tinyint if there are few enough rows.

You'll save space on both data and indexes and get better index performance. Using a bigint when all you need is a smallint is similar to using a varchar(4000) when all you need is a varchar(50).

Even if the machine's native word size is 64 bits, that only means that 64-bit CPU operations won't be any slower than 32-bit operations. Most of the time, they also won't be faster; they'll be the same. But most databases are not going to be CPU-bound anyway; they'll be I/O-bound and, to a lesser extent, memory-bound, so a 50%-90% smaller data size is a Very Good Thing when you need to perform an index scan over 200 million rows.
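
To put rough numbers on the "50%-90% smaller" remark, the sketch below compares the bytes taken by one integer key column across 200 million rows for each type, relative to bigint. It counts only the key column itself (for example a foreign key to a small lookup table) and ignores row, page and index overhead, so treat it as an illustration rather than a measurement; the names are made up for this example.

```python
# Storage size in bytes of each SQL Server integer type.
BYTES = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}
ROWS = 200_000_000  # row count taken from the answer's index-scan example


def saving_vs_bigint(type_name):
    """Fraction of key bytes saved by using `type_name` instead of bigint."""
    return 1 - BYTES[type_name] / BYTES["bigint"]


for name in ("int", "smallint", "tinyint"):
    key_mb = ROWS * BYTES[name] / 1e6
    print(f"{name:8s} {saving_vs_bigint(name):6.1%} smaller key, {key_mb:,.0f} MB of key data")
```

The same scan over a bigint column would touch 1,600 MB of key data under these assumptions.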

Answered by Rick B.

Here is an article with some real answers on performance... I prefer to answer questions with hard numbers if possible... If you follow the link below, you will find that, at least up to a million records, the difference in disk usage is negligible...

http://www.sqlservercentral.com/articles/Performance+Tuning/2753/

Personally I do feel that using the appropriate ID size is important, but also consider the fact that you may have a table that has a ton of activity over time. It is not that you're storing a massive amount of data, but that the key value has grown due to the nature of being auto-incremented (deletes and inserts occurring over time).
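
A small simulation makes the point concrete: with an auto-incremented key, the highest ID tracks total inserts ever made, not the rows currently stored. The workload figures below (500,000 inserts per day, 30-day retention, ten years) are invented purely for illustration, and the model assumes key values are never reused after a delete.

```python
INT_MAX = 2**31 - 1


def simulate_churn(days, inserts_per_day, retention_days):
    """Model an auto-incremented key: every insert consumes a new value,
    and rows older than `retention_days` are purged."""
    next_id = 1
    live_rows = 0
    for day in range(days):
        next_id += inserts_per_day          # key values are never reused
        live_rows += inserts_per_day
        if day >= retention_days:
            live_rows -= inserts_per_day    # purge the oldest day's rows
    return next_id - 1, live_rows


highest_id, live_rows = simulate_churn(days=3650, inserts_per_day=500_000, retention_days=30)
print(f"live rows: {live_rows:,}")
print(f"highest id: {highest_id:,} ({highest_id / INT_MAX:.0%} of the INT range)")
```

Under these made-up numbers the table never holds more than 15 million rows, yet the key has consumed most of the positive INT range after ten years.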

Consider a file repository on a community site, or the IDs of user comments in a multi-tenant community site application.

I understand that most developers are building systems that will never touch millions of records, but it is important to note that there are cases where a bigint is required. I am still not convinced that, when you are designing a schema whose potential growth you do not know, you should not attempt to anticipate the future. Consider using a bigint if you feel the id value could grow past the max value of int.

Answered by user2438530

Other people already gave compelling answers for 32-bit IDs.

For some applications 64-bit IDs do make more sense.

If you want to guarantee that IDs are unique across a cluster of databases, 63 bits for IDs can be very convenient. With 32 bits it's very difficult to distribute generation of IDs across servers in a cluster, or across data centers. With 64 bits you have enough room to conveniently generate IDs across servers without locking and still guarantee uniqueness.

For example see Twitter Snowflake, and Instagram Engineering's blog post on "Sharding & IDs at Instagram". Both provide good reasons why 63 or 64 bits make more sense for their IDs than 32-bit counters.
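
A minimal sketch of the idea, loosely modelled on the commonly described Snowflake layout (41 bits of millisecond timestamp, 10 bits of worker ID, 12 bits of per-millisecond sequence, 63 bits in total). The class name, the epoch constant and the clock handling are assumptions made for this example and are not taken from either project's code.

```python
import threading
import time

TIMESTAMP_BITS, WORKER_BITS, SEQUENCE_BITS = 41, 10, 12   # 63 bits in total
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1
CUSTOM_EPOCH_MS = 1_577_836_800_000   # 2020-01-01T00:00:00Z, arbitrary choice


class IdGenerator:
    """Generates 63-bit IDs that are unique as long as worker IDs are unique."""

    def __init__(self, worker_id):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        # In-process lock only; no coordination between servers is needed.
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms():
        return int(time.time() * 1000) - CUSTOM_EPOCH_MS

    def next_id(self):
        with self._lock:
            now = max(self._now_ms(), self.last_ms)    # never move backwards
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & SEQUENCE_MASK
                if self.sequence == 0:                 # 4096 IDs used this ms
                    while now <= self.last_ms:         # wait for the next ms
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self.sequence)


server_a, server_b = IdGenerator(worker_id=1), IdGenerator(worker_id=2)
ids = [g.next_id() for g in (server_a, server_b) for _ in range(10_000)]
print(len(set(ids)) == len(ids), max(ids) < 2**63)
```

Each generator only needs a distinct worker ID assigned up front; after that, no server has to talk to any other to hand out IDs. A 32-bit value has no room for this kind of layout, which is the point the answer makes.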

Answered by gbn

The alignment of 32-bit numbers with x86 architecture, or 64-bit numbers with x64 architecture, is called data structure alignment.

This has no meaning for data in a database, because here it's things like disk space, data cache and table/index architecture that affect performance (as mentioned in other answers).

Remember, it's not the CPU accessing the data as such. It's the DB engine code (which may be aligned, but who cares?) that runs on the CPU and manipulates your data. When/if your data goes through the CPU it certainly won't be in the same on-disk structures.

Answered by AdaTheDev

You should judge each table individually as to what datatype would meet the needs for each one. If an INTEGER would meet the needs of a particular table, use that. If a SMALLINT would be sufficient, use that. Use the datatype that will last, without being excessive.

Answered by Skyline

The first answer is the naive answer for anyone not working with TB-size databases or tables with constant, high-volume inserts. In any decent-sized database you will run into problems with INT at some stage in its lifetime. Use BIGINT if you have to, as it will save a lot of hassle further down the line. I have seen companies hit the INT issue after only a year of data, and where reseeding was not an option it caused massive downtime. Also, in long-running systems (10 years+) that were not expected to still be in use, the limit has been hit even with moderate-sized databases that purge old data. It is much better to use GUID in most cases where large amounts of data are expected, but barring that, use BIGINT if required.
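
Whether a table is at risk depends mostly on its sustained insert rate, so a quick estimate is worth doing before picking a key type. The sketch below assumes a key that starts at 1, increases by 1 per insert and is never reseeded; the function name and sample rates are illustrative only.

```python
INT_MAX = 2**31 - 1
BIGINT_MAX = 2**63 - 1
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_until_exhausted(inserts_per_second, max_value=INT_MAX, start=1):
    """Years of sustained inserts before an auto-incremented key runs out."""
    return (max_value - start + 1) / inserts_per_second / SECONDS_PER_YEAR


for rate in (1, 10, 68, 1000):
    print(f"{rate:>5} inserts/s -> INT lasts {years_until_exhausted(rate):.1f} years")

print(f"BIGINT at 1000 inserts/s: {years_until_exhausted(1000, BIGINT_MAX):.3g} years")
```

A sustained rate of roughly 68 inserts per second is enough to use up the positive INT range in about a year, which is consistent with the one-year case described above.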
