SQL 主键:整数与 varchar
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/3162202/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
SQL primary key: integer vs varchar
提问by frabiacca
The team I'm working with decided to create a table with a varchar primary key. This table is referenced by another table on this primary key.
与我一起工作的团队决定创建一个带有 varchar 主键的表。此表被此主键上的另一个表引用。
I've the habit to create an integer primary key, following what I learnt at university. I've read that there is a performance boost using integer primary key.
根据我在大学学到的知识,我习惯于创建一个整数主键。我读过使用整数主键可以提高性能。
The matter is that I don't know any other reason for creating an integer primary key. Do you have any tips?
问题是我不知道创建整数主键的任何其他原因。你有什么建议吗?
采纳答案by Mark Byers
The primary key is supposed to represent the identity for the row and should not change over time.
主键应该代表行的身份,不应随时间变化。
I assume that the varchar is some sort of natural key - such as the name of the entity, an email address, or a serial number. If you use a natural key then it can sometimes happen that the key needs to change because for example:
我假设 varchar 是某种自然键 - 例如实体的名称、电子邮件地址或序列号。如果您使用自然键,那么有时可能会发生键需要更改的情况,例如:
- The data was incorrectly entered and needs to be fixed.
- The user changes their name or email address.
- The management suddenly decide that all customer reference numbers must be changed to another format for reasons that seem completely illogical to you, but they insist on making the change even after you explain the problems it will cause you.
- Maybe even a country or state decides to change the spelling of its name - very unlikely, but not impossible.
- 数据输入错误,需要修复。
- 用户更改其姓名或电子邮件地址。
- 管理层突然决定,出于对您来说完全不合逻辑的原因,必须将所有客户参考编号更改为另一种格式,但即使在您解释了这会给您带来的问题之后,他们仍坚持进行更改。
- 甚至一个国家或州也可能决定更改其名称的拼写 - 非常不可能,但并非不可能。
By using a surrogate key you avoid problems caused by having to change primary keys.
通过使用代理键,您可以避免因必须更改主键而导致的问题。
回答by Remus Rusanu
VARCHAR vs. INT doesn't tell much. What matter is the access pattern.
VARCHAR 与 INT 并没有说明太多。重要的是访问模式。
On absolute terms, a wider key will always be worse than a narrow key. The type carries absolutely no importance, is the width that matters. When compared with INT though, few types can beat INT in narrowness, so INT usually wins that argument just by the fact that is only 4 bytes wide.
就绝对而言,更宽的键总是比窄键更糟糕。类型绝对不重要,重要的是宽度。但是,与 INT 相比,很少有类型可以在窄度上击败 INT,因此 INT 通常仅凭借仅 4 个字节宽的事实就赢得了这个论点。
But what reallymatters is the choice of clusteredkey. Often confused with the primary key, the two represent different notions and are notrequired to overlap. Here is a more detailed discussion Should I design a table with a primary key of varchar or int?The choice of the clustered key is just about the most important decision in table design, and a mechanical application of an INT identity(1,1)
on it may be just the biggest mistake one can make. Here is where the question of access patterns comes in:
但真正重要的是集群键的选择。与主键经常被混淆,这两个代表不同的概念,并且不重叠必需的。这里有更详细的讨论我应该设计一个主键是 varchar 还是 int 的表?聚集键的选择几乎是表设计中最重要的决定,机械地应用INT identity(1,1)
它可能只是人们可能犯的最大错误。这是访问模式问题的来源:
- what are the most frequent interrogations on the table?
- what columns are projected?
- what predicates are applied?
- what ranges are searched?
- what joins are performed?
- what aggregations occur?
- how is the data inserted into the table?
- how is the data updated in the table?
- how is old data purged from the table, if ever?
- how many non-clustered indexes exist?
- how often are columns included in the NC indexes (key or leaf) are updated?
- 桌上最常见的审讯是什么?
- 投影了哪些列?
- 应用了哪些谓词?
- 搜索什么范围?
- 执行什么连接?
- 发生什么聚合?
- 数据是如何插入到表中的?
- 表中的数据是如何更新的?
- 如果有的话,如何从表中清除旧数据?
- 存在多少个非聚集索引?
- NC 索引(键或叶)中包含的列多久更新一次?
Overall, there are many access patterns that can be ruined by using an INT IDENTITY clustered key. So before jumping to apply a cookie cutter solution, perhaps a little bit of analysis is required...
总的来说,使用 INT IDENTITY 集群密钥可以破坏许多访问模式。因此,在开始应用千篇一律的解决方案之前,也许需要进行一些分析……
Some more general guidelines:
一些更一般的指导方针:
You see there are no Primary Key design guidelines, because the Primary key is not an issue of storage design but an issue of modelingand is entirely domain driven.
您会看到没有主键设计指南,因为主键不是存储设计问题而是建模问题,并且完全是域驱动的。
回答by onedaywhen
i was a bit disappointed because i've the habit to create an integer primary key (following what some teacher told me at the university). I've read a lot of documentation on the performance boost using integer primary key.
我有点失望,因为我有创建整数主键的习惯(按照大学里一些老师告诉我的)。我已经阅读了很多关于使用整数主键提高性能的文档。
There is a term for this: confirmation bias:
有一个术语:确认偏差:
"also called confirmatory bias or myside bias) is a tendency for people to favor information that confirms their preconceptions or hypotheses, independently of whether they are true. This results in people selectively collecting new evidence, interpreting evidence in a biased way, or selectively recalling information from memory."
“也称为证实性偏见或我的偏见)是人们倾向于支持证实他们的先入之见或假设的信息的倾向,而不管它们是否真实。这导致人们有选择地收集新证据,以有偏见的方式解释证据,或有选择地回忆记忆中的信息。”
Of course, your first reaction will be to say, "But that's not true!" Yeah, you would say that 'cos you're biased ;) [tongue firmly embedded in cheek]
当然,你的第一反应会说,“但那不是真的!” 是的,你会说'因为你有偏见;) [舌头牢牢嵌入脸颊]
Here's a classic example: say you had been told by your zoology professor that all swans are white and, sure enough, all swans you and your friends have ever encountered are white. Now let's say later in life a colleague expressed the opinion that perhaps there is such creature as a black swan. What?! That's not what your were taught. Your world is rocked! You immediately go out and conduct a swan survey and you count 1,000 white swans and zero black swans. Proof! If you'd found 10,000 white swans then the hypothesis 'All swans are white' would be ten times truer, right?
这是一个经典的例子:假设你的动物学教授告诉你所有的天鹅都是白色的,而且你和你的朋友遇到的所有天鹅都是白色的。现在让我们说在以后的生活中,一位同事表示,也许存在黑天鹅这样的生物。什么?!那不是你被教导的。你的世界震撼了!你立即出去进行天鹅调查,你数了 1000 只白天鹅和 0 只黑天鹅。证明!如果你发现了 10,000 只白天鹅,那么“所有的天鹅都是白色的”假设会更真实十倍,对吧?
A different approach would be to forget about white swans for the moment and try to seek out a black swan. Perhaps take a holiday by the sea in sunny Dawlish?
另一种方法是暂时忘记白天鹅,并尝试寻找黑天鹅。也许在阳光明媚的Dawlish海边度假?
I really don't mean to sound disrespectful; you admit to reading a lot about what you have been told and that indeed earns my respect. So here's a challenge: try to find cases where adding an integer column to a table is unnecessary.
我真的不是要听起来不尊重;你承认阅读了很多关于你被告知的内容,这确实赢得了我的尊重。所以这是一个挑战:尝试找出不需要向表中添加整数列的情况。
Here are some hints and spoilers: tables that are not referenced by other tables; single column 'all key' lookup tables; 'small' tables that aren't queried much :)
以下是一些提示和剧透: 其他表格未引用的表格;单列“所有键”查找表;查询不多的“小”表:)
Here are some other related topics you may like to investigate:
以下是您可能想调查的其他一些相关主题:
Does the word 'primary' in 'primary key' have much meaning or are all keys in a given table equal?
“primary key”中的“primary”一词是否有很多含义,或者给定表中的所有键都相等?
What are the qualities of a 'good' key? (e.g. should a key's values be immutable or is a stability 'good' enough?)
“好”钥匙的品质是什么?(例如,键的值应该是不可变的还是稳定性“足够好”?)
Is an integer column added to the table as an artifical key (perhpas because the available natural key is not 'good' enough) or as a surrogate key (perhaps to boost performance of an otherwise 'good' natural key)?
添加到表中的整数列是作为人工键(可能是因为可用的自然键不够“好”)还是作为代理键(可能是为了提高其他“好”自然键的性能)?
When a surrogate key is added to a table on performance grounds, is this for actual measured effect or merely for perceived effect (i.e. premature optimization)?
当基于性能原因将代理键添加到表中时,这是为了实际测量的效果还是仅用于感知效果(即过早优化)?
Should surrogate keys appear in the logical business model or are they for implementation only?
代理键应该出现在逻辑业务模型中还是仅用于实现?
Is it a good idea to always do something (e.g. add an integer column to a table) without engaging the brain each time? ;)
总是做一些事情(例如向表中添加一个整数列)而不每次都动脑子是个好主意吗?;)
[Disclaimer: I'm a natural key advocate and shun surrogates. For me they are like denormalization: you only do it when you have to, usually for a performance issue (specific and demonstrable), where the fault lies elsewhere (lousy SQL product version, logical design flaw that cannot be fixed at this time, etc). Surrogates should never appear in the logical business model. I sometimes need an artificial identifier and have even exposed them logical business models.]
[免责声明:我是一个天生的关键拥护者并避免代理人。对我来说,它们就像非规范化:你只在必须的时候才这样做,通常是为了性能问题(具体的和可证明的),故障出在其他地方(糟糕的 SQL 产品版本,目前无法修复的逻辑设计缺陷等) )。代理永远不应该出现在逻辑业务模型中。我有时需要一个人工标识符,甚至公开了它们的逻辑业务模型。]