Strings as Primary Keys in SQL Database

Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute the content to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/517579/

Time: 2020-09-01 01:02:45  Source: igfitidea

Tags: sql, database, database-design, string, primary-key

Asked by mainstringargs

I am not very familiar with databases and the theories behind how they work. Is it any slower from a performance standpoint (inserting/updating/querying) to use Strings for Primary Keys than integers?

Answered by kemiller2002

Technically yes, but if a string makes sense as the primary key then you should probably use it. This all depends on the size of the table you're making it for and the length of the string that is going to be the primary key (longer strings == harder to compare). I wouldn't necessarily use a string for a table that has millions of rows, but the performance slowdown you'll get by using a string on smaller tables will be minuscule compared to the headaches you can have from an integer that doesn't mean anything in relation to the data.

Answered by Jeff Martin

Another issue with using strings as a primary key is that the index is constantly kept in sequential order, so when a new key is created that falls in the middle of that order, the index has to be resequenced. If you use an auto-numbered integer, the new key is just added to the end of the index.

Answered by Mark Thompson

An insert into a table with a clustered index, where the insertion occurs in the middle of the sequence, DOES NOT cause the index to be rewritten. It does not cause the pages comprising the data to be rewritten. If there is room on the page where the row will go, then it is placed in that page. The single page will be reformatted to place the row in the right place in the page. When the page is full, a page split will happen, with half of the rows on the page going to one page and half going to the other. The pages are then relinked into the linked list of pages that comprise the data of the table that has the clustered index. At most, you will end up writing 2 database pages.
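
To make the page-split behaviour described above concrete, here is a toy model in Python. It is only a sketch under stated assumptions: `PAGE_CAPACITY`, `insert_key`, and the fruit keys are hypothetical, pages are modelled as plain sorted Python lists rather than a linked list, at least one page is assumed to exist, and no real database engine is claimed to lay out pages this way.

```python
from bisect import insort

PAGE_CAPACITY = 4  # hypothetical rows-per-page limit, kept small for the demo


def insert_key(pages, key):
    """Insert key into a sorted list of sorted pages; return the number of pages written."""
    # Pick the first page whose last key is >= key; otherwise fall back to the last page.
    idx = next((i for i, p in enumerate(pages) if p and p[-1] >= key), len(pages) - 1)
    page = pages[idx]
    if len(page) < PAGE_CAPACITY:
        insort(page, key)  # room on the page: only this one page is rewritten
        return 1
    # The page is full: split it in half, then place the key in the matching half.
    mid = len(page) // 2
    left, right = page[:mid], page[mid:]
    insort(left if key < right[0] else right, key)
    pages[idx:idx + 1] = [left, right]  # one page becomes two; other pages are untouched
    return 2


pages = [
    ["apple", "banana", "cherry", "date"],
    ["fig", "grape", "kiwi", "lemon"],
    ["mango", "peach"],
]
writes_split = insert_key(pages, "blueberry")  # falls in the middle of a full page
writes_plain = insert_key(pages, "nectarine")  # falls in a page that still has room
print(writes_split, writes_plain, len(pages))
```

In this model an insert never touches more than two pages, however many pages the table has, which is the point the answer makes.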

Answered by HLGEM

Strings are slower in joins and in real life they are very rarely really unique (even when they are supposed to be). The only advantage is that they can reduce the number of joins if you are joining to the primary table only to get the name. However, strings are also often subject to change thus creating the problem of having to fix all related records when the company name changes or the person gets married. This can be a huge performance hit and if all tables that should be related somehow are not related (this happens more often than you think), then you might have data mismatches as well. An integer that will never change through the life of the record is a far safer choice from a data integrity standpoint as well as from a performance standpoint. Natural keys are usually not so good for maintenance of the data.
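
The rename problem can be sketched with Python's built-in `sqlite3` module. The `company` and `invoice` tables and the company names are hypothetical, and SQLite is used only because it ships with Python; other engines differ in syntax and in how they handle cascading updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: the company name itself is the primary key.
conn.execute("CREATE TABLE company (name TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE invoice ("
    " id INTEGER PRIMARY KEY,"
    " company_name TEXT NOT NULL REFERENCES company(name) ON UPDATE CASCADE)"
)
conn.execute("INSERT INTO company (name) VALUES ('Acme Ltd')")
conn.executemany("INSERT INTO invoice (company_name) VALUES (?)", [("Acme Ltd",)] * 5)

# Renaming the company changes the key, so every referencing row has to change too.
conn.execute("UPDATE company SET name = 'Acme Corp' WHERE name = 'Acme Ltd'")
conn.commit()

rewritten = conn.execute(
    "SELECT COUNT(*) FROM invoice WHERE company_name = 'Acme Corp'"
).fetchone()[0]
print(rewritten)  # all five child rows now carry the new name
```

With five invoices this is harmless; with a million child rows the same rename rewrites a million rows, and any table that lacks the foreign key silently keeps the old name.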

I also want to point out that the best of both worlds is often to use an autoincrementing key (or, in some specialized cases, a GUID) as the PK and then put a unique index on the natural key. You get the faster joins, you don't get duplicate records, and you don't have to update a million child records because a company name changed.
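
A minimal sketch of that combination, again using Python's built-in `sqlite3` module. The table names, the index name `ux_company_name`, and the sample data are assumptions made for illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Surrogate integer key as the PK, plus a unique index on the natural key.
conn.execute(
    "CREATE TABLE company ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL)"
)
conn.execute("CREATE UNIQUE INDEX ux_company_name ON company (name)")
conn.execute(
    "CREATE TABLE invoice ("
    " id INTEGER PRIMARY KEY,"
    " company_id INTEGER NOT NULL REFERENCES company(id))"
)

company_id = conn.execute("INSERT INTO company (name) VALUES ('Acme Ltd')").lastrowid
conn.executemany("INSERT INTO invoice (company_id) VALUES (?)", [(company_id,)] * 5)

# The unique index still rejects a duplicate natural key.
try:
    conn.execute("INSERT INTO company (name) VALUES ('Acme Ltd')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A rename touches exactly one row; the invoices keep pointing at the same id.
renamed_rows = conn.execute(
    "UPDATE company SET name = 'Acme Corp' WHERE id = ?", (company_id,)
).rowcount
conn.commit()
print(duplicate_rejected, renamed_rows)
```

The child table joins on a small integer, duplicates are still blocked by the unique index, and the rename is a single-row update.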

Answered by Al Katawazi

It doesn't matter what you use as a primary key so long as it is UNIQUE. If you care about speed or good database design, use an int; if you plan on replicating data, use a GUID instead.
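
The replication case can be sketched as follows. This is an illustration only: `make_node`, `add_customer`, and the `customer` table are hypothetical, two in-memory SQLite databases stand in for two replicas, and the GUID is stored as text because SQLite has no native GUID type.

```python
import sqlite3
import uuid


def make_node():
    """Create one independent database with a GUID-keyed table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
    return conn


def add_customer(conn, name):
    """Insert a row whose key is generated locally, with no coordination between nodes."""
    key = str(uuid.uuid4())
    conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (key, name))
    return key


node_a, node_b = make_node(), make_node()
add_customer(node_a, "Alice")
add_customer(node_b, "Bob")

# Copy node_b's rows into node_a. The keys were generated independently, so a
# primary-key collision is not expected when the rows are merged.
rows_from_b = node_b.execute("SELECT id, name FROM customer").fetchall()
node_a.executemany("INSERT INTO customer (id, name) VALUES (?, ?)", rows_from_b)
merged_count = node_a.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(merged_count)
```

Had both nodes used their own auto-increment integers, each would have assigned id 1 to its first row and the merge would hit a primary-key conflict.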

If this is an Access database or some tiny app, then who really cares? I think the reason most of us developers slap the old int or GUID at the front is that projects have a way of growing on us, and you want to leave yourself the option to grow.

Answered by Joel Coehoorn

Too many variables. It depends on the size of the table, the indexes, nature of the string key domain...

Generally, integers will be faster. But will the difference be large enough to care? It's hard to say.

Also, what is your motivation for choosing strings? Numeric auto-increment keys are often so much easier as well. Is it semantics? Convenience? Replication/disconnected concerns? Your answer here could limit your options. This also brings to mind a third "hybrid" option you're forgetting: GUIDs.

Answered by Walter Mitty

Don't worry about performance until you have got a simple and sound design that agrees with the subject matter that the data describes and fits well with the intended use of the data. Then, if performance problems emerge, you can deal with them by tweaking the system.

In this case, it's almost always better to go with a string as a natural primary key, provided you can trust it. Don't worry that it's a string, as long as the string is reasonably short, say about 25 characters max. You won't pay a big price in terms of performance.

Do the data entry people or automatic data sources always provide a value for the supposed natural key, or is it sometimes omitted? Is it occasionally wrong in the input data? If so, how are errors detected and corrected?

Are the programmers and interactive users who specify queries able to use the natural key to get what they want?

If you can't trust the natural key, invent a surrogate. If you invent a surrogate, you might as well invent an integer. Then you have to worry about whether to conceal the surrogate from the user community. Some developers who didn't conceal the surrogate key came to regret it.

Answered by Quassnoi

Indices imply lots of comparisons.

Typically, strings are longer than integers and collation rules may be applied for comparison, so comparing strings is usually a more computationally intensive task than comparing integers.
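
As a small illustration of collation rules entering a comparison, the sketch below uses Python's built-in `sqlite3` module with SQLite's built-in `BINARY` (default) and `NOCASE` collations. The literal values are arbitrary, and the example shows only that the comparison result depends on the collation, not how much slower it is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer comparison: the two numeric values are compared directly.
ints_equal = conn.execute("SELECT 42 = 42").fetchone()[0]

# String comparison under the default BINARY collation compares the bytes as-is.
binary_equal = conn.execute("SELECT 'Acme' = 'ACME'").fetchone()[0]

# Under NOCASE the engine folds ASCII case while comparing, which is extra
# work on top of walking the characters of both strings.
nocase_equal = conn.execute("SELECT 'Acme' = 'ACME' COLLATE NOCASE").fetchone()[0]

print(ints_equal, binary_equal, nocase_equal)
```

Every index lookup repeats such comparisons many times, which is why key length and collation both matter.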

Sometimes, though, it's faster to use a string as a primary key than to make an extra join with a string-to-numerical-id table.

Answered by Yes - that Jake.

Yes, but unless you expect to have millions of rows, not using a string-based key because it's slower is usually "premature optimization." After all, strings are stored as big numbers while numeric keys are usually stored as smaller numbers.

One thing to watch out for, though, is having a clustered index on any key while doing large numbers of inserts that are non-sequential in the index. Every line written will cause the index to re-write. If you're doing batch inserts, this can really slow the process down.

Answered by Jatinder Singh

Two reasons to use integers for PK columns:

  1. We can set an identity on an integer field so that it is incremented automatically.

  2. When we create a PK, the database creates an index (clustered or nonclustered) which sorts the data before it's stored in the table. By using an identity on a PK, the optimizer need not check the sort order before saving a record. This improves performance on big tables.
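
The first point can be sketched with Python's built-in `sqlite3` module, where `INTEGER PRIMARY KEY AUTOINCREMENT` plays the role of an identity column. The `event` table and its labels are hypothetical, and other engines spell this differently (for example `IDENTITY` or `AUTO_INCREMENT`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY AUTOINCREMENT, label TEXT)")

# The labels arrive in no particular order, but the generated keys only grow,
# so each new row sorts after all existing rows in the primary-key order.
assigned_ids = [
    conn.execute("INSERT INTO event (label) VALUES (?)", (label,)).lastrowid
    for label in ("zebra", "apple", "mango")
]
conn.commit()
print(assigned_ids)
```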
