Notice: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/2551776/

Time: 2020-09-08 07:40:10  Source: igfitidea

Database Design Primary Key, ID vs String

database, database-design

Asked by Michal Ciechan

I am currently planning to develop a music streaming application, and I am wondering which would be better as a primary key in my tables on the server: an integer ID or a unique string.

Method 1:

Songs Table: SongID(int), Title(string), *Artist**(string), Length(int), *Album**(string)

Genre Table: Genre(string), Name(string)

SongGenre: ***SongID****(int), ***Genre****(string)

Method 2:

Songs Table: SongID(int), Title(string), *ArtistID**(int), Length(int), *AlbumID**(int)

Genre Table: GenreID(int), Name(string)

SongGenre: ***SongID****(int), ***GenreID****(int)

Key: Bold = Primary Key, *Field** = Foreign Key

I'm currently designing with method 2, as I believe it will speed up lookups and use less space, since an int takes a lot less space than a string.
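
As a rough illustration of Method 2, here is a minimal sketch using Python's built-in sqlite3 module. The Songs, Genre, and SongGenre columns follow the question; the Artist and Album tables, the sample rows, and the choice of SQLite are assumptions made only so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

# Artist and Album are assumed here; the question only implies them via ArtistID/AlbumID.
conn.executescript("""
CREATE TABLE Artist (ArtistID INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
CREATE TABLE Album  (AlbumID  INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Genre  (GenreID  INTEGER PRIMARY KEY, Name  TEXT NOT NULL UNIQUE);
CREATE TABLE Songs (
    SongID   INTEGER PRIMARY KEY,
    Title    TEXT NOT NULL,
    ArtistID INTEGER NOT NULL REFERENCES Artist(ArtistID),
    Length   INTEGER NOT NULL,
    AlbumID  INTEGER REFERENCES Album(AlbumID)
);
CREATE TABLE SongGenre (
    SongID  INTEGER NOT NULL REFERENCES Songs(SongID),
    GenreID INTEGER NOT NULL REFERENCES Genre(GenreID),
    PRIMARY KEY (SongID, GenreID)
);
""")

# Made-up sample data.
conn.execute("INSERT INTO Artist (ArtistID, Name) VALUES (1, 'Example Artist')")
conn.execute("INSERT INTO Genre (GenreID, Name) VALUES (1, 'Rock')")
conn.execute(
    "INSERT INTO Songs (SongID, Title, ArtistID, Length) VALUES (1, 'Example Song', 1, 215)"
)
conn.execute("INSERT INTO SongGenre (SongID, GenreID) VALUES (1, 1)")

# Every join compares integers; names are stored once and looked up by key.
rows = conn.execute("""
    SELECT Songs.Title, Artist.Name, Genre.Name
    FROM Songs
    JOIN Artist    ON Artist.ArtistID  = Songs.ArtistID
    JOIN SongGenre ON SongGenre.SongID = Songs.SongID
    JOIN Genre     ON Genre.GenreID    = SongGenre.GenreID
""").fetchall()
print(rows)  # [('Example Song', 'Example Artist', 'Rock')]
```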

Is there any reason this isn't a good idea? Is there anything I should be aware of?

Accepted answer by Oded

You are doing the right thing: an identity field should be numeric rather than string-based, both to save space and for performance reasons (matching keys on strings is slower than matching on integers).

Answer by Dave

Is there any reason this isn't a good idea? Is there anything I should be aware of?

Yes. Integer IDs are very bad if you need to uniquely identify the same data outside of a single database, for example if you have to copy the data into another database system that may already contain data, or if you have a distributed database. The biggest thing to be aware of is that an integer like 7481 has no meaning outside of that one database. If you later need to grow that database, it may be impossible without surgically removing your data.
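
The merge problem described above can be reproduced with a small sketch. It assumes two independent SQLite databases (hypothetical "site A" and "site B") that each assign integer keys on their own; the helper name make_db and the sample titles are made up for illustration.

```python
import sqlite3

def make_db(titles):
    """Create an in-memory database whose Songs table uses auto-assigned integer keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Songs (SongID INTEGER PRIMARY KEY, Title TEXT NOT NULL)")
    conn.executemany("INSERT INTO Songs (Title) VALUES (?)", [(t,) for t in titles])
    return conn

db_a = make_db(["Song from site A"])
db_b = make_db(["Song from site B"])

rows_a = db_a.execute("SELECT SongID, Title FROM Songs").fetchall()
rows_b = db_b.execute("SELECT SongID, Title FROM Songs").fetchall()
print(rows_a)  # [(1, 'Song from site A')]
print(rows_b)  # [(1, 'Song from site B')]

# Copying B's rows into A with their original keys fails: SongID 1 is already taken.
collision = False
try:
    db_a.executemany("INSERT INTO Songs (SongID, Title) VALUES (?, ?)", rows_b)
except sqlite3.IntegrityError:
    collision = True
print(collision)  # True
```

Merging would require renumbering B's rows and every foreign key that points at them, which is the kind of surgery the answer warns about.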

The other thing to keep in mind is that integer IDs aren't as flexible, so they can't easily be used for exceptional cases. The designers of the Internet Protocol understood this and took precautions by allocating certain blocks of numbers as "special" in one way or another (broadcast IPs, private IPs, network IPs). But that was only possible because there's a protocol surrounding the usage of those numbers. Many databases don't operate within such a well-defined protocol.

FWIW, it's kind of like trying to decide if having a "strongly typed" programming paradigm is better than a "weakly/dynamically typed" programming paradigm. It will depend on what you need to do.

Answer by michaelbn

From the software perspective, a GUID is better as it's globally unique.

Quotes from: Primary Keys: IDs versus GUIDs

Using a GUID as a row identity value feels more natural-- and certainly more truly unique-- than a 32-bit integer. Database guru Joe Celko seems to agree. GUID primary keys are a natural fit for many development scenarios, such as replication, or when you need to generate primary keys outside the database. But it's still a question of balancing the tradeoffs between traditional 4-byte integer IDs and 16-byte GUIDs:

GUID Pros

  • Unique across every table, every database, every server
  • Allows easy merging of records from different databases
  • Allows easy distribution of databases across multiple servers
  • You can generate IDs anywhere, instead of having to roundtrip to the database
  • Most replication scenarios require GUID columns anyway

GUID Cons

  • It is a whopping 4 times larger than the traditional 4-byte index value; this can have serious performance and storage implications if you're not careful
  • Cumbersome to debug where userid='{BAE7DF4-DDF-3RG-5TY3E3RF456AS10}'
  • The generated GUIDs should be partially sequential for best performance (eg, newsequentialid() on SQL 2005) and to enable use of clustered indexes
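
Two of the points in the lists above (IDs can be generated anywhere, and a GUID is 4 times the size of a 4-byte integer) can be checked with Python's standard uuid module. This is only a sketch: the two lists stand in for hypothetical independent servers, and it uses random version 4 UUIDs, so it does not demonstrate the partially sequential GUIDs mentioned in the last bullet.

```python
import uuid

# IDs are created in the application, with no round trip to a database.
# The two lists are hypothetical stand-ins for independent servers.
server_a_ids = [uuid.uuid4() for _ in range(1000)]
server_b_ids = [uuid.uuid4() for _ in range(1000)]

# Version 4 UUIDs carry 122 random bits, so duplicates are astronomically
# unlikely even though the two "servers" never coordinate.
all_ids = set(server_a_ids) | set(server_b_ids)
print(len(all_ids))

sample = server_a_ids[0]
guid_size = len(sample.bytes)   # raw binary form, in bytes
int_size = 4                    # a traditional 32-bit integer key, in bytes
text_size = len(str(sample))    # canonical hyphenated hex string, in characters
print(guid_size, guid_size // int_size, text_size)  # 16 4 36
```

Note that storing the GUID as text rather than as 16 raw bytes more than doubles its size again, which matters for the storage concern raised above.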

Answer by graffic

My recommendation is: use ids.

You'll be able to rename a genre that has 20,000 songs without breaking anything.

The idea behind this is that the id identifies the row in the table. Whatever else the row contains doesn't matter for that purpose.
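
The rename scenario can be sketched as follows, assuming the Method 2 layout from the question in SQLite. The genre names and the 20,000 generated SongGenre rows are made-up test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Genre (GenreID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE SongGenre (
    SongID  INTEGER NOT NULL,
    GenreID INTEGER NOT NULL REFERENCES Genre(GenreID),
    PRIMARY KEY (SongID, GenreID)
);
""")
conn.execute("INSERT INTO Genre (GenreID, Name) VALUES (1, 'Hip Hop')")
# Link 20,000 (hypothetical) songs to genre 1.
conn.executemany(
    "INSERT INTO SongGenre (SongID, GenreID) VALUES (?, 1)",
    [(song_id,) for song_id in range(1, 20001)],
)

# Renaming touches exactly one row; the 20,000 links only store the integer key.
cur = conn.execute("UPDATE Genre SET Name = 'Hip-Hop' WHERE GenreID = 1")
rows_changed = cur.rowcount

linked = conn.execute("""
    SELECT COUNT(*)
    FROM SongGenre JOIN Genre USING (GenreID)
    WHERE Genre.Name = 'Hip-Hop'
""").fetchone()[0]
print(rows_changed, linked)  # 1 20000
```

With Method 1, where the genre string itself is the key, the same rename would have to update every SongGenre row as well.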

Answer by cletus

This is in large part a matter of personal preference.

My personal opinion and practice is to always use integer keys and to always use surrogate rather than natural keys (so never use anything like social security number or the genre name directly).

There are cases where an auto number field is not appropriate or does not scale. In these cases it can make sense to use a GUID, which can be a string in databases that do not have a native datatype for it.
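
As one possible illustration of a GUID stored as a string, the sketch below uses SQLite, which has no native GUID type, and keeps the key in a TEXT column. The table layout and sample title are assumptions made for the example.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# SongID is a TEXT column holding the canonical 36-character GUID string.
conn.execute("CREATE TABLE Songs (SongID TEXT PRIMARY KEY, Title TEXT NOT NULL)")

new_id = uuid.uuid4()  # surrogate key generated by the application
conn.execute(
    "INSERT INTO Songs (SongID, Title) VALUES (?, ?)",
    (str(new_id), "Example Song"),
)

stored_id, title = conn.execute(
    "SELECT SongID, Title FROM Songs WHERE SongID = ?", (str(new_id),)
).fetchone()

# The string parses back into the same UUID value.
round_trip_ok = uuid.UUID(stored_id) == new_id
print(title, round_trip_ok)  # Example Song True
```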

Answer by James Westgate

MSSQL can generate these IDs for you when using an int (see the IDENTITY keyword).
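
The same idea of database-generated integer keys can be tried without SQL Server. The sketch below uses SQLite's INTEGER PRIMARY KEY AUTOINCREMENT as a stand-in; it is not the MSSQL IDENTITY syntax, and the genre names are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite analogue of an identity column: the database assigns GenreID on insert.
conn.execute(
    "CREATE TABLE Genre (GenreID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT NOT NULL)"
)

generated_ids = []
for name in ("Rock", "Jazz", "Blues"):
    cur = conn.execute("INSERT INTO Genre (Name) VALUES (?)", (name,))
    generated_ids.append(cur.lastrowid)  # key the database chose for this row

print(generated_ids)  # [1, 2, 3]
```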
