SQL:主键和索引到底是做什么的?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1315481/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 03:18:30  来源:igfitidea点击:

SQL: what exactly do Primary Keys and Indexes do?

sqlphpmyadmin

提问by Mala

I've recently started developing my first serious application which uses a SQL database, and I'm using phpMyAdmin to set up the tables. There are a couple optional "features" I can give various columns, and I'm not entirely sure what they do:

我最近开始开发我的第一个使用 SQL 数据库的严肃应用程序,我正在使用 phpMyAdmin 来设置表。有几个可选的“功能”我可以提供各种列,但我不完全确定它们的作用:

  • Primary Key
  • Index
  • 首要的关键
  • 指数

I know what a PK is for and how to use it, but I guess my question with regards to that is why does one need one - how is it different from merely setting a column to "Unique", other than the fact that you can only have one PK? Is it just to let the programmer know that this value uniquely identifies the record? Or does it have some special properties too?

我知道 PK 的用途以及如何使用它,但我想我对此的问题是为什么需要一个 - 它与仅将一列设置为“唯一”有什么不同,除了您可以只有一个PK?只是为了让程序员知道这个值唯一标识了记录吗?或者它也有一些特殊的属性?

I have no idea what "Index" does - in fact, the only times I've ever seen it in use are (1) that my primary keys seem to be indexed, and (2) I heard that indexing is somehow related to performance; that you want indexed columns, but not too many. How does one decide which columns to index, and what exactly does it do?

我不知道“索引”是做什么的 - 事实上,我见过它在使用的唯一时间是(1)我的主键似乎被索引,以及(2)我听说索引与性能有某种关系; 你想要索引列,但不要太多。如何决定索引哪些列,它究竟做了什么?

edit:should one index colums one is likely to want to ORDER BY?

编辑:应该一个索引列可能想要 ORDER BY 吗?

Thanks a lot,

非常感谢,

Mala

麻辣烫

回答by Click Upvote

Primary key is usually used to create a numerical 'id' for your records, and this id column is automatically incremented.

主键通常用于为您的记录创建一个数字“id”,此 id 列会自动递增。

For example, if you have a bookstable with an idfield, where the idis the primary key and is also set to auto_increment(Under 'Extra in phpmyadmin), then when you first add a book to the table, the id for that will become 1'. The next book's id would automatically be '2', and so on. Normally, every table should have at least one primary key to help identifying and finding records easily.

例如,如果您有一个books带有id字段的表,其中id是主键并且也设置为auto_increment(在 phpmyadmin 中的“Extra”下),那么当您第一次向表中添加一本书时,该表的 id 将变为 1' . 下一本书的 id 将自动为“2”,依此类推。通常,每张表都应该至少有一个主键,以帮助轻松识别和查找记录。

Indexes are used when you need to retrieve certain information from a table regularly. For example, if you have a userstable, and you will need to access the emailcolumn a lot, then you can add an index on email, and this will cause queries accessing the email to be faster.

当您需要定期从表中检索某些信息时,将使用索引。例如,如果您有一个users表,并且需要大量访问该email列,那么您可以在电子邮件上添加索引,这将导致访问电子邮件的查询更快。

However there are also downsides for adding unnecessary indexes, so add this only on the columns that really do need to be accessed more than the others. For example, UPDATE, DELETEand INSERTqueries will be a little slower the more indexes you have, as MySQL needs to store extra information for each indexed column. More info can be found at this page.

然而,添加不必要的索引也有缺点,所以只在真正需要访问的列上添加它,而不是其他列。例如UPDATEDELETE并且INSERT查询会慢一点,你有更多的指标,因为MySQL需要存储每个索引列的额外信息。更多信息可以在这个页面找到。

Edit: Yes, columns that need to be used in ORDER BYa lot should have indexes, as well as those used in WHERE.

编辑:是的,需要ORDER BY大量使用的列应该有索引,以及在WHERE.

回答by Michael Borgwardt

The primary key is basically a unique, indexed column that acts as the "official" ID of rows in that table. Most importantly, it is generally used for foreign key relationships, i.e. if another table refers to a row in the first, it will contain a copy of that row's primary key.

主键基本上是一个唯一的索引列,充当该表中行的“官方”ID。最重要的是,它通常用于外键关系,即如果另一个表引用了第一个行,它将包含该行主键的副本。

Note that it's possible to have a composite primary key, i.e. one that consists of more than one column.

请注意,可能有一个复合主键,即包含多于一列的主键。

Indexes improve lookup times. They're usually tree-based, so that looking up a certain row via an index takes O(log(n)) time rather than scanning through the full table.

索引改进了查找时间。它们通常是基于树的,因此通过索引查找某一行需要 O(log(n)) 时间,而不是扫描整个表。

Generally, any column in a large table that is frequently used in WHERE, ORDER BYor (especially) JOINclauses should have an index. Since the index needs to be updated for evey INSERT, UPDATEor DELETE, it slows down those operations. If you have few writes and lots of reads, then index to your hear's content. If you have both lots of writes and lots of queries that would require indexes on many columns, then you have a big problem.

通常,大表中经常在WHERE,ORDER BY或(尤其是)JOIN子句中使用的任何列都应该有一个索引。由于指数需求埃维进行更新INSERTUPDATE或者DELETE,它会减慢这些操作。如果你写的很少,读的很多,那么索引你听到的内容。如果你有大量的写入和大量的查询,需要在许多列上建立索引,那么你就有了一个大问题。

回答by APC

The difference between a primary key and a unique key is best explained through an example.

最好通过一个例子来解释主键和唯一键之间的区别。

We have a table of users:

我们有一个用户表:

USER_ID number 
NAME varchar(30)
EMAIL varchar(50)

In that table the USER_ID is the primary key. The NAME is not unique - there are a lot of John Smiths and Muhammed Khans in the world. The EMAIL is necessarily unique, otherwise the worldwide email system wouldn't work. So we put a unique constraint on EMAIL.

在该表中,USER_ID 是主键。NAME 并不是唯一的——世界上有很多约翰·史密斯和穆罕默德·汗。EMAIL 必须是唯一的,否则全球电子邮件系统将无法工作。因此,我们对 EMAIL 设置了唯一约束。

Why then do we need a separate primary key? Three reasons:

那么为什么我们需要一个单独的主键呢?三个原因:

  1. the numeric key is more efficient when used in foreign key relationships as it takes less space
  2. the email can change (for example swapping provider) but the user is still the same; rippling a change of a primary key value throughout a schema is always a nightmare
  3. it is always a bad idea to use sensitive or private information as a foreign key
  1. 数字键在外键关系中使用时效率更高,因为它占用的空间更少
  2. 电子邮件可以更改(例如交换提供商),但用户仍然相同;在整个架构中改变主键值总是一场噩梦
  3. 使用敏感或私人信息作为外键总是一个坏主意

回答by Walter Mitty

In the relational model, any column or set of columns that is guaranteed to be both present and unique in the table can be called a candidate key to the table. "Present" means "NOT NULL". It's common practice in database design to designate one of the candidate keys as the primary key, and to use references to the primary key to refer to the entire row, or to the subject matter item that the row describes.

在关系模型中,保证表中存在且唯一的任何列或列集都可以称为表的候选键。“存在”的意思是“非空”。数据库设计中的常见做法是将候选键之一指定为主键,并使用对主键的引用来引用整行或行描述的主题项。

In SQL, a PRIMARY KEY constraint amounts to a NOT NULL constraint for each primary key column, and a UNIQUE constraint for all the primary key columns taken together. In practice many primary keys turn out to be single columns.

在 SQL 中,PRIMARY KEY 约束相当于每个主键列的 NOT NULL 约束,以及所有主键列的 UNIQUE 约束。在实践中,许多主键变成了单列。

For most DBMS products, a PRIMARY KEY constraint will also result in an index being built on the primary key columns automatically. This speeds up the systems checking activity when new entries are made for the primary key, to make sure the new value doesn't duplicate an existing value. It also speeds up lookups based on the primary key value and joins between the primary key and a foreign key that references it. How much speed up occurs depends on how the query optimizer works.

对于大多数 DBMS 产品,PRIMARY KEY 约束还会导致自动在主键列上构建索引。当为主键创建新条目时,这会加快系统检查活动的速度,以确保新值不会与现有值重复。它还可以加快基于主键值的查找以及主键和引用它的外键之间的连接。发生多少加速取决于查询优化器的工作方式。

Originally, relational database designers looked for natural keys in the data as given. In recent years, the tendency has been to always create a column called ID, an integer as the first column and the primary key of every table. The autogenerate feature of the DBMS is used to ensure that this key will be unique. This tendency is documented in the "Oslo design standards". It isn't necessarily relational design, but it serves some immediate needs of the people who follow it. I do not recommend this practice, but I recognize that it is the prevalent practice.

最初,关系数据库设计人员在给定的数据中寻找自然键。近年来的趋势是总是创建一个名为 ID 的列,一个整数作为第一列和每个表的主键。DBMS 的自动生成功能用于确保此密钥是唯一的。这种趋势记录在“奥斯陆设计标准”中。它不一定是关系设计,但它满足了遵循它的人的一些直接需求。我不推荐这种做法,但我承认这是普遍的做法。

An index is a data structure that allows for rapid access to a few rows in a table, based on a description of the columns of the table that are indexed. The index consists of copies of certain table columns, called index keys, interspersed with pointers to the table rows. The pointers are generally hidden from the DBMS users. Indexes work in tandem with the query optimizer. The user specifies in SQL what data is being sought, and the optimizer comes up with index strategies and other strategies for translating what is being sought into a stategy for finding it. There is some kind of organizing principle, such as sorting or hashing, that enables an index to be used for fast lookups, and certain other uses. This is all internal to the DBMS, once the database builder has created the index or declared the primary key.

索引是一种数据结构,它允许根据被索引的表的列的描述快速访问表中的几行。索引由某些表列的副本组成,称为索引键,散布着指向表行的指针。这些指针通常对 DBMS 用户隐藏。索引与查询优化器协同工作。用户在 SQL 中指定要查找的数据,优化器提出索引策略和其他策略,用于将要查找的内容转换为查找它的策略。有某种组织原则,例如排序或散列,可以使索引用于快速查找和某些其他用途。一旦数据库构建器创建了索引或声明了主键,这一切都是 DBMS 内部的。

Indexes can be built that have nothing to do with the primary key. A primary key can exist without an index, although this is generally a very bad idea.

可以构建与主键无关的索引。主键可以在没有索引的情况下存在,尽管这通常是一个非常糟糕的主意。