SQL: Which is faster/best? SELECT * or SELECT column1, column2, column3, etc.

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original address. Original Stack Overflow question: http://stackoverflow.com/questions/65512/

Time: 2020-08-31 23:23:38  Source: igfitidea

Which is faster/best? SELECT * or SELECT column1, column2, column3, etc.

sql database

Asked by Dan Herbert

I've heard that SELECT * is generally bad practice when writing SQL commands because it is more efficient to SELECT only the columns you specifically need.

If I need to SELECT every column in a table, should I use

SELECT * FROM TABLE

or

SELECT column1, column2, column3, etc. FROM TABLE

Does the efficiency really matter in this case? I'd think SELECT * would be more optimal internally if you really need all of the data, but I'm saying this with no real understanding of databases.

I'm curious to know what the best practice is in this case.

UPDATE: I should probably specify that the only situation where I would really want to do a SELECT * is when I'm selecting data from one table where I know all columns will always need to be retrieved, even when new columns are added.

Given the responses I've seen, however, this still seems like a bad idea, and SELECT * should never be used, for a lot more technical reasons than I ever thought about.

Answered by Jon Galloway

One reason that selecting specific columns is better is that it raises the probability that SQL Server can access the data from indexes rather than querying the table data.

Here's a post I wrote about it: "The real reason SELECT * queries are bad: index coverage"

It's also less fragile to change, since any code that consumes the data will be getting the same data structure regardless of changes you make to the table schema in the future.
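
To make the index-coverage point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module with SQLite as a stand-in, not SQL Server, so the planner output differs from what SQL Server would show; the `users` table and the `idx_users_email_name` index are made up for illustration. The idea is the same, though: a narrow column list can be answered from the index alone, while `SELECT *` has to go back to the table for the remaining columns.

```python
import sqlite3

# Hypothetical table and index, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT, bio TEXT)"
)
conn.execute("CREATE INDEX idx_users_email_name ON users (email, name)")


def plan(sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("a@example.com",)).fetchall()
    return " | ".join(row[3] for row in rows)


# Both requested columns live in the index, so SQLite can skip the table.
narrow_plan = plan("SELECT email, name FROM users WHERE email = ?")

# 'bio' is not in the index, so the table rows must be visited as well.
wide_plan = plan("SELECT * FROM users WHERE email = ?")

print("explicit columns:", narrow_plan)
print("select star:     ", wide_plan)
```

On SQLite the first plan should mention a covering index and the second should not; the exact wording of the plan text varies between SQLite versions.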

Answered by IDisposable

Given your specification that you are selecting all columns, there is little difference at this time. Realize, however, that database schemas do change. If you use SELECT * you are going to get any new columns added to the table, even though, in all likelihood, your code is not prepared to use or present that new data. This means that you are exposing your system to unexpected performance and functionality changes.
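
A small sketch of that effect, using Python's sqlite3 module as a stand-in database (the `orders` table and the `audit_note` column are hypothetical). After a column is added, the `SELECT *` result changes shape without any change to the calling code, while the explicit list keeps returning the same three values.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.execute("INSERT INTO orders (customer, total) VALUES ('alice', 19.5)")


def row_widths():
    """Return (columns returned by SELECT *, columns returned by the explicit list)."""
    star = conn.execute("SELECT * FROM orders").fetchone()
    explicit = conn.execute("SELECT id, customer, total FROM orders").fetchone()
    return len(star), len(explicit)


before = row_widths()

# Someone else adds a column, e.g. for auditing; the application code is untouched.
conn.execute("ALTER TABLE orders ADD COLUMN audit_note TEXT")

after = row_widths()
print("before:", before, "after:", after)
```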

You may be willing to dismiss this as a minor cost, but realize that columns that you don't need still must be:

  1. Read from database
  2. Sent across the network
  3. Marshalled into your process
  4. (for ADO-type technologies) Saved in a data-table in-memory
  5. Ignored and discarded / garbage-collected

Item #1 has many hidden costs, including eliminating some potential covering indexes, causing data-page loads (and server cache thrashing), and incurring row / page / table locks that might otherwise be avoided.

Balance this against the potential savings of using an * versus specifying the columns; the only potential savings are:

  1. Programmer doesn't need to revisit the SQL to add columns
  2. The network-transport of the SQL is smaller / faster
  3. SQL Server query parse / validation time
  4. SQL Server query plan cache

For item 1, the reality is that you're going to add / change code to use any new column you might add anyway, so it is a wash.

For item 2, the difference is rarely enough to push you into a different packet-size or number of network packets. If you get to the point where SQL statement transmission time is the predominant issue, you probably need to reduce the rate of statements first.

For item 3, there are NO savings, as the expansion of the * has to happen anyway, which means consulting the table schema(s) anyway. Realistically, listing the columns will incur the same cost because they have to be validated against the schema. In other words, this is a complete wash.

For item 4, when you specify specific columns, your query plan cache could get larger, but only if you are dealing with different sets of columns (which is not what you've specified). In this case, you do want different cache entries because you want different plans as needed.

So, because of the way you specified the question, this all comes down to the issue of resiliency in the face of eventual schema modifications. If you're burning this schema into ROM (it happens), then an * is perfectly acceptable.

However, my general guideline is that you should only select the columns you need, which means that sometimes it will look like you are asking for all of them, but DBAs and schema evolution mean that some new columns might appear that could greatly affect the query.

My advice is that you should ALWAYS SELECT specific columns. Remember that you get good at what you do over and over, so just get in the habit of doing it right.

If you are wondering why a schema might change without code changing, think in terms of audit logging, effective/expiration dates and other similar things that get added by DBAs systemically for compliance issues. Another source of underhanded changes is denormalization for performance elsewhere in the system, or user-defined fields.

Answered by Giorgi

You should only select the columns that you need. Even if you need all columns, it's still better to list the column names so that SQL Server does not have to query the system tables for the column list.

Also, your application might break if someone adds columns to the table. Your program will also get columns it didn't expect, and it might not know how to process them.

Apart from this, if the table has a binary column, then the query will be much slower and use more network resources.
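
As a rough illustration of the binary-column point, the sketch below counts how much data the client receives for each form of the query. It uses Python's sqlite3 module with an in-memory database, so nothing actually crosses a network here; the `documents` table and its sizes are invented, and the count is only an approximation of payload size (it measures bytes and characters in the returned values).

```python
import sqlite3

# Hypothetical table with one large binary column, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, body BLOB)")
conn.executemany(
    "INSERT INTO documents (title, body) VALUES (?, ?)",
    [(f"doc {i}", bytes(100_000)) for i in range(10)],  # 10 rows, 100 KB blob each
)


def payload_size(sql):
    """Approximate size of the text and binary values returned by sql."""
    total = 0
    for row in conn.execute(sql):
        for value in row:
            if isinstance(value, (bytes, str)):
                total += len(value)
    return total


star_size = payload_size("SELECT * FROM documents")
narrow_size = payload_size("SELECT id, title FROM documents")
print("select star:     ", star_size)
print("explicit columns:", narrow_size)
```

If a listing page only needs `id` and `title`, the blob never has to leave the database at all.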

Answered by pkh

There are four big reasons that select * is a bad thing:

  1. The most significant practical reason is that it forces the user to magically know the order in which columns will be returned. It's better to be explicit, which also protects you against the table changing, which segues nicely into...

  2. If a column name you're using changes, it's better to catch it early (at the point of the SQL call) rather than when you're trying to use the column that no longer exists (or has had its name changed, etc.).

  3. Listing the column names makes your code far more self-documenting, and so probably more readable.

  4. If you're transferring over a network (or even if you aren't), columns you don't need are just waste.
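
Reason 1 can be sketched as follows, using Python's sqlite3 module as a stand-in database. The `users` table is hypothetical, and "rebuilt" simply means the same table recreated with its columns in a different order. Code that unpacks `SELECT *` by position silently swaps the values, while the explicit list keeps the order the code expects.

```python
import sqlite3


def load_user(schema_sql, query):
    """Create the table from schema_sql, insert one row, and unpack it by position."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema_sql)
    conn.execute("INSERT INTO users (name, email) VALUES ('Ada', 'ada@example.com')")
    name, email = conn.execute(query).fetchone()  # positional unpacking
    conn.close()
    return name, email


# Hypothetical schemas: same columns, different physical order.
original = "CREATE TABLE users (name TEXT, email TEXT)"
rebuilt = "CREATE TABLE users (email TEXT, name TEXT)"

star_original = load_user(original, "SELECT * FROM users")
star_rebuilt = load_user(rebuilt, "SELECT * FROM users")
explicit_rebuilt = load_user(rebuilt, "SELECT name, email FROM users")

print("star, original order:", star_original)
print("star, rebuilt table: ", star_rebuilt)      # name and email are swapped
print("explicit, rebuilt:   ", explicit_rebuilt)  # still in the expected order
```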

Answered by pkh

Specifying the column list is usually the best option because your application won't be affected if someone adds/inserts a column to the table.

Answered by Herb Caudill

Specifying column names is definitely faster - for the server. But if

  1. performance is not a big issue (for example, this is a website content database with hundreds, maybe thousands - but not millions - of rows in each table); AND
  2. your job is to create many small, similar applications (e.g. public-facing content-managed websites) using a common framework, rather than creating a complex one-off application; AND
  3. flexibility is important (lots of customization of the db schema for each site);

then you're better off sticking with SELECT *. In our framework, heavy use of SELECT * allows us to introduce a new website managed content field to a table, giving it all of the benefits of the CMS (versioning, workflow/approvals, etc.), while only touching the code at a couple of points, instead of a couple dozen points.
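
For what it's worth, the kind of generic code this implies might look like the sketch below. This is not the framework described above, just a guess at the pattern, written with Python's sqlite3 module; the `pages` table, the `subtitle` field and the `load_fields` helper are all hypothetical. Rows are read by column name rather than position, so a newly added field flows through without touching the loader.

```python
import sqlite3

# Hypothetical CMS-style table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute("INSERT INTO pages (title, body) VALUES ('Home', 'Welcome')")

KNOWN_TABLES = {"pages"}  # the table name is interpolated, so restrict it to known names


def load_fields(table):
    """Return every row of table as a dict keyed by column name."""
    if table not in KNOWN_TABLES:
        raise ValueError(f"unknown table: {table}")
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    return [dict(row) for row in rows]


before = load_fields("pages")

# A new managed content field is introduced; load_fields is not edited.
conn.execute("ALTER TABLE pages ADD COLUMN subtitle TEXT DEFAULT ''")

after = load_fields("pages")
print(list(before[0]))
print(list(after[0]))
```

The trade-off is the one the other answers describe: every caller now receives whatever the table happens to contain.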

I know the DB gurus are going to hate me for this - go ahead, vote me down - but in my world, developer time is scarce and CPU cycles are abundant, so I adjust accordingly what I conserve and what I waste.

Answered by VladV

SELECT * is a bad practice even if the query is not sent over a network.

  1. Selecting more data than you need makes the query less efficient - the server has to read and transfer extra data, so it takes time and creates unnecessary load on the system (not only the network, as others mentioned, but also disk, CPU etc.). Additionally, the server is unable to optimize the query as well as it might (for example, by using a covering index for the query).
  2. After some time your table structure might change, so SELECT * will return a different set of columns. So, your application might get a dataset of unexpected structure and break somewhere downstream. Explicitly stating the columns guarantees that you either get a dataset of known structure, or get a clear error on the database level (like 'column not found').
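
A minimal sketch of point 2, using Python's sqlite3 module as a stand-in database. The `products` table is hypothetical and is created directly in its "after the change" shape, where a column the application still calls `name` has become `title`. The explicit query fails immediately with a database error, while `SELECT *` succeeds and the failure only surfaces later, where the row is used.

```python
import sqlite3

# Hypothetical table in its post-change shape: 'name' has been renamed to 'title'.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO products (title) VALUES ('widget')")


def explicit_query():
    """Application code that still asks for the old column by name."""
    try:
        conn.execute("SELECT id, name FROM products").fetchall()
    except sqlite3.OperationalError as exc:
        return f"failed at query time: {exc}"
    return "ok"


def star_query():
    """Same stale assumption, but hidden behind SELECT *."""
    rows = conn.execute("SELECT * FROM products").fetchall()  # the query itself succeeds
    try:
        return [row["name"] for row in rows]
    except IndexError as exc:  # sqlite3.Row raises IndexError for an unknown key
        return f"failed downstream: {exc}"


print(explicit_query())
print(star_query())
```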

Of course, all this doesn't matter much for a small and simple system.

Answered by Chris Wuestefeld

Lots of good reasons answered here so far, here's another one that hasn't been mentioned.

Explicitly naming the columns will help you with maintenance down the road. At some point you're going to be making changes or troubleshooting, and find yourself asking "where the heck is that column used".

If you've got the names listed explicitly, then finding every reference to that column -- through all your stored procedures, views, etc -- is simple. Just dump a CREATE script for your DB schema, and text search through it.
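
The text search can be sketched like this, with Python's sqlite3 module standing in for a real schema dump: SQLite keeps each object's CREATE statement in `sqlite_master`, which plays the role of the dumped script here. The table, the two views and the `objects_mentioning` helper are hypothetical. Note that the view defined with `SELECT *` depends on the column just as much, but the search cannot see that.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, credit_limit REAL);
    CREATE VIEW v_explicit AS SELECT id, name, credit_limit FROM customers;
    CREATE VIEW v_star AS SELECT * FROM customers;
    """
)


def objects_mentioning(column):
    """Names of schema objects whose CREATE statement contains the column name."""
    rows = conn.execute("SELECT name, sql FROM sqlite_master ORDER BY name").fetchall()
    return [name for name, sql in rows if sql and column in sql]


hits = objects_mentioning("credit_limit")
print(hits)  # v_star is missing even though it also returns credit_limit
```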

Answered by Yann Ramin

Performance-wise, SELECT with specific columns can be faster (no need to read in all the data). If your query really does use ALL the columns, SELECT with an explicit column list is still preferred. Any speed difference will be basically unnoticeable and near constant-time. One day your schema will change, and this is good insurance against the problems that causes.

Answered by Nick Berardi

Definitely define the columns, because then SQL Server will not have to do a lookup on the columns to pull them. If you define the columns, then SQL Server can skip that step.