SQL: Why is SELECT * considered harmful?
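Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow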
Original URL: http://stackoverflow.com/questions/3639861/
Why is SELECT * considered harmful?
Question by Theodore R. Smith
Why is SELECT * bad practice? Wouldn't it mean less code to change if you added a new column you wanted?
I understand that SELECT COUNT(*) is a performance problem on some DBs, but what if you really wanted every column?
Answer by Dave Markle
There are really three major reasons:
Inefficiency in moving data to the consumer. When you SELECT *, you're often retrieving more columns from the database than your application really needs to function. This causes more data to move from the database server to the client, slowing access and increasing load on your machines, as well as taking more time to travel across the network. This is especially true when someone adds new columns to underlying tables that didn't exist and weren't needed when the original consumers coded their data access.
Indexing issues. Consider a scenario where you want to tune a query to a high level of performance. If you were to use *, and it returned more columns than you actually needed, the server would often have to use more expensive methods to retrieve your data than it otherwise might. For example, you wouldn't be able to create an index which simply covered the columns in your SELECT list, and even if you did (including all columns [shudder]), the next guy who came around and added a column to the underlying table would cause the optimizer to ignore your optimized covering index, and you'd likely find that the performance of your query would drop substantially for no readily apparent reason.
Binding problems. When you SELECT *, it's possible to retrieve two columns of the same name from two different tables. This can often crash your data consumer. Imagine a query that joins two tables, both of which contain a column called "ID". How would a consumer know which was which? SELECT * can also confuse views (at least in some versions of SQL Server) when underlying table structures change: the view is not rebuilt, and the data which comes back can be nonsense. And the worst part of it is that you can take care to name your columns whatever you want, but the next guy who comes along might have no way of knowing that he has to worry about adding a column which will collide with your already-developed names.
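The binding problem can be made concrete with a small sketch. It uses Python's built-in sqlite3 module and an in-memory SQLite database, and the customer/orders tables are hypothetical. A join of two tables that both have an id column is mapped to a dictionary by column name, as many data layers do, and one of the two id values is silently lost. SQLite's documentation does not guarantee the result-column names it reports for *, so treat the exact names as illustrative.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create two hypothetical tables that both have an 'id' column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customer VALUES (1, 'Ada');
        INSERT INTO orders VALUES (100, 1, 9.5);
    """)
    return conn


def fetch_as_dict(conn: sqlite3.Connection, sql: str) -> tuple[list[str], dict]:
    """Run a query and map the first row by column name."""
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    row = cur.fetchone()
    # dict() keeps only the last value for a repeated key, so duplicate names collide.
    return names, dict(zip(names, row))


conn = build_db()

# SELECT * returns both tables' id columns under the same name.
star_names, star_row = fetch_as_dict(
    conn, "SELECT * FROM customer c JOIN orders o ON o.customer_id = c.id")

# An explicit list with aliases keeps every value addressable.
explicit_names, explicit_row = fetch_as_dict(
    conn,
    "SELECT c.id AS customer_id, c.name, o.id AS order_id, o.total "
    "FROM customer c JOIN orders o ON o.customer_id = c.id")

print(star_names)    # two entries named 'id'
print(star_row)      # the customer's id has been overwritten by the order's id
print(explicit_row)  # both ids survive under distinct aliases
```

The fix shown is simply to list and alias the columns. Nothing in the consumer changes, but it can no longer be surprised by a same-named column appearing in either table.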
But it's not all bad for SELECT *. I use it liberally for these use cases:
Ad-hoc queries. When trying to debug something, especially off a narrow table I might not be familiar with, SELECT * is often my best friend. It helps me just see what's going on without having to do a boatload of research as to what the underlying column names are. This gets to be a bigger "plus" the longer the column names get.
When * means "a row". In the following use cases, SELECT * is just fine, and rumors that it's a performance killer are just urban legends which may have had some validity many years ago, but don't now:
SELECT COUNT(*) FROM table;
In this case, * means "count the rows". If you were to use a column name instead of *, it would count only the rows where that column's value was not null. COUNT(*), to me, really drives home the concept that you're counting rows, and you avoid strange edge cases caused by NULLs being eliminated from your aggregates.
Same goes with this type of query:
SELECT a.ID FROM TableA a WHERE EXISTS ( SELECT * FROM TableB b WHERE b.ID = a.B_ID);
In any database worth its salt, * here just means "a row". It doesn't matter what you put in the subquery's SELECT list. Some people use b's ID in the SELECT list, or they'll use the number 1, but IMO those conventions are pretty much nonsensical. What you mean is "check for a row", and that's what * signifies. Most query optimizers out there are smart enough to know this. (Though to be honest, I only know this to be true with SQL Server and Oracle.)
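The two "* means a row" cases can be checked directly with a minimal sketch in SQLite through Python's sqlite3 module. It reuses the TableA/TableB names from the answer, but the note column and the data are hypothetical. The sketch shows semantics only: COUNT(*) versus COUNT(column) when NULLs are present, and that EXISTS returns the same rows whether the subquery selects * or 1. It says nothing about how any particular optimizer plans these queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableB (ID INTEGER PRIMARY KEY, note TEXT);
    CREATE TABLE TableA (ID INTEGER PRIMARY KEY, B_ID INTEGER);
    INSERT INTO TableB VALUES (1, 'x'), (2, NULL), (3, NULL);
    INSERT INTO TableA VALUES (10, 1), (11, 2), (12, 99);
""")

# COUNT(*) counts rows; COUNT(column) counts only rows where that column is NOT NULL.
count_rows = conn.execute("SELECT COUNT(*) FROM TableB").fetchone()[0]
count_notes = conn.execute("SELECT COUNT(note) FROM TableB").fetchone()[0]

# Inside EXISTS the select list is irrelevant: only whether a matching row exists matters.
exists_star = [r[0] for r in conn.execute(
    "SELECT a.ID FROM TableA a "
    "WHERE EXISTS (SELECT * FROM TableB b WHERE b.ID = a.B_ID) ORDER BY a.ID")]
exists_one = [r[0] for r in conn.execute(
    "SELECT a.ID FROM TableA a "
    "WHERE EXISTS (SELECT 1 FROM TableB b WHERE b.ID = a.B_ID) ORDER BY a.ID")]

print(count_rows, count_notes)    # 3 1
print(exists_star == exists_one)  # True
```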
Answer by OMG Ponies
The asterisk character, "*", in the SELECT statement is shorthand for all the columns in the table(s) involved in the query.
Performance
The * shorthand can be slower because:
- Not all the fields are indexed, forcing a full table scan (less efficient)
- Whatever you save by sending the short SELECT * over the wire instead of a column list, you risk losing to a full table scan
- Returning more data than is needed
- Returning trailing columns that use variable-length data types can result in search overhead
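The indexing point can be observed in SQLite's query planner with the sketch below. It uses Python's sqlite3 module, a hypothetical users table, and a hypothetical index on email. When every selected column is in the index, SQLite reports a covering index and never touches the table. SELECT * also needs bio, so each match requires a lookup into the table. Other engines report plans in their own formats, and the exact detail text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, bio TEXT);
    CREATE INDEX idx_users_email ON users (email);
""")


def plan(sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a one-parameter query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("a@example.com",)).fetchall()
    # The last column of each plan row is the human-readable description.
    return " | ".join(row[-1] for row in rows)


# id is the rowid alias, which every SQLite index entry already stores.
explicit_plan = plan("SELECT id, email FROM users WHERE email = ?")
star_plan = plan("SELECT * FROM users WHERE email = ?")

print(explicit_plan)  # mentions a COVERING INDEX: the index alone answers the query
print(star_plan)      # uses the index, then reads each matching table row to get bio
```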
Maintenance
When using SELECT *:
- Someone unfamiliar with the codebase would be forced to consult documentation to know what columns are being returned before being able to make competent changes. Making code more readable, minimizing the ambiguity and work necessary for people unfamiliar with the code saves more time and effort in the long run.
- If code depends on column order, SELECT * will hide an error waiting to happen if a table has its column order changed.
- Even if you need every column at the time the query is written, that might not be the case in the future.
- The usage complicates profiling.
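The column-order risk, which also underlies supercat's point about numerical indexes, fits in a few lines. This sketch uses Python's sqlite3 module and a hypothetical person table. The same data is stored in two tables whose columns were declared in different orders, as happens when a table is rebuilt. A consumer that reads the second column of SELECT * gets the wrong value without any error. The explicit column list fixes the order in the query itself.

```python
import sqlite3


def make_db(ddl: str) -> sqlite3.Connection:
    """Create a hypothetical 'person' table with the given column order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.execute("INSERT INTO person (first_name, last_name) VALUES ('Grace', 'Hopper')")
    return conn


def last_name_by_position(conn: sqlite3.Connection, sql: str) -> str:
    """Fragile consumer: assumes last_name is the second column of the result."""
    return conn.execute(sql).fetchone()[1]


original = make_db("CREATE TABLE person (first_name TEXT, last_name TEXT)")
# Same columns, but the table was rebuilt with a different column order.
reordered = make_db("CREATE TABLE person (last_name TEXT, first_name TEXT)")

star_sql = "SELECT * FROM person"
explicit_sql = "SELECT first_name, last_name FROM person"

print(last_name_by_position(original, star_sql))       # Hopper
print(last_name_by_position(reordered, star_sql))      # Grace: wrong value, no error raised
print(last_name_by_position(reordered, explicit_sql))  # Hopper: order fixed by the query
```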
Design
SELECT * is an anti-pattern:
- The purpose of the query is less obvious; the columns used by the application are opaque
- It breaks the modularity rule about using strict typing whenever possible. Explicit is almost universally better.
When Should "SELECT *" Be Used?
It's acceptable to use SELECT * when there's an explicit need for every column in the table(s) involved, as opposed to every column that existed when the query was written. The database will internally expand the * into the complete list of columns, so there's no performance difference.
Otherwise, explicitly list every column that is to be used in the query, preferably qualified with a table alias.
Answer by Mark Byers
Even if you wanted to select every column now, you might not want to select every column after someone adds one or more new columns. If you write the query with SELECT *, you are taking the risk that at some point someone might add a column of text which makes your query run more slowly even though you don't actually need that column.
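The following sketch (Python's sqlite3 module, with a hypothetical product table and description column) shows that risk directly. After someone adds and populates a large text column, the unchanged SELECT * starts returning it on every fetch. The explicit column list keeps returning exactly what the code uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO product (name) VALUES ('widget')")

star_sql = "SELECT * FROM product"
explicit_sql = "SELECT id, name FROM product"

before = conn.execute(star_sql).fetchone()

# Later, someone adds a large text column that this code never uses.
conn.execute("ALTER TABLE product ADD COLUMN description TEXT")
conn.execute("UPDATE product SET description = ?", ("x" * 10_000,))

after_star = conn.execute(star_sql).fetchone()
after_explicit = conn.execute(explicit_sql).fetchone()

print(len(before), len(after_star), len(after_explicit))  # 2 3 2
print(len(after_star[2]))  # 10000 characters fetched for nothing
```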
Wouldn't it mean less code to change if you added a new column you wanted?
The chances are that if you actually want to use the new column, then you will have to make quite a lot of other changes to your code anyway. You're only saving ", new_column", just a few characters of typing.
Answer by supercat
If you name the columns in a SELECT statement, they will be returned in the order specified, and may thus safely be referenced by numerical index. If you use "SELECT *", you may end up receiving the columns in arbitrary sequence, and thus can only safely use the columns by name. Unless you know in advance what you'll be wanting to do with any new column that gets added to the database, the most probable correct action is to ignore it. If you're going to be ignoring any new columns that get added to the database, there is no benefit whatsoever to retrieving them.
Answer by Kelly S. French
Think of it as reducing the coupling between the app and the database.
To summarize the 'code smell' aspect: SELECT * creates a dynamic dependency between the app and the schema. Restricting its use is one way of making the dependency more defined; otherwise a change to the database has a greater likelihood of crashing your application.
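Here is one way that dynamic dependency turns a schema change into a crash, and it also illustrates the run-time failures described in a later answer. This is a minimal sketch with Python's sqlite3 module and dataclasses. The User model, the users table and the last_login column are hypothetical. Rows are mapped with User(**row), a common pattern in thin data layers. After an unrelated ALTER TABLE, the SELECT * version fails with a TypeError while the explicit column list keeps working.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical application model that knows about exactly two columns."""
    id: int
    email: str


def load_users(conn: sqlite3.Connection, sql: str) -> list[User]:
    """Map each result row to User(**row) using the result's column names."""
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    return [User(**dict(zip(names, row))) for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

print(load_users(conn, "SELECT * FROM users"))  # works before the schema change

# A schema change elsewhere: a new column this code does not know about.
conn.execute("ALTER TABLE users ADD COLUMN last_login TEXT")

try:
    load_users(conn, "SELECT * FROM users")
    star_error = None
except TypeError as exc:  # raised for the unexpected keyword argument 'last_login'
    star_error = exc

explicit_users = load_users(conn, "SELECT id, email FROM users")  # still works
print(star_error)
print(explicit_users)
```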
Answer by Guffa
If you add fields to the table, they will automatically be included in all your queries where you use SELECT *. This may seem convenient, but it will make your application slower as you are fetching more data than you need, and it can actually crash your application at some point.
There is a limit on how much data you can fetch in each row of a result. If you add fields to your tables so that a result ends up being over that limit, you get an error message when you try to run the query.
This is the kind of error that is hard to find. You make a change in one place, and it blows up in some other place that doesn't actually use the new data at all. It may even be a less frequently used query, so it takes a while before someone runs it, which makes it even harder to connect the error to the change.
If you specify which fields you want in the result, you are safe from this kind of overhead overflow.
Answer by Andrew Lewis
In a lot of situations, SELECT * will cause errors at run time in your application, rather than at design time. It hides column changes, and bad column references in your application, until the code actually runs.
Answer by brabster
If you really want every column, I haven't seen a performance difference between SELECT * and naming the columns. The reason to name the columns might simply be to be explicit about which columns you expect to see in your code.
Often, though, you don't want every column, and SELECT * can result in unnecessary work for the database server and unnecessary information having to be passed over the network. It's unlikely to cause a noticeable problem unless the system is heavily utilised or the network connectivity is slow.
Answer by Anvesh
Reference taken from this article.
Never go with "SELECT *".
I have found only one reason to use "SELECT *":
Suppose you have special requirements and have built a dynamic environment in which added or deleted columns are handled automatically by the application code. In this special case you don't need to change the application or database code, and schema changes automatically take effect in the production environment, so you can use "SELECT *".
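A generic export tool is one example of that kind of dynamic environment. The sketch below is hypothetical (the export_table_csv function and the settings table are illustrations, not from the answer) and uses Python's sqlite3 and csv modules. The CSV header is taken from cursor.description, so a newly added column appears in the output with no code change. Because table names cannot be bound as query parameters, the function checks the name against sqlite_master and quotes it before building the SQL string.

```python
import csv
import io
import sqlite3


def export_table_csv(conn: sqlite3.Connection, table: str) -> str:
    """Dump every column of a table to CSV, whatever columns it has right now."""
    # Only accept names of tables that actually exist, then quote the identifier.
    known = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        raise ValueError(f"unknown table: {table!r}")
    quoted = '"' + table.replace('"', '""') + '"'

    cur = conn.execute(f"SELECT * FROM {quoted}")
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(d[0] for d in cur.description)  # header from the live schema
    writer.writerows(cur)                           # NULL is written as an empty field
    return out.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (setting TEXT, val TEXT)")
conn.execute("INSERT INTO settings VALUES ('theme', 'dark')")
before = export_table_csv(conn, "settings")

conn.execute("ALTER TABLE settings ADD COLUMN updated_by TEXT")
after = export_table_csv(conn, "settings")

print(before.splitlines()[0])  # setting,val
print(after.splitlines()[0])   # setting,val,updated_by
```

Here SELECT * states exactly what the code wants: every column that exists at the moment it runs.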
Answer by Aradhana Mohanty
Selecting by column name raises the probability that the database engine can access the data from indexes rather than querying the table data.
SELECT * exposes your system to unexpected performance and functionality changes when your database schema changes, because you will get any new columns added to the table even if your code is not prepared to use or present that new data.