SQL: What is the reason not to use select *?

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/321299/

Date: 2020-09-01 00:22:46  Source: igfitidea

What is the reason not to use select *?

sql

Asked by NotMe

I've seen a number of people claim that you should specifically name each column you want in your select query.

Assuming I'm going to use all of the columns anyway, why would I not use SELECT *?

Even considering the question *SQL query - Select * from view or Select col1, col2, … colN from view*, I don't think this is an exact duplicate as I'm approaching the issue from a slightly different perspective.

One of our principles is to not optimize before it's time. With that in mind, it seems like using SELECT * should be the preferred method until it is proven to be a resource issue or the schema is pretty much set in stone. Which, as we know, won't occur until development is completely done.

That said, is there an overriding issue to not use SELECT *?

Answered by Robert Paulson

The essence of the quote about not prematurely optimizing is to go for simple and straightforward code and then use a profiler to point out the hot spots, which you can then optimize to be efficient.

When you use select * you make it impossible to profile, therefore you're not writing clear and straightforward code and you are going against the spirit of the quote. select * is an anti-pattern.



So selecting columns is not a premature optimization. A few things off the top of my head ....

  1. If you specify columns in a SQL statement, the SQL execution engine will error if that column is removed from the table and the query is executed.
  2. You can more easily scan code where that column is being used.
  3. You should always write queries to bring back the least amount of information.
  4. As others mention, if you use ordinal column access you should never use select *.
  5. If your SQL statement joins tables, select * gives you all columns from all tables in the join
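
Item 1 in the list above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the `users` table and its columns are hypothetical, and the column removal is simulated by rebuilding the table. Other engines word the error differently, but the behaviour is similar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users (name, email) VALUES ('ann', 'ann@example.com')")

explicit_query = "SELECT id, name, email FROM users"
star_query = "SELECT * FROM users"

# Simulate a schema change that removes the email column by rebuilding the table.
conn.execute("DROP TABLE users")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('ann')")

# SELECT * keeps running and silently returns one column fewer.
star_row = conn.execute(star_query).fetchone()
print(len(star_row))

# The explicit column list fails immediately, at the query itself.
try:
    conn.execute(explicit_query)
    explicit_error = None
except sqlite3.OperationalError as exc:
    explicit_error = str(exc)
print(explicit_error)
```

The explicit query reports the missing column by name at the point where the data is fetched, while the select * version only misbehaves later, wherever the application touches the missing field.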

The corollary is that using select *...

  1. The columns used by the application are opaque
  2. DBAs and their query profilers are unable to help your application's poor performance
  3. The code is more brittle when changes occur
  4. Your database and network are suffering because they are bringing back too much data (I/O)
  5. Database engine optimizations are minimal as you're bringing back all data regardless (logical).
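
To make item 4 concrete, here is a rough sketch that compares how much data comes back from select * versus an explicit column list when the table holds a large binary column. The `documents` table and the `payload_bytes` helper are made up for illustration, and the size measure is deliberately crude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, body BLOB)")
conn.execute(
    "INSERT INTO documents (title, body) VALUES (?, ?)",
    ("report", b"\x00" * 1_000_000),
)

def payload_bytes(query):
    """Rough size of the fetched values: BLOB length plus UTF-8 text length.

    Integers are ignored to keep the sketch simple.
    """
    total = 0
    for row in conn.execute(query):
        for value in row:
            if isinstance(value, bytes):
                total += len(value)
            elif isinstance(value, str):
                total += len(value.encode("utf-8"))
    return total

star_size = payload_bytes("SELECT * FROM documents")
explicit_size = payload_bytes("SELECT id, title FROM documents")
print(star_size, explicit_size)
```

If the caller only needs the title, the megabyte of `body` is read and transferred for nothing on every row.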


Writing correct SQL is just as easy as writing Select *. So the real lazy person writes proper SQL because they don't want to revisit the code and try to remember what they were doing when they did it. They don't want to explain to the DBAs about every bit of code. They don't want to explain to their clients why the application runs like a dog.

Answered by Bob

If your code depends on the columns being in a specific order, your code will break when there are changes to the table. Also, you may be fetching too much from the table when you select *, especially if there is a binary field in the table.

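
The column-order problem can be sketched as follows, using Python's sqlite3 module. The `people` table and the `first_row` helper are hypothetical; the point is only that positional access on select * changes meaning when the table definition is reordered.

```python
import sqlite3

def first_row(create_sql, query):
    """Create a fresh in-memory table, insert one person, and return the first row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    conn.execute("INSERT INTO people (name, city) VALUES ('ann', 'Oslo')")
    return conn.execute(query).fetchone()

old_schema = "CREATE TABLE people (name TEXT, city TEXT)"
new_schema = "CREATE TABLE people (city TEXT, name TEXT)"  # same columns, reordered

# Positional access on SELECT *: index 0 silently changes meaning.
star_old = first_row(old_schema, "SELECT * FROM people")[0]
star_new = first_row(new_schema, "SELECT * FROM people")[0]

# With an explicit list, index 0 is always the name.
explicit_old = first_row(old_schema, "SELECT name, city FROM people")[0]
explicit_new = first_row(new_schema, "SELECT name, city FROM people")[0]

print(star_old, star_new, explicit_old, explicit_new)
```

No error is raised in the select * case; the code simply starts treating a city as a name.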

Just because you are using all the columns now, it doesn't mean someone else isn't going to add an extra column to the table.

It also adds overhead to execution plan caching, since the engine has to fetch the metadata about the table to know which columns are in *.

Answered by ahockley

One major reason is that if you ever add/remove columns from your table, any query/procedure that is making a SELECT * call will now be getting more or fewer columns of data than expected.

Answered by JohnFx

  1. In a roundabout way you are breaking the modularity rule about using strict typing wherever possible. Explicit is almost universally better.

  2. Even if you now need every column in the table, more could be added later, which will be pulled down every time you run the query and could hurt performance. It hurts performance because:

    • You are pulling more data over the wire; and
    • You might defeat the optimizer's ability to pull the data right out of the index (for queries on columns that are all part of an index) rather than doing a lookup in the table itself.
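
The index point can be observed in SQLite through its EXPLAIN QUERY PLAN statement; the sketch below uses Python's sqlite3 module with a hypothetical `orders` table and index. Other engines expose the same idea through their own plan output, so treat this as one illustration and not as a statement about any particular optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, notes TEXT)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

def plan(query):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return " | ".join(row[3] for row in rows)

# Only the indexed column is requested, so the index alone can answer the query.
explicit_plan = plan("SELECT customer FROM orders WHERE customer = 'acme'")

# SELECT * also needs notes, which forces a lookup in the table itself.
star_plan = plan("SELECT * FROM orders WHERE customer = 'acme'")

print(explicit_plan)
print(star_plan)
```

In SQLite the first plan mentions a covering index while the second does not, which is the extra table lookup the answer describes.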

When TO use select *

When you explicitly NEED every column in the table, as opposed to needing every column in the table THAT EXISTED AT THE TIME YOU WROTE THE QUERY. For example, if you were writing a DB management app that needed to display the entire contents of the table (whatever they happened to be), you might use that approach.
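
A generic table viewer of the kind described here might look roughly like this in Python with sqlite3. The `dump_table` helper and the `inventory` table are hypothetical names; the sketch assumes the table name comes from a trusted source, since identifiers cannot be passed as bound parameters.

```python
import sqlite3

def dump_table(conn, table):
    """Return (column_names, rows) for any table, whatever its columns are."""
    # Assumption: `table` comes from a trusted source such as the schema catalog;
    # identifiers cannot be bound as parameters, so it is quoted manually here.
    quoted = '"' + table.replace('"', '""') + '"'
    cur = conn.execute(f"SELECT * FROM {quoted}")
    names = [col[0] for col in cur.description]
    return names, cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('A-1', 3)")

names, rows = dump_table(conn, "inventory")
print(names, rows)
```

Here select * is the right tool because the code discovers the columns at run time from the cursor metadata instead of assuming them.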

Answered by Powerlord

There are a few reasons:

  1. If the number of columns in a database changes and your application expects there to be a certain number...
  2. If the order of columns in a database changes and your application expects them to be in a certain order...
  3. Memory overhead. 8 unnecessary INTEGER columns would add 32 bytes of wasted memory. That doesn't sound like a lot, but this is for each query and INTEGER is one of the small column types... the extra columns are more likely to be VARCHAR or TEXT columns, which adds up quicker.
  4. Network overhead. Related to memory overhead: if I issue 30,000 queries and have 8 unnecessary INTEGER columns, I've wasted 960kB of bandwidth. VARCHAR and TEXT columns are likely to be considerably larger.

Note: I chose INTEGER in the above example because they have a fixed size of 4 bytes.

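
The arithmetic behind items 3 and 4 above can be checked with a few lines of Python. The figures come from the answer; the one added assumption is that each of the 30,000 queries returns a single row.

```python
INTEGER_BYTES = 4    # fixed-size INTEGER, per the answer's note
extra_columns = 8    # unnecessary columns pulled in by select *
queries = 30_000     # assumption: each query returns a single row

wasted_per_row = extra_columns * INTEGER_BYTES
wasted_total = wasted_per_row * queries

print(wasted_per_row)        # bytes wasted per row
print(wasted_total / 1000)   # decimal kilobytes wasted overall
```

This gives 32 bytes per row and 960 kB in total, matching the numbers quoted in the list.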

Answered by MusiGenesis

If your application gets data with SELECT * and the table structure in the database is changed (say a column is removed), your application will fail in every place that you reference the missing field. If you instead include all the columns in your query, your application will break in the (hopefully) one place where you initially get the data, making the fix easier.

That being said, there are a number of situations in which SELECT * is desirable. One is a situation that I encounter all the time, where I need to replicate an entire table into another database (like SQL Server to DB2, for example). Another is an application written to display tables generically (i.e. without any knowledge of any particular table).

Answered by kristof

I actually noticed a strange behaviour when I used select * in views in SQL Server 2005.

Run the following query and you will see what I mean.

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[starTest]') AND type in (N'U'))
DROP TABLE [dbo].[starTest]
CREATE TABLE [dbo].[starTest](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [A] [varchar](50) NULL,
    [B] [varchar](50) NULL,
    [C] [varchar](50) NULL
) ON [PRIMARY]

GO

insert into dbo.starTest
select 'a1','b1','c1'
union all select 'a2','b2','c2'
union all select 'a3','b3','c3'

go
IF  EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[vStartest]'))
DROP VIEW [dbo].[vStartest]
go
create view dbo.vStartest as
select * from dbo.starTest
go
IF  EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[vExplicittest]'))
DROP VIEW [dbo].[vExplicittest]
go
create view dbo.[vExplicittest] as
select a,b,c from dbo.starTest
go


select a,b,c from dbo.vStartest
select a,b,c from dbo.vExplicitTest

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[starTest]') AND type in (N'U'))
DROP TABLE [dbo].[starTest]
CREATE TABLE [dbo].[starTest](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [A] [varchar](50) NULL,
    [B] [varchar](50) NULL,
    [D] [varchar](50) NULL,
    [C] [varchar](50) NULL
) ON [PRIMARY]

GO

insert into dbo.starTest
select 'a1','b1','d1','c1'
union all select 'a2','b2','d2','c2'
union all select 'a3','b3','d3','c3'

select a,b,c from dbo.vStartest
select a,b,c from dbo.vExplicittest

Compare the results of the last 2 select statements. I believe what you will see is a result of select * referencing columns by index instead of name.

If you rebuild the view it will work fine again.

EDIT

I have added a separate question, *“select * from table” vs “select colA, colB, etc. from table” interesting behaviour in SQL Server 2005*, to look into that behaviour in more detail.

Answered by orbfish

When you're specifying columns, you're also tying yourself into a specific set of columns and making yourself less flexible, making Feuerstein roll over in, well, wherever he is. Just a thought.

Answered by Kaniu

You might join two tables and use column A from the second table. If you later add column A to the first table (with same name but possibly different meaning) you'll most likely get the values from the first table and not the second one as earlier. That won't happen if you explicitly specify the columns you want to select.

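
The join scenario can be reproduced with Python's sqlite3 module. The `orders` and `customers` tables are hypothetical, and how a lookup by name resolves duplicate column names depends on the client library; Python's `sqlite3.Row` returns the first matching column, which is what this sketch relies on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access by column name
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.execute("CREATE TABLE customers (customer_id INTEGER, status TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 10)")
conn.execute("INSERT INTO customers VALUES (10, 'gold')")

star_join = (
    "SELECT * FROM orders o JOIN customers c ON c.customer_id = o.customer_id"
)
explicit_join = (
    "SELECT o.order_id, c.status FROM orders o "
    "JOIN customers c ON c.customer_id = o.customer_id"
)

# Only customers has a status column, so the lookup is unambiguous for now.
before = conn.execute(star_join).fetchone()["status"]

# Later, someone adds a column with the same name to the first table.
conn.execute("ALTER TABLE orders ADD COLUMN status TEXT DEFAULT 'shipped'")

# The select * result now has two status columns; the first one wins.
after_star = conn.execute(star_join).fetchone()["status"]
# The qualified column list still returns the customer's status.
after_explicit = conn.execute(explicit_join).fetchone()["status"]

print(before, after_star, after_explicit)
```

The select * query switches from the customer's status to the order's status without any error, while the explicit query keeps returning the value it was written for.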

Of course specifying the columns also sometimes causes bugs if you forget to add the new columns to every select clause. If the new column is not needed every time the query is executed, it may take some time before the bug gets noticed.

Answered by Jim B-G

I understand where you're going regarding premature optimization, but that really only goes to a point. The intent is to avoid unnecessary optimization in the beginning. Are your tables unindexed? Would you use nvarchar(4000) to store a zip code?

As others have pointed out, there are other positives to specifying each column you intend to use in the query (such as maintainability).
