原文地址: http://stackoverflow.com/questions/1645215/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
How do I add a column to large sql server table
Question by Jonathan Beerhalter
I have a SQL Server table in production that has millions of rows, and it turns out that I need to add a column to it. Or, to be more accurate, I need to add a field to the entity that the table represents.
Syntactically this isn't a problem, and if the table didn't have so many rows and wasn't in production, this would be easy.
Really what I'm after is the course of action. There are plenty of websites out there with extremely large tables, and they must add fields from time to time. How do they do it without substantial downtime?
One thing I should add, I did not want the column to allow nulls, which would mean that I'd need to have a default value.
So I either need to figure out how to add a column with a default value in a timely manner, or I need to figure out a way to update the column at a later time and then set the column to not allow nulls.
Answer by Nestor
ALTER TABLE table1 ADD
newcolumn int NULL
GO
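This should not take long, because adding a nullable column with no default only changes the table's metadata; existing rows are not rewritten. What takes a long time is inserting a column in the middle of other columns, because then a new table has to be created and the data copied into it.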
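To make the difference concrete, here is a minimal T-SQL sketch against a hypothetical demo table dbo.DemoOrders (the table, column, and constraint names are illustrative, not from the question). It adds a nullable column with no default, which does not rewrite existing rows, and then adds a DEFAULT constraint separately, which only affects rows inserted afterwards. GO is the SSMS/sqlcmd batch separator; it is needed because a column added in one batch cannot be referenced by name later in that same batch.

```sql
-- Hypothetical demo table; in production this would be your existing large table.
IF OBJECT_ID('dbo.DemoOrders', 'U') IS NOT NULL DROP TABLE dbo.DemoOrders;
CREATE TABLE dbo.DemoOrders (OrderId int NOT NULL PRIMARY KEY, Amount money NOT NULL);
INSERT INTO dbo.DemoOrders (OrderId, Amount) VALUES (1, 10), (2, 20);
GO

-- Nullable column, no default: existing rows are not rewritten,
-- only the table metadata changes (a brief Sch-M lock is still taken).
ALTER TABLE dbo.DemoOrders ADD NewColumn int NULL;
GO

-- A DEFAULT added afterwards applies only to future INSERTs that omit the column;
-- rows 1 and 2 keep NULL until they are backfilled.
ALTER TABLE dbo.DemoOrders
    ADD CONSTRAINT DF_DemoOrders_NewColumn DEFAULT (0) FOR NewColumn;
GO

INSERT INTO dbo.DemoOrders (OrderId, Amount) VALUES (3, 30);
SELECT OrderId, NewColumn FROM dbo.DemoOrders ORDER BY OrderId;
```

The final SELECT shows NULL for orders 1 and 2 and 0 for order 3, so the existing rows would still need a backfill before the column could be made NOT NULL.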
Answer by Bill Karwin
The only real solution for continuous uptime is redundancy.
I acknowledge @Nestor's answer that adding a new column shouldn't take long in SQL Server, but it could still cause an outage that is not acceptable on a production system. An alternative is to make the change in a parallel system, and then once the operation is complete, swap the new for the old.
For example, if you need to add a column, you may create a copy of the table, add the column to that copy, and then use sp_rename to move the old table aside and the new table into place.
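The copy-and-swap approach can be sketched in T-SQL as follows. This is an illustration under assumptions: dbo.Accounts and its columns are hypothetical, nothing writes to the table while the copy runs (otherwise those changes would have to be captured and replayed), and no foreign keys reference it. The two sp_rename calls run in one transaction so that either both renames happen or neither does.

```sql
-- Hypothetical demo setup.
IF OBJECT_ID('dbo.Accounts_Old', 'U') IS NOT NULL DROP TABLE dbo.Accounts_Old;
IF OBJECT_ID('dbo.Accounts_New', 'U') IS NOT NULL DROP TABLE dbo.Accounts_New;
IF OBJECT_ID('dbo.Accounts', 'U') IS NOT NULL DROP TABLE dbo.Accounts;
CREATE TABLE dbo.Accounts (AccountId int NOT NULL PRIMARY KEY, Name nvarchar(50) NOT NULL);
INSERT INTO dbo.Accounts (AccountId, Name) VALUES (1, N'a'), (2, N'b');
GO

-- 1. Build the new version off to the side, with the new column already NOT NULL.
CREATE TABLE dbo.Accounts_New (
    AccountId int NOT NULL,
    Name nvarchar(50) NOT NULL,
    IsActive bit NOT NULL CONSTRAINT DF_Accounts_IsActive DEFAULT (1),
    CONSTRAINT PK_Accounts_New PRIMARY KEY (AccountId)
);
-- IsActive is omitted, so the DEFAULT constraint fills it in.
INSERT INTO dbo.Accounts_New (AccountId, Name)
SELECT AccountId, Name FROM dbo.Accounts;
GO

-- 2. Swap: both renames commit together or not at all.
SET XACT_ABORT ON;
BEGIN TRANSACTION;
    EXEC sp_rename 'dbo.Accounts', 'Accounts_Old';
    EXEC sp_rename 'dbo.Accounts_New', 'Accounts';
COMMIT TRANSACTION;
GO
```

The old table is kept as dbo.Accounts_Old rather than dropped, so it can be renamed back if something goes wrong.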
If you have referential integrity constraints pointing to this table, this can make the swap even more tricky. You probably have to drop the constraints briefly as you swap the tables.
For some kinds of complex upgrades, you could completely duplicate the database on a separate server host. Once that's ready, just swap the DNS entries for the two servers and voilà!
I supported a stock exchange company in the 1990s that ran three duplicate database servers at all times. That way they could implement upgrades on one server, while retaining one production server and one failover server. Their operations had a standard procedure of rotating the three machines through production, failover, and maintenance roles every day. When they needed to upgrade hardware, software, or alter the database schema, it took three days to propagate the change through their servers, but they could do it with no interruption in service. All thanks to redundancy.
Answer by Solomon Rutzky
I did not want the column to allow nulls, which would mean that I'd need to have a default value.
Adding a NOT NULL column with a DEFAULT constraint to a table of any size (even billions of rows) became a lot easier starting in SQL Server 2012 (but only in Enterprise Edition), because in most cases it can now be an online operation: for existing rows, the value is read from metadata and is not actually stored in the row until the row is updated or the clustered index is rebuilt. Rather than paraphrase any further, here is the relevant section from the MSDN page for ALTER TABLE:
Adding NOT NULL Columns as an Online Operation
Starting with SQL Server 2012 Enterprise Edition, adding a NOT NULL column with a default value is an online operation when the default value is a runtime constant. This means that the operation is completed almost instantaneously regardless of the number of rows in the table. This is because the existing rows in the table are not updated during the operation; instead, the default value is stored only in the metadata of the table and the value is looked up as needed in queries that access these rows. This behavior is automatic; no additional syntax is required to implement the online operation beyond the ADD COLUMN syntax. A runtime constant is an expression that produces the same value at runtime for each row in the table regardless of its determinism. For example, the constant expression "My temporary data", or the system function GETUTCDATETIME() are runtime constants. In contrast, the functions NEWID() or NEWSEQUENTIALID() are not runtime constants because a unique value is produced for each row in the table. Adding a NOT NULL column with a default value that is not a runtime constant is always performed offline and an exclusive (SCH-M) lock is acquired for the duration of the operation.
While the existing rows reference the value stored in metadata, the default value is stored on the row for any new rows that are inserted and do not specify another value for the column. The default value stored in metadata is moved to an existing row when the row is updated (even if the actual column is not specified in the UPDATE statement), or if the table or clustered index is rebuilt.
Columns of type varchar(max), nvarchar(max), varbinary(max), xml, text, ntext, image, hierarchyid, geometry, geography, or CLR UDTs, cannot be added in an online operation. A column cannot be added online if doing so causes the maximum possible row size to exceed the 8,060 byte limit. The column is added as an offline operation in this case.
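A minimal sketch of the statement this quote describes, using a hypothetical dbo.Events table. The ALTER uses a runtime-constant default ('N'), so on SQL Server 2012+ Enterprise Edition it should be metadata-only; on other editions or older versions the same statement still works but updates every existing row. The commented-out NEWID() variant is there only as a contrast, since per the quote it is always an offline operation.

```sql
-- Hypothetical table; assume it already holds many rows.
IF OBJECT_ID('dbo.Events', 'U') IS NOT NULL DROP TABLE dbo.Events;
CREATE TABLE dbo.Events (EventId int NOT NULL PRIMARY KEY, Payload varchar(100) NULL);
INSERT INTO dbo.Events (EventId, Payload) VALUES (1, 'x'), (2, 'y');
GO

-- Runtime-constant default: on 2012+ Enterprise Edition, existing rows are not
-- touched; 'N' is kept in metadata and returned when those rows are read.
ALTER TABLE dbo.Events
    ADD Status char(1) NOT NULL
        CONSTRAINT DF_Events_Status DEFAULT ('N');
GO

-- Contrast only: NEWID() produces a different value per row, so this would
-- always run offline under a Sch-M lock, whatever the edition.
-- ALTER TABLE dbo.Events ADD RowGuid uniqueidentifier NOT NULL
--     CONSTRAINT DF_Events_RowGuid DEFAULT (NEWID());

SELECT EventId, Status FROM dbo.Events ORDER BY EventId;
```

No extra syntax is involved: whether the ADD runs online is decided by the server from the edition, the column type, and the kind of default.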
Answer by Kuberchaun
"Add the column and then perform relatively small UPDATE batches to populate the column with a default value. That should prevent any noticeable slowdowns"
And after that you have to set the column to NOT NULL, which runs as one big transaction. So everything runs fast until you do that, which means you have probably gained very little. I know this from first-hand experience.
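For reference, the approach being criticized looks roughly like this in T-SQL. The table dbo.BigTable, the column NewCol, and the batch size of 10,000 are assumptions for illustration. Each batch commits on its own, but the final ALTER COLUMN ... NOT NULL still has to check every row in a single transaction, which is the step this answer warns about.

```sql
-- Hypothetical demo table with 25,000 rows.
IF OBJECT_ID('dbo.BigTable', 'U') IS NOT NULL DROP TABLE dbo.BigTable;
CREATE TABLE dbo.BigTable (Id int NOT NULL PRIMARY KEY);
INSERT INTO dbo.BigTable (Id)
SELECT TOP (25000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;
GO

-- Step 1: nullable column, cheap to add.
ALTER TABLE dbo.BigTable ADD NewCol int NULL;
GO

-- Step 2: backfill in small batches; each UPDATE is its own autocommit transaction.
DECLARE @BatchSize int = 10000;  -- assumption: tune for your log and locking
DECLARE @Rows int = 1;
WHILE @Rows > 0
BEGIN
    UPDATE TOP (@BatchSize) dbo.BigTable
    SET NewCol = 0
    WHERE NewCol IS NULL;
    SET @Rows = @@ROWCOUNT;
END;
GO

-- Step 3: the remaining big step, validated over every row in one transaction.
ALTER TABLE dbo.BigTable ALTER COLUMN NewCol int NOT NULL;
GO
```

Because the WHERE NewCol IS NULL filter has to skip already-updated rows on each pass, walking the primary key in ranges is usually cheaper on a very large table.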
You might want to rename the current table from X to Y. You can do this with EXEC sp_rename 'OldTableName', 'NewTableName'; (do not put brackets around the new name, or they become part of the name).
Recreate table X with the new column defined as NOT NULL, then batch-insert from Y into X, supplying the new column's value either explicitly in your INSERT or through a DEFAULT constraint defined on the column when you recreate X.
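The rename-and-reload approach can be sketched in T-SQL like this, using the answer's X and Y names. The columns, the default of 0, and the batch size are hypothetical. It assumes the application is not writing to the table during the copy, since rows written to Y after their key range has been copied would be missed. Batches walk the primary key so each INSERT reads a fresh range instead of rescanning.

```sql
-- Hypothetical demo setup: X is the existing table.
IF OBJECT_ID('dbo.Y', 'U') IS NOT NULL DROP TABLE dbo.Y;
IF OBJECT_ID('dbo.X', 'U') IS NOT NULL DROP TABLE dbo.X;
CREATE TABLE dbo.X (Id int NOT NULL PRIMARY KEY, Val varchar(20) NOT NULL);
INSERT INTO dbo.X (Id, Val)
SELECT TOP (25000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'v'
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;
GO

-- 1. Move the current table aside (plain names, no brackets).
EXEC sp_rename 'dbo.X', 'Y';
GO

-- 2. Recreate X with the new column already NOT NULL and defaulted.
CREATE TABLE dbo.X (
    Id int NOT NULL PRIMARY KEY,
    Val varchar(20) NOT NULL,
    NewCol int NOT NULL CONSTRAINT DF_X_NewCol DEFAULT (0)
);
GO

-- 3. Copy in key ranges so each batch commits (and is logged) separately.
DECLARE @BatchSize int = 10000;  -- assumption: tune for your system
DECLARE @LastId int = 0, @Rows int = 1;
WHILE @Rows > 0
BEGIN
    INSERT INTO dbo.X (Id, Val)  -- NewCol omitted, so DF_X_NewCol supplies 0
    SELECT TOP (@BatchSize) Id, Val
    FROM dbo.Y
    WHERE Id > @LastId
    ORDER BY Id;
    SET @Rows = @@ROWCOUNT;
    IF @Rows > 0 SELECT @LastId = MAX(Id) FROM dbo.X;
END;
GO
```

Once the counts match and the application has been checked against the new X, Y can be dropped.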
I have done this type of change on a table with hundreds of millions of rows. It still took over an hour, but it didn't blow out our transaction log. When I tried to just change the column to NOT NULL with all the data in the table, it ran for over 20 hours before I killed the process.
Have you tested just adding a column filling it with data and setting the column to NOT NULL?
So in the end I don't think there's a magic bullet.
Answer by eric
SELECT INTO a new table and rename. For example, adding column i to table A_tbl:
select *, 1 as i
into A_tmp
from A_tbl
-- Add any indexes here (SELECT INTO does not copy them)
exec sp_rename 'A_tbl', 'A_old'
exec sp_rename 'A_tmp', 'A_tbl'
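It should be fast, and it won't fill your transaction log the way inserting in batches might, because SELECT INTO is minimally logged under the SIMPLE and BULK_LOGGED recovery models. (I just did this today with a 70-million-row table in under 2 minutes.)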
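If the table is in use while you do this, rows might change between the SELECT INTO and the renames. You can wrap it all in a transaction, but that only helps if the source table stays locked against writers until the transaction commits (for example, by reading it WITH (TABLOCK, HOLDLOCK)).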
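One way to keep rows from changing between the copy and the swap is sketched below; it is an illustration under assumptions, not the answer author's code. The source table is read WITH (TABLOCK, HOLDLOCK), so a shared table lock is held until COMMIT: other readers can continue during the copy, writers wait. SELECT INTO does not copy indexes, constraints, triggers, or permissions, so the sketch recreates the primary key before the swap. Table names follow the answer; the Id and Name columns are hypothetical.

```sql
-- Hypothetical demo setup.
IF OBJECT_ID('dbo.A_old', 'U') IS NOT NULL DROP TABLE dbo.A_old;
IF OBJECT_ID('dbo.A_tmp', 'U') IS NOT NULL DROP TABLE dbo.A_tmp;
IF OBJECT_ID('dbo.A_tbl', 'U') IS NOT NULL DROP TABLE dbo.A_tbl;
CREATE TABLE dbo.A_tbl (Id int NOT NULL PRIMARY KEY, Name varchar(20) NOT NULL);
INSERT INTO dbo.A_tbl (Id, Name) VALUES (1, 'a'), (2, 'b');
GO

SET XACT_ABORT ON;  -- any error rolls back the whole swap
BEGIN TRANSACTION;
    -- Shared table lock held until COMMIT: writers wait instead of
    -- changing rows after they have been copied.
    SELECT *, 1 AS i
    INTO dbo.A_tmp
    FROM dbo.A_tbl WITH (TABLOCK, HOLDLOCK);

    -- SELECT INTO copies no indexes or constraints; recreate what you need.
    ALTER TABLE dbo.A_tmp ADD CONSTRAINT PK_A_tmp PRIMARY KEY (Id);

    EXEC sp_rename 'dbo.A_tbl', 'A_old';
    EXEC sp_rename 'dbo.A_tmp', 'A_tbl';
COMMIT TRANSACTION;
GO
```

The trade-off is that writers are blocked for the whole copy, so on a very large table this is effectively a short write outage rather than a fully online change.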
Answer by HLGEM
Another technique is to add the column to a new related table (assuming a one-to-one relationship, which you can enforce by giving the FK column a unique index). You can then populate it in batches and add a join to this table wherever you want the data to appear. Note that I would only consider this for a column that I wouldn't use in every query on the original table, or if the record width of my original table was getting too large, or if I was adding several columns.
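A minimal T-SQL sketch of the side-table idea, with hypothetical dbo.Customer and dbo.CustomerExtra tables and a made-up LoyaltyTier column. Making the foreign-key column the primary key of the new table gives it the unique index that enforces one-to-one. The batch size of 2 is only there so the tiny demo actually loops; a real backfill would use something much larger.

```sql
-- Hypothetical demo setup.
IF OBJECT_ID('dbo.CustomerExtra', 'U') IS NOT NULL DROP TABLE dbo.CustomerExtra;
IF OBJECT_ID('dbo.Customer', 'U') IS NOT NULL DROP TABLE dbo.Customer;
CREATE TABLE dbo.Customer (CustomerId int NOT NULL PRIMARY KEY, Name nvarchar(50) NOT NULL);
INSERT INTO dbo.Customer (CustomerId, Name) VALUES (1, N'Ann'), (2, N'Bo'), (3, N'Cy');
GO

-- One row per customer: the FK column is also the primary key.
CREATE TABLE dbo.CustomerExtra (
    CustomerId int NOT NULL
        CONSTRAINT PK_CustomerExtra PRIMARY KEY
        CONSTRAINT FK_CustomerExtra_Customer REFERENCES dbo.Customer (CustomerId),
    LoyaltyTier tinyint NOT NULL
);
GO

-- Populate in batches, picking customers that have no extra row yet.
DECLARE @BatchSize int = 2;  -- assumption: tiny for the demo
DECLARE @Rows int = 1;
WHILE @Rows > 0
BEGIN
    INSERT INTO dbo.CustomerExtra (CustomerId, LoyaltyTier)
    SELECT TOP (@BatchSize) c.CustomerId, 0
    FROM dbo.Customer AS c
    WHERE NOT EXISTS (SELECT 1 FROM dbo.CustomerExtra AS e
                      WHERE e.CustomerId = c.CustomerId)
    ORDER BY c.CustomerId;
    SET @Rows = @@ROWCOUNT;
END;
GO

-- Join only in the queries that need the new field.
SELECT c.CustomerId, c.Name, e.LoyaltyTier
FROM dbo.Customer AS c
LEFT JOIN dbo.CustomerExtra AS e ON e.CustomerId = c.CustomerId;
```

The LEFT JOIN keeps queries working while the backfill is still in progress; customers without an extra row simply show NULL.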