Original question: http://stackoverflow.com/questions/6296936/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Can SQL Server SQL_Latin1_General_CP1_CI_AS be safely converted to Latin1_General_CI_AS?
Asked by Kram
We have a legacy database with some (older) columns using "SQL_Latin1_General_CP1_CI_AS" and more recent changes have used "Latin1_General_CI_AS".
This is a pain as joins need the additional COLLATE statement to work.
I'd like to bring everything up to "Latin1_General_CI_AS". From what I can gather they are more or less identical collations and I won't lose data during this process...
Does anyone know if this is the case?
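For context, the join problem described in the question can be reproduced with the minimal sketch below. The table variables and the CustomerCode column are hypothetical names made up for illustration; only the two collation names come from the question.

```sql
-- Hypothetical data: one column per collation, as in the legacy database described above.
DECLARE @LegacyOrders TABLE (CustomerCode VARCHAR(10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL);
DECLARE @NewCustomers TABLE (CustomerCode VARCHAR(10) COLLATE Latin1_General_CI_AS NOT NULL);

INSERT INTO @LegacyOrders VALUES ('ABC');
INSERT INTO @NewCustomers VALUES ('abc');

-- Without the COLLATE clause this join fails with a collation-conflict error,
-- because the two columns have different implicit collations.
SELECT o.CustomerCode AS LegacyCode,
       c.CustomerCode AS NewCode
FROM @LegacyOrders AS o
INNER JOIN @NewCustomers AS c
        ON o.CustomerCode COLLATE Latin1_General_CI_AS = c.CustomerCode;
```

Removing the COLLATE clause from the ON predicate should reproduce the error that makes the extra clause necessary in the first place.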
Accepted answer by dunos
There is more info on this MSDN forum:
Which states:
You should see little difference if the collation is SQL_Latin1_General_CP1_CI_AS or Latin1_General_CI_AS, but both have instances where they are faster or slower than the other.
Latin1_General_CI_AS :- Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive
SQL_Latin1_General_CP1_CI_AS:- Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data
Therefore in my opinion you shouldn't see a difference, especially if your data is only a-z0-9
Answer by Zarepheth
Here is a more complete answer:
The key difference between these collations is in how they apply character expansion rules. Certain Latin characters may be expanded into multiple characters. The SQL_xxxx collations may ignore these character expansions when working with non-unicode text, but apply them for unicode text. As a result: joins, sorts, and comparisons may return different results when using one collation versus the other.
Example:
Under Latin1_General_CI_AS these two statements return the same set of records, as ß is expanded to ss.
SELECT * FROM MyTable3 WHERE Comments = 'strasse'
SELECT * FROM MyTable3 WHERE Comments = 'straße'
When using SQL_Latin1_General_CP1_CI_AS the above statements return different records, since the ß is treated as a different character than ss.
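The comparison described in this answer can be tried without an existing table. The sketch below is an assumption-laden illustration: it uses a table variable named after the answer's MyTable3, and it assumes the script is saved in an encoding that preserves ß and that the database default code page (for example 1252) can represent it.

```sql
-- Stand-in for the answer's MyTable3, holding non-Unicode (VARCHAR) text.
DECLARE @MyTable3 TABLE (Comments VARCHAR(20) NOT NULL);
INSERT INTO @MyTable3 VALUES ('strasse');
INSERT INTO @MyTable3 VALUES ('straße');

-- Windows collation: per the answer, ß is expanded to ss, so both rows should match.
SELECT Comments
FROM @MyTable3
WHERE Comments COLLATE Latin1_General_CI_AS = 'strasse';

-- SQL Server collation on VARCHAR data: per the answer, ß is a distinct character,
-- so only the row spelled with ss should match.
SELECT Comments
FROM @MyTable3
WHERE Comments COLLATE SQL_Latin1_General_CP1_CI_AS = 'strasse';
```

Changing the column to NVARCHAR(20) and the literals to N'...' is a quick way to check the answer's point that the SQL_ collations do apply expansions to Unicode text.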
Answer by Solomon Rutzky
If you are going to change the Collation of a Database, then there is definitely stuff you should know about so that you can plan accordingly:
Regarding data-loss potential:
NVARCHAR fields are all Unicode, which is a single character set, so there can't be any data loss for these fields (this also covers XML fields, which are also stored as UTF-16 Little Endian). Meta-data fields that store the object / column / index / etc. names are all NVARCHAR, so no need to worry about those.
VARCHAR fields having different Collations but the same Code Page between the differing Collations will not be a problem since the Code Page is the character set.
VARCHAR fields having different Collations and moving to a different Code Page (when changing Collations) can have data loss if any of the characters being used are not represented in the new Code Page. HOWEVER, this is only an issue when physically changing the Collation of a particular field (described below) and would not happen upon changing the default Collation of a database.
Local variables and string literals get their Collation from the Database default. Changing the database default will change the Collation used for both local variables and string literals. But changing the Database's default Collation does not change the Collation used for existing string columns in the tables in that Database. This generally should not cause any problems when comparing or concatenating a column with a literal and/or variable since the literals and variables will take on the Collation of the column due to Collation Precedence. The only potential problem would be Code Page conversions that might occur for characters of values between 128 - 255 that are not available in the Code Page used by the Collation of the column.
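Because existing columns keep their Collation when the database default changes, it can help to list which columns still use the old one. The query below is a sketch rather than part of the original answer; it relies only on the sys.columns and sys.tables catalog views and on COLLATIONPROPERTY, and the collation name in the WHERE clause is the one from the question.

```sql
-- List user-table string columns that still use the old collation,
-- together with the details needed to re-declare them unchanged.
SELECT OBJECT_SCHEMA_NAME(c.[object_id])                AS SchemaName,
       OBJECT_NAME(c.[object_id])                       AS TableName,
       c.[name]                                         AS ColumnName,
       TYPE_NAME(c.user_type_id)                        AS DataType,
       c.max_length                                     AS MaxLengthBytes, -- -1 means MAX; NVARCHAR uses 2 bytes per character
       c.is_nullable                                    AS IsNullable,
       c.collation_name                                 AS CurrentCollation,
       COLLATIONPROPERTY(c.collation_name, 'CodePage')  AS CodePage
FROM sys.columns AS c
INNER JOIN sys.tables AS t
        ON t.[object_id] = c.[object_id]
WHERE c.collation_name = N'SQL_Latin1_General_CP1_CI_AS'
ORDER BY SchemaName, TableName, ColumnName;
```

The DataType, MaxLengthBytes, and IsNullable columns give the current definition of each column, which matters whenever a column has to be altered without changing anything except its Collation.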
If you are expecting a predicate / comparison / sort / concatenation / etc for a column to behave differently upon changing the Database's default Collation, then you will need to explicitly change that column's Collation using the following command:
ALTER TABLE [{table_name}] ALTER COLUMN [{column_name}] {same_datatype} {same_NULL_or_NOT NULL_setting} COLLATE {name_of_Database_default_Collation};
Be sure to specify the exact same datatype and NULL / NOT NULL setting that are currently being used, else they can revert to the default if not already being the default value. After that, if there are any indexes on any of the string columns that just had their Collation changed, then you need to rebuild those indexes.
Changing the Database's default Collation will change the Collation of certain database-specific meta-data, such as the name field in sys.objects, sys.columns, sys.indexes, etc. Filtering these system Views against local variables or string literals won't be a problem since the Collation will be changing on both sides. But, if you JOIN any of the local system Views to temporary tables on string fields, and the Database-level Collation between the local database and tempdb doesn't match, then you will get the "Collation mismatch" error. This is discussed below along with the remedy.
One difference between these two Collations is in how they sort certain characters for VARCHAR data (this does not affect NVARCHAR data). The non-EBCDIC SQL_ Collations use what is called "String Sort" for VARCHAR data, while all other Collations, and even NVARCHAR data for the non-EBCDIC SQL_ Collations, use what is called "Word Sort". The difference is that in "Word Sort", the dash - and apostrophe ' (and maybe a few other characters?) are given a very low weight and are essentially ignored unless there are no other differences in the strings. To see this behavior in action, run the following:
DECLARE @Test TABLE (Col1 VARCHAR(10) NOT NULL);
INSERT INTO @Test VALUES ('aa');
INSERT INTO @Test VALUES ('ac');
INSERT INTO @Test VALUES ('ah');
INSERT INTO @Test VALUES ('am');
INSERT INTO @Test VALUES ('aka');
INSERT INTO @Test VALUES ('akc');
INSERT INTO @Test VALUES ('ar');
INSERT INTO @Test VALUES ('a-f');
INSERT INTO @Test VALUES ('a_e');
INSERT INTO @Test VALUES ('a''kb');
SELECT * FROM @Test ORDER BY [Col1] COLLATE SQL_Latin1_General_CP1_CI_AS; -- "String Sort" puts all punctuation ahead of letters
SELECT * FROM @Test ORDER BY [Col1] COLLATE Latin1_General_100_CI_AS; -- "Word Sort" mostly ignores dash and apostrophe
Returns:
String Sort
-----------
a'kb
a-f
a_e
aa
ac
ah
aka
akc
am
ar
and:
Word Sort
---------
a_e
aa
ac
a-f
ah
aka
a'kb
akc
am
ar
While you will "lose" the "String Sort" behavior, I'm not sure that I would call that a "feature". It is a behavior that has been deemed undesirable (as evidenced by the fact that it wasn't brought forward into any of the Windows collations). However, it is a definite difference of behavior between the two collations (again, just for non-EBCDIC VARCHAR data), and you might have code and/or customer expectations based upon the "String Sort" behavior. This requires testing your code and possibly researching to see if this change in behavior might have any negative impact on users.
Another difference between SQL_Latin1_General_CP1_CI_AS and Latin1_General_100_CI_AS is the ability to do Expansions on VARCHAR data (NVARCHAR data can already do these for most SQL_ Collations), such as handling æ as if it were ae:
IF ('æ' COLLATE SQL_Latin1_General_CP1_CI_AS = 'ae' COLLATE SQL_Latin1_General_CP1_CI_AS)
BEGIN
    PRINT 'SQL_Latin1_General_CP1_CI_AS';
END;
IF ('æ' COLLATE Latin1_General_100_CI_AS = 'ae' COLLATE Latin1_General_100_CI_AS)
BEGIN
    PRINT 'Latin1_General_100_CI_AS';
END;
Returns:
Latin1_General_100_CI_AS
The only thing you are "losing" here is not being able to do these expansions. Generally speaking, this is another benefit of moving to a Windows Collation. However, just like with the "String Sort" to "Word Sort" move, the same caution applies: it is a definite difference of behavior between the two collations (again, just for VARCHAR data), and you might have code and/or customer expectations based upon not having these mappings. This requires testing your code and possibly researching to see if this change in behavior might have any negative impact on users. (First noted in @Zarepheth's answer and expanded on here.)
Another difference (that is also a benefit of moving to a Windows Collation) is that filtering an indexed VARCHAR column on an NVARCHAR literal / variable / column will no longer invalidate the index on the VARCHAR column. This is due to the Windows Collations using the same Unicode sorting and comparison rules for both VARCHAR and NVARCHAR data. Because the sort order is the same between the two types, when the VARCHAR data gets converted into NVARCHAR (explicitly or implicitly due to datatype precedence), the order of items in the index is still valid. For more details on this behavior, please see my post: Impact on Indexes When Mixing VARCHAR and NVARCHAR Types.
The server-level Collation is used to set the Collation of the system databases, which includes [model]. The [model] database is used as a template to create new databases, which includes [tempdb] upon each server startup. So, if the Database's default collation does not match the instance's default Collation and you join local tables to temporary tables on string fields, then you will get the Collation-mismatch error. Fortunately there is a somewhat easy way to correct for collation differences between the database that is "current" when CREATE TABLE #TempTable is executed and [tempdb]. When creating temporary tables, declare a collation (on string columns) using the COLLATE clause and use either a specific collation (if you know that the DB will always be using that collation), or DATABASE_DEFAULT (if you don't always know the collation of the DB where this code will execute):
CREATE TABLE #Temp (Col1 NVARCHAR(40) COLLATE DATABASE_DEFAULT);
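To find out whether this mismatch applies to a given database, the three collations involved can be compared side by side. This is a small sketch, not from the original answer; it uses only the built-in DATABASEPROPERTYEX, DB_NAME, and SERVERPROPERTY functions.

```sql
-- Compare the collation of the current database with tempdb and the instance.
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS CurrentDatabaseCollation,
       DATABASEPROPERTYEX(N'tempdb', 'Collation') AS TempdbCollation,
       SERVERPROPERTY('Collation')                AS InstanceCollation;
```

If CurrentDatabaseCollation differs from TempdbCollation, string columns in temporary tables are the ones that need the COLLATE DATABASE_DEFAULT declaration from the CREATE TABLE #Temp statement above.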
Declaring a collation this way is not necessary for table variables since they get their default Collation from the "current" database. However, if you have both table variables and temporary tables and join them on string fields, then you will need to use COLLATE {specific_collation} or COLLATE DATABASE_DEFAULT as shown in the CREATE TABLE #Temp statement above.
The server-level collation also controls local variable names, CURSOR variable names, and GOTO labels. While none of these would be impacted by the specific change being dealt with in this Question, it is at least something to be aware of.
It is best to use the most recent version of the desired collation, if multiple versions are available. Starting in SQL Server 2005, a "90" series of collations was introduced, and SQL Server 2008 introduced a "100" series of collations. You can find these collations by using the following queries:
SELECT * FROM sys.fn_helpcollations() WHERE [name] LIKE N'%[_]90[_]%'; -- 476
SELECT * FROM sys.fn_helpcollations() WHERE [name] LIKE N'%[_]100[_]%'; -- 2686
ALSO, while the question asks about case-insensitive Collations, it should be noted that if someone else is looking to make a similar change but is using case-sensitive Collations, then another difference between SQL Server Collations and Windows Collations, for VARCHAR data only, is which case sorts first. Meaning, if you have both A and a, the SQL_ Collations will sort A before a, while the non-SQL_ Collations (and the SQL_ Collations when dealing with NVARCHAR data) will sort a before A.
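The case-ordering difference can be checked with a test in the same style as the sort example earlier in this answer. The sketch below is an illustration only; the table variable name is made up, and the case-sensitive collation names are the CS counterparts of the collations discussed here.

```sql
-- Two values that differ only by case, stored as VARCHAR.
DECLARE @CaseTest TABLE (Col1 VARCHAR(5) NOT NULL);
INSERT INTO @CaseTest VALUES ('a');
INSERT INTO @CaseTest VALUES ('A');

-- SQL Server collation: per the answer, upper-case is expected to sort first.
SELECT Col1 FROM @CaseTest ORDER BY Col1 COLLATE SQL_Latin1_General_CP1_CS_AS;

-- Windows collation: per the answer, lower-case is expected to sort first.
SELECT Col1 FROM @CaseTest ORDER BY Col1 COLLATE Latin1_General_100_CS_AS;
```

Declaring Col1 as NVARCHAR(5) instead should make both queries return the same order, if the behavior matches what is described above.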
For a lot more info and details on changing the Collation of a Database or of the entire Instance, please see my post:
Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?
For more info on working with strings and collations, please visit: Collations Info
Answer by Will A
SELECT * FROM ::fn_helpcollations()
WHERE name IN (
'SQL_Latin1_General_CP1_CI_AS',
'Latin1_General_CI_AS'
)
...gives...
Latin1_General_CI_AS: Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive
SQL_Latin1_General_CP1_CI_AS: Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data
So from this, I would infer that the code page used is the same (Latin1-General => 1252), so you should encounter no loss of data - if anything were to change post-conversion it might be the sort order - which is probably immaterial.
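Rather than inferring the code page from the description text, it can be read directly. The following is a sketch that was not part of the original answer; it uses the built-in COLLATIONPROPERTY function against the same two collation names.

```sql
-- Read the code page (and related properties) of both collations directly.
SELECT [name],
       COLLATIONPROPERTY([name], 'CodePage')        AS CodePage,
       COLLATIONPROPERTY([name], 'LCID')            AS LCID,
       COLLATIONPROPERTY([name], 'ComparisonStyle') AS ComparisonStyle
FROM sys.fn_helpcollations()
WHERE [name] IN (N'SQL_Latin1_General_CP1_CI_AS', N'Latin1_General_CI_AS');
```

Both rows should report code page 1252, which is what supports the conclusion that VARCHAR data is not converted to a different character set by this change.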