SQL Server: Can SQL_Latin1_General_CP1_CI_AS be safely converted to Latin1_General_CI_AS?

Question

Asked by Kram

We have a legacy database with some (older) columns using "SQL_Latin1_General_CP1_CI_AS" and more recent changes have used "Latin1_General_CI_AS".

This is a pain as joins need the additional COLLATE statement to work.

I'd like to bring everything up to "Latin1_General_CI_AS". From what I can gather they are more or less identical collations and I won't lose data during this process...

Does anyone know if this is the case?

Answer 1

Accepted answer by dunos

There is more info on this MSDN forum:

http://social.msdn.microsoft.com/Forums/en-US/sqlgetstarted/thread/196b4586-1338-434d-ba8c-49fa3c9bdeeb/

Which states:

You should see little difference if the collation is SQL_Latin1_General_CP1_CI_AS or Latin1_General_CI_AS, but both have instances where they are faster or slower than the other.
Latin1_General_CI_AS :- Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive
SQL_Latin1_General_CP1_CI_AS:- Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data

Therefore in my opinion you shouldn't see a difference, especially if your data is only a-z0-9

Answer 2

Answer by Zarepheth

Here is a more complete answer:

https://www.olcot.co.uk/revised-difference-between-collation-sql_latin1_general_cp1_ci_as-and-latin1_general_ci_as/

The key difference between these collations is in how they apply character expansion rules. Certain Latin characters may be expanded into multiple characters. The SQL_xxxx collations may ignore these character expansions when working with non-Unicode text, but apply them for Unicode text. As a result, joins, sorts, and comparisons may return different results when using one collation versus the other.

Example:

Under Latin1_General_CI_AS these two statements return the same set of records, as ß is expanded to ss.

SELECT * FROM MyTable3 WHERE Comments = 'strasse'
SELECT * FROM MyTable3 WHERE Comments = 'straße'

When using SQL_Latin1_General_CP1_CI_AS the above statements return different records, since the ß is treated as a different character than ss.
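
The same comparison can be tried without a table. This is only a sketch: it assumes the current database uses code page 1252 (so that the VARCHAR literal can hold ß), and the column aliases are made up for illustration.

```sql
-- Compare 'strasse' with 'straße' under each collation.
-- Assumption: the current database uses code page 1252, so 'ß' survives as VARCHAR.
SELECT
    CASE WHEN 'strasse' COLLATE Latin1_General_CI_AS = 'straße'
         THEN 'equal' ELSE 'different' END AS WindowsCollation_Varchar,
    CASE WHEN 'strasse' COLLATE SQL_Latin1_General_CP1_CI_AS = 'straße'
         THEN 'equal' ELSE 'different' END AS SqlCollation_Varchar,
    -- N'' makes the literals Unicode, where the SQL_ collation also applies expansions.
    CASE WHEN N'strasse' COLLATE SQL_Latin1_General_CP1_CI_AS = N'straße'
         THEN 'equal' ELSE 'different' END AS SqlCollation_Nvarchar;
```

Going by the description above, the first and third columns should come back as equal and the second as different.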

Answer 3

Answer by Solomon Rutzky

If you are going to change the Collation of a Database, then there is definitely stuff you should know about so that you can plan accordingly:

Regarding data-loss potential:
- NVARCHAR fields are all Unicode, which is a single character set, so there can't be any data loss for these fields (this also covers XML fields, which are also stored as UTF-16 Little Endian). Meta-data fields that store the object / column / index / etc. names are all NVARCHAR, so no need to worry about those.
- VARCHAR fields having different Collations but the same Code Page between the differing Collations will not be a problem since the Code Page is the character set.
- VARCHAR fields having different Collations and moving to a different Code Page (when changing Collations) can have data loss if any of the characters being used are not represented in the new Code Page. HOWEVER, this is only an issue when physically changing the Collation of a particular field (described below) and would not happen upon changing the default Collation of a database.
Local variables and string literals get their Collation from the Database default. Changing the database default will change the Collation used for both local variables and string literals. But changing the Database's default Collation does not change the Collation used for existing string columns in the tables in that Database. This generally should not cause any problems when comparing or concatenating a column with a literal and/or variable since the literals and variables will take on the Collation of the column due to Collation Precedence. The only potential problem would be Code Page conversions that might occur for characters of values between 128 - 255 that are not available in the Code Page used by the Collation of the column.
If you are expecting a predicate / comparison / sort / concatenation / etc for a column to behave differently upon changing the Database's default Collation, then you will need to explicitly change that column's Collation using the following command:
```
ALTER TABLE [{table_name}]
   ALTER COLUMN [{column_name}]
   {same_datatype}
   {same_NULL_or_NOT NULL_setting}
   COLLATE {name_of_Database_default_Collation};
```
Be sure to specify the exact same datatype and NULL / NOT NULL setting that are currently being used, otherwise a setting that differs from the default can revert to the default. After that, if there are any indexes on any of the string columns that just had their Collation changed, then you need to rebuild those indexes.
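To find the columns that would need the ALTER COLUMN statement above, something like the following catalog query may help. It is a sketch rather than a complete migration script: it only lists candidates in the current database, and the column aliases are illustrative.

```sql
-- List every user-table column in the current database that still uses the old collation,
-- together with the datatype and nullability that the ALTER COLUMN must repeat.
SELECT
    OBJECT_SCHEMA_NAME(c.object_id) AS SchemaName,
    OBJECT_NAME(c.object_id)        AS TableName,
    c.name                          AS ColumnName,
    t.name                          AS DataType,
    c.max_length                    AS MaxLengthBytes,  -- -1 means (MAX)
    c.is_nullable                   AS IsNullable,
    c.collation_name                AS CurrentCollation
FROM sys.columns AS c
JOIN sys.types   AS t ON t.user_type_id = c.user_type_id
JOIN sys.objects AS o ON o.object_id = c.object_id
WHERE o.type = 'U'  -- user tables only
  AND c.collation_name = N'SQL_Latin1_General_CP1_CI_AS'
ORDER BY SchemaName, TableName, c.column_id;
```

Note that max_length is in bytes, so an NVARCHAR(40) column shows up as 80.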
Changing the Database's default Collation will change the Collation of certain database-specific meta-data, such as the name field in sys.objects, sys.columns, sys.indexes, etc. Filtering these system Views against local variables or string literals won't be a problem since the Collation will be changing on both sides. But, if you JOIN any of the local system Views to temporary tables on string fields, and the Database-level Collation between the local database and tempdb doesn't match, then you will get the "Collation mismatch" error. This is discussed below along with the remedy.
One difference between these two Collations is in how they sort certain characters for VARCHAR data (this does not affect NVARCHAR data). The non-EBCDIC SQL_ Collations use what is called "String Sort" for VARCHAR data, while all other Collations, and even NVARCHAR data for the non-EBCDIC SQL_ Collations, use what is called "Word Sort". The difference is that in "Word Sort", the dash - and apostrophe ' (and maybe a few other characters?) are given a very low weight and are essentially ignored unless there are no other differences in the strings. To see this behavior in action, run the following:
```
DECLARE @Test TABLE (Col1 VARCHAR(10) NOT NULL);
INSERT INTO @Test VALUES ('aa');
INSERT INTO @Test VALUES ('ac');
INSERT INTO @Test VALUES ('ah');
INSERT INTO @Test VALUES ('am');
INSERT INTO @Test VALUES ('aka');
INSERT INTO @Test VALUES ('akc');
INSERT INTO @Test VALUES ('ar');
INSERT INTO @Test VALUES ('a-f');
INSERT INTO @Test VALUES ('a_e');
INSERT INTO @Test VALUES ('a''kb');

SELECT * FROM @Test ORDER BY [Col1] COLLATE SQL_Latin1_General_CP1_CI_AS;
-- "String Sort" puts all punctuation ahead of letters

SELECT * FROM @Test ORDER BY [Col1] COLLATE Latin1_General_100_CI_AS;
-- "Word Sort" mostly ignores dash and apostrophe
```
Returns:
```
String Sort
-----------
a'kb
a-f
a_e
aa
ac
ah
aka
akc
am
ar
```
and:
```
Word Sort
---------
a_e
aa
ac
a-f
ah
aka
a'kb
akc
am
ar
```
While you will "lose" the "String Sort" behavior, I'm not sure that I would call that a "feature". It is a behavior that has been deemed undesirable (as evidenced by the fact that it wasn't brought forward into any of the Windows collations). However, it is a definite difference of behavior between the two collations (again, just for non-EBCDIC VARCHAR data), and you might have code and/or customer expectations based upon the "String Sort" behavior. This requires testing your code and possibly researching to see if this change in behavior might have any negative impact on users.
Another difference between SQL_Latin1_General_CP1_CI_AS and Latin1_General_100_CI_AS is the ability to do Expansions on VARCHAR data (NVARCHAR data can already do these for most SQL_ Collations), such as handling æ as if it were ae:
```
IF ('æ' COLLATE SQL_Latin1_General_CP1_CI_AS =
    'ae' COLLATE SQL_Latin1_General_CP1_CI_AS)
BEGIN
  PRINT 'SQL_Latin1_General_CP1_CI_AS';
END;

IF ('æ' COLLATE Latin1_General_100_CI_AS =
    'ae' COLLATE Latin1_General_100_CI_AS)
BEGIN
  PRINT 'Latin1_General_100_CI_AS';
END;
```
Returns:
```
Latin1_General_100_CI_AS
```
The only thing you are "losing" here is not being able to do these expansions. Generally speaking, this is another benefit of moving to a Windows Collation. However, just like with the "String Sort" to "Word Sort" move, the same caution applies: it is a definite difference of behavior between the two collations (again, just for VARCHAR data), and you might have code and/or customer expectations based upon not having these mappings. This requires testing your code and possibly researching to see if this change in behavior might have any negative impact on users.
(First noted in @Zarepheth's answer and expanded on here.)
Another difference (that is also a benefit of moving to a Windows Collation) is that filtering an indexed VARCHAR column on an NVARCHAR literal / variable / column will no longer invalidate the index on the VARCHAR column. This is due to the Windows Collations using the same Unicode sorting and comparison rules for both VARCHAR and NVARCHAR data. Because the sort order is the same between the two types, when the VARCHAR data gets converted into NVARCHAR (explicitly or implicitly due to datatype precedence), the order of items in the index is still valid. For more details on this behavior, please see my post: Impact on Indexes When Mixing VARCHAR and NVARCHAR Types.
The server-level Collation is used to set the Collation of the system databases, which includes [model]. The [model] database is used as a template to create new databases, which includes [tempdb] upon each server startup. So, if the Database's default collation does not match the instance's default Collation and you join local tables to temporary tables on string fields, then you will get the Collation-mismatch error. Fortunately there is a somewhat easy way to correct for collation differences between the database that is "current" when CREATE TABLE #TempTable is executed and [tempdb]. When creating temporary tables, declare a collation (on string columns) using the COLLATE clause and use either a specific collation (if you know that the DB will always be using that collation), or DATABASE_DEFAULT (if you don't always know the collation of the DB where this code will execute):
```
CREATE TABLE #Temp (Col1 NVARCHAR(40) COLLATE DATABASE_DEFAULT);
```
This is not necessary for table variables since they get their default Collation from the "current" database. However, if you have both table variables and temporary tables and join them on string fields, then you will need to use COLLATE {specific_collation} or COLLATE DATABASE_DEFAULT as shown directly above.
The server-level collation also controls local variable names, CURSOR variable names, and GOTO labels. While none of these would be impacted by the specific change being dealt with in this Question, it is at least something to be aware of.
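To check whether this mismatch applies to a given instance, the three collations involved can be read directly, and the conflict can also be resolved in the join predicate itself. A rough sketch, in which #Names and @Local are hypothetical objects created only for the demonstration:

```sql
-- Compare the collations that matter for temp-table joins.
SELECT
    SERVERPROPERTY('Collation')                AS InstanceCollation,
    DATABASEPROPERTYEX(N'tempdb', 'Collation') AS TempdbCollation,
    DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS CurrentDbCollation;

-- Hypothetical temp table: no COLLATE given, so the column takes tempdb's collation.
CREATE TABLE #Names (Name VARCHAR(40) NOT NULL);
-- Hypothetical table variable: the column takes the current database's collation.
DECLARE @Local TABLE (Name VARCHAR(40) NOT NULL);

INSERT INTO #Names (Name) VALUES ('alpha');
INSERT INTO @Local (Name) VALUES ('alpha');

-- Forcing one side to DATABASE_DEFAULT avoids the mismatch at query time.
SELECT l.Name
FROM @Local AS l
JOIN #Names AS n
  ON l.Name = n.Name COLLATE DATABASE_DEFAULT;

DROP TABLE #Names;
```

Declaring the collation on the temporary table's columns, as shown earlier, is usually tidier than repeating COLLATE in every predicate.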
It is best to use the most recent version of the desired collation, if multiple versions are available. Starting in SQL Server 2005, a "90" series of collations was introduced, and SQL Server 2008 introduced a "100" series of collations. You can find these collations by using the following queries:
```
SELECT * FROM sys.fn_helpcollations() WHERE [name] LIKE N'%[_]90[_]%'; -- 476

SELECT * FROM sys.fn_helpcollations() WHERE [name] LIKE N'%[_]100[_]%'; -- 2686
```
ALSO, while the question asks about case-insensitive Collations, it should be noted that if someone else is looking to make a similar change but is using case-sensitive Collations, then another difference between SQL Server Collations and Windows Collations, for VARCHAR data only, is which case sorts first. Meaning, if you have both A and a, the SQL_ Collations will sort A before a, while the non-SQL_ Collations (and the SQL_ Collations when dealing with NVARCHAR data) will sort a before A.
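A small test in the style of the "String Sort" example above can be used to observe this. It is a sketch only: @CaseTest is a hypothetical table variable, and the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical table variable holding upper- and lower-case pairs.
DECLARE @CaseTest TABLE (Col1 VARCHAR(10) NOT NULL);
INSERT INTO @CaseTest (Col1) VALUES ('a'), ('A'), ('b'), ('B');

-- Case-sensitive SQL Server collation, VARCHAR data.
SELECT Col1 FROM @CaseTest ORDER BY Col1 COLLATE SQL_Latin1_General_CP1_CS_AS;

-- Case-sensitive Windows collation, same data.
SELECT Col1 FROM @CaseTest ORDER BY Col1 COLLATE Latin1_General_100_CS_AS;
```

Per the description above, the first query should list each upper-case letter before its lower-case counterpart, and the second the reverse.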

For a lot more info and details on changing the Collation of a Database or of the entire Instance, please see my post:
Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?

For more info on working with strings and collations, please visit: Collations Info

Answer 4

Answer by Will A

SELECT * FROM ::fn_helpcollations()
WHERE name IN (
'SQL_Latin1_General_CP1_CI_AS',
'Latin1_General_CI_AS'
)

...gives...

Latin1_General_CI_AS: Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive

SQL_Latin1_General_CP1_CI_AS: Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data

So from this, I would infer that the code page used is the same (Latin1-General => 1252), so you should encounter no loss of data - if anything were to change post-conversion it might be the sort order - which is probably immaterial.
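
Rather than inferring the code page from the description text, it can be read directly with COLLATIONPROPERTY. A minimal sketch, with illustrative column aliases:

```sql
-- Read the code page each collation uses for non-Unicode (VARCHAR) data.
SELECT
    COLLATIONPROPERTY('SQL_Latin1_General_CP1_CI_AS', 'CodePage') AS SqlCollationCodePage,
    COLLATIONPROPERTY('Latin1_General_CI_AS', 'CodePage')         AS WindowsCollationCodePage;
```

Both columns should report 1252, which is consistent with the inference above.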

Answer 5

Answer by user2728409

To do that, go to the properties of your database and select Options.

Then change the collation to SQL_Latin1_General_CP1_CS_AS.
