Difference between InvariantCulture and Ordinal string comparison
Disclaimer: this page reproduces a popular StackOverFlow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/492799/
Question by Kapil
When comparing two strings in C# for equality, what is the difference between InvariantCulture and Ordinal comparison?
Accepted answer by JaredReisinger
InvariantCulture
Uses a "standard" set of character orderings (a, b, c, ... etc.). This is in contrast to some specific locales, which may sort characters in different orders ('a-with-acute' may be before or after 'a', depending on the locale, and so on).
Ordinal
On the other hand, looks purely at the raw numeric values of the characters, that is, the UTF-16 code units (char values) that make up the string.
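To make "looks purely at the values" concrete, here is a minimal sketch of what an ordinal comparison does: walk both strings one UTF-16 code unit (`char`) at a time and let the first differing code unit decide. `OrdinalSign` is a hypothetical helper written for illustration; the real .NET implementation is heavily optimized, and this sketch only aims to reproduce the sign of `string.CompareOrdinal`.

```csharp
using System;

// Hypothetical helper: compares two strings by raw UTF-16 code unit values,
// the way an ordinal comparison does, and returns -1, 0 or 1.
static int OrdinalSign(string a, string b)
{
    int n = Math.Min(a.Length, b.Length);
    for (int i = 0; i < n; i++)
    {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1; // the first differing code unit decides
    }
    // All shared positions match: the shorter string sorts first.
    return Math.Sign(a.Length - b.Length);
}

foreach (var (x, y) in new[] { ("A", "a"), ("_", "0"), ("ab", "abc"), ("\u00DF", "ss") })
{
    Console.WriteLine($"{x} vs {y}: {OrdinalSign(x, y)} " +
                      $"(CompareOrdinal sign: {Math.Sign(string.CompareOrdinal(x, y))})");
}
```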
There's a great sample at http://msdn.microsoft.com/en-us/library/e6883c06.aspx that shows the results of the various StringComparison values. At the very end, it shows (excerpted):
StringComparison.InvariantCulture:
LATIN SMALL LETTER I (U+0069) is less than LATIN SMALL LETTER DOTLESS I (U+0131)
LATIN SMALL LETTER I (U+0069) is less than LATIN CAPITAL LETTER I (U+0049)
LATIN SMALL LETTER DOTLESS I (U+0131) is greater than LATIN CAPITAL LETTER I (U+0049)
StringComparison.Ordinal:
LATIN SMALL LETTER I (U+0069) is less than LATIN SMALL LETTER DOTLESS I (U+0131)
LATIN SMALL LETTER I (U+0069) is greater than LATIN CAPITAL LETTER I (U+0049)
LATIN SMALL LETTER DOTLESS I (U+0131) is greater than LATIN CAPITAL LETTER I (U+0049)
You can see that where InvariantCulture yields (U+0069, U+0049, U+0131), Ordinal yields (U+0049, U+0069, U+0131).
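The three characters from the excerpt can be sorted directly to check the ordinal result, which simply follows the code points U+0049 < U+0069 < U+0131. A minimal sketch; the InvariantCulture line is printed without a claimed output, because culture-aware results depend on the globalization data the runtime uses (NLS on .NET Framework, ICU on newer .NET on most platforms).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// LATIN SMALL LETTER I, LATIN SMALL LETTER DOTLESS I, LATIN CAPITAL LETTER I
var letters = new List<string> { "\u0069", "\u0131", "\u0049" };

var ordinal = new List<string>(letters);
ordinal.Sort(StringComparer.Ordinal);            // raw code point order

var invariant = new List<string>(letters);
invariant.Sort(StringComparer.InvariantCulture); // linguistic order

// Render each one-character string as its code point, e.g. "U+0049".
static string Describe(IEnumerable<string> items) =>
    string.Join(", ", items.Select(s => $"U+{(int)s[0]:X4}"));

Console.WriteLine("Ordinal:          " + Describe(ordinal)); // U+0049, U+0069, U+0131
Console.WriteLine("InvariantCulture: " + Describe(invariant));
```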
Answer by Rob Parker
Another handy difference (in English, where accents are uncommon) is that an InvariantCulture comparison first compares the entire strings case-insensitively, and only then, if necessary (and requested), distinguishes by case, after the letters themselves have been compared. (You can also do a fully case-insensitive comparison, of course, which never distinguishes by case.) Correction: accented letters are treated as another flavor of the same letter, so the strings are first compared ignoring accents, and the accents are taken into account only if the base letters all match (much as with case, except that accents are not ignored even in a case-insensitive comparison). This groups accented versions of an otherwise identical word next to each other, instead of separating them completely at the first accent difference. This is the sort order you would typically find in a dictionary: capitalized words appear right next to their lowercase equivalents, and accented letters sit near the corresponding unaccented letter.
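The "ignore case first, then break ties by case" behavior described above can be imitated with a hypothetical two-level comparison. This sketch only handles letter case (not accents), and it is not how .NET implements culture-aware comparison, which relies on the platform's collation tables; `TwoLevelCompare` is a name made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: level 1 ignores case; level 2 breaks ties so that
// a lowercase letter sorts before its uppercase counterpart.
static int TwoLevelCompare(string a, string b)
{
    int primary = string.Compare(a, b, StringComparison.OrdinalIgnoreCase);
    if (primary != 0)
        return primary;

    // Equal ignoring case, so both strings have the same length here.
    for (int i = 0; i < a.Length; i++)
    {
        if (a[i] == b[i]) continue;
        bool aLower = char.IsLower(a[i]), bLower = char.IsLower(b[i]);
        if (aLower != bLower) return aLower ? -1 : 1; // lowercase first
        return a[i].CompareTo(b[i]);
    }
    return 0;
}

var words = new List<string> { "Ab", "B", "ab", "a", "A" };
words.Sort(TwoLevelCompare);
Console.WriteLine(string.Join(" ", words)); // a A ab Ab B
```

Capitalized words end up next to their lowercase forms, which is the dictionary-like grouping the answer describes, whereas a plain ordinal sort would move every uppercase-initial word ahead of all lowercase ones.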
An ordinal comparison compares strictly on the numeric character values, stopping at the first difference. This sorts capital letters completely separately from lowercase letters (and accented letters presumably separately from both), so capitalized words sort nowhere near their lowercase equivalents.
InvariantCulture also considers capitals to be greater than lowercase, whereas Ordinal considers capitals to be less than lowercase. This is a holdover from ASCII: in the old days before computers had lowercase letters, the uppercase letters were allocated first and thus got lower values than the lowercase letters added later.
For example, by Ordinal: "0" < "9" < "A" < "Ab" < "Z" < "a" < "aB" < "ab" < "z" < "Á" < "Áb" < "á" < "áb"
And by InvariantCulture: "0" < "9" < "a" < "A" < "á" < "Á" < "ab" < "aB" < "Ab" < "áb" < "Áb" < "z" < "Z"
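The ordinal ordering listed above can be reproduced by sorting the same strings with `StringComparer.Ordinal`. A minimal sketch; the InvariantCulture order is printed but not asserted, since it depends on the runtime's globalization data.

```csharp
using System;
using System.Collections.Generic;

// The strings from the example above, in scrambled order.
// \u00C1 is LATIN CAPITAL LETTER A WITH ACUTE, \u00E1 is LATIN SMALL LETTER A WITH ACUTE.
var items = new List<string>
{
    "z", "\u00C1b", "a", "Z", "ab", "9", "\u00E1", "aB", "A", "\u00E1b", "0", "Ab", "\u00C1"
};

var ordinal = new List<string>(items);
ordinal.Sort(StringComparer.Ordinal);
// Digits < uppercase ASCII (0x41-0x5A) < lowercase ASCII (0x61-0x7A) < U+00C1 < U+00E1
Console.WriteLine("Ordinal:          " + string.Join(" ", ordinal));

var invariant = new List<string>(items);
invariant.Sort(StringComparer.InvariantCulture);
Console.WriteLine("InvariantCulture: " + string.Join(" ", invariant));
```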
Answer by George
Always try to use InvariantCulture in those string methods that accept it as an overload. By using InvariantCulture you are on the safe side. Many .NET programmers may not use this functionality, but if your software will be used by people in different cultures, InvariantCulture is an extremely handy feature.
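For reference, a minimal sketch of what passing the invariant culture to culture-sensitive string APIs looks like, using overloads that exist on `string` and `double`; the file name and number are arbitrary sample values.

```csharp
using System;
using System.Globalization;

string a = "file.TXT", b = "FILE.txt";

// StringComparison overloads
bool same = string.Equals(a, b, StringComparison.InvariantCultureIgnoreCase);
bool ends = a.EndsWith(".txt", StringComparison.InvariantCultureIgnoreCase);

// CultureInfo + CompareOptions overload
int cmp = string.Compare(a, b, CultureInfo.InvariantCulture, CompareOptions.IgnoreCase);

// Casing and formatting that do not depend on the user's current culture
string upper = a.ToUpperInvariant();
double amount = 1234.5;
string number = amount.ToString("N1", CultureInfo.InvariantCulture);

Console.WriteLine($"{same} {ends} {cmp} {upper} {number}"); // True True 0 FILE.TXT 1,234.5
```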
Answer by DanH
Invariant is a linguistically appropriate type of comparison.
Ordinal is a binary type of comparison (faster).
See http://www.siao2.com/2004/12/29/344136.aspx
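A rough way to observe the speed difference on your own machine is a `Stopwatch` loop. This is only an illustrative sketch: the string length and iteration count are arbitrary, timings vary by runtime and platform, and a dedicated tool such as BenchmarkDotNet is better suited to real measurements.

```csharp
using System;
using System.Diagnostics;

// Two equal but distinct 100-character strings, so each call scans them fully.
string x = new string('a', 99) + "z";
string y = new string('a', 99) + "z";
const int N = 100_000;

// Runs the comparison `iterations` times, counts how often it returned true,
// and returns the elapsed milliseconds.
static long Time(Func<bool> compare, int iterations, out int hits)
{
    hits = 0;
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
        if (compare()) hits++;
    return sw.ElapsedMilliseconds;
}

long ordinalMs = Time(() => string.Equals(x, y, StringComparison.Ordinal), N, out int ordinalHits);
long cultureMs = Time(() => string.Equals(x, y, StringComparison.InvariantCulture), N, out int cultureHits);

Console.WriteLine($"Ordinal:          {ordinalMs} ms ({ordinalHits} matches)");
Console.WriteLine($"InvariantCulture: {cultureMs} ms ({cultureHits} matches)");
```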
Answer by Ventsyslav Raikov
It does matter. For example, there is something called character expansion:
using System;

var s1 = "Strasse";
var s2 = "Straße";
s1.Equals(s2, StringComparison.Ordinal); //false
s1.Equals(s2, StringComparison.InvariantCulture); //true
With InvariantCulture, the ß character gets expanded to ss.
Answer by Dariusz
Referring to Best Practices for Using Strings in the .NET Framework:
- Use StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase for comparisons as your safe default for culture-agnostic string matching.
- Use comparisons with StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase for better performance.
- Use the non-linguistic StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase values instead of string operations based on CultureInfo.InvariantCulture when the comparison is linguistically irrelevant (symbolic, for example).
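As an illustration of "symbolic" data in the sense of the guidance above, identifiers such as file extensions are usually matched with an ordinal comparer. A minimal sketch; the extension-to-MIME-type table is a made-up example.

```csharp
using System;
using System.Collections.Generic;

// Symbolic keys: ordinal comparison that ignores only case differences.
var mimeTypes = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    [".txt"] = "text/plain",
    [".html"] = "text/html",
    [".json"] = "application/json",
};

foreach (var ext in new[] { ".TXT", ".Json", ".htm" })
{
    Console.WriteLine(mimeTypes.TryGetValue(ext, out var mime)
        ? $"{ext} -> {mime}"
        : $"{ext} -> (unknown)");
}
// .TXT -> text/plain
// .Json -> application/json
// .htm -> (unknown)
```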
And finally:
- Do not use string operations based on StringComparison.InvariantCulture in most cases. One of the few exceptions is when you are persisting linguistically meaningful but culturally agnostic data.
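One possible reading of that exception is a list of human-readable names that is sorted once and then stored, where the stored order should not change with the current culture of whichever thread produced it. A minimal sketch under that assumption; `SortForStorage` and the persistence step are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the order in which names are written to storage.
// StringComparer.InvariantCulture does not depend on the current thread's
// culture, unlike StringComparer.CurrentCulture.
static List<string> SortForStorage(IEnumerable<string> names)
{
    var sorted = new List<string>(names);
    sorted.Sort(StringComparer.InvariantCulture);
    return sorted;
}

var saved = SortForStorage(new[] { "zebra", "Apple", "apple", "\u00C9clair", "eclair", "Banana" });

// Hypothetical persistence step: one name per line, e.g. for a file or database column.
string payload = string.Join(Environment.NewLine, saved);
Console.WriteLine(payload);
```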
Answer by Eugene Beresovsky
Although the question is about equality, for quick visual reference, here is the order of some strings sorted using a few cultures, illustrating some of the idiosyncrasies out there.
Ordinal 0 9 A Ab a aB aa ab ss Ä Äb ß ä äb ぁ あ ァ ア 亜 Ａ
IgnoreCase 0 9 a A aa ab Ab aB ss ? ? ?b ?b ? ぁ あ ァ ア 亜 A
--------------------------------------------------------------------
InvariantCulture 0 9 a A A ? ? aa ab aB Ab ?b ?b ss ? ァ ぁ ア あ 亜
IgnoreCase 0 9 A a A ? ? aa Ab aB ab ?b ?b ? ss ァ ぁ ア あ 亜
--------------------------------------------------------------------
da-DK 0 9 a A A ab aB Ab ss ? ? ? ?b ?b aa ァ ぁ ア あ 亜
IgnoreCase 0 9 A a A Ab aB ab ? ss ? ? ?b ?b aa ァ ぁ ア あ 亜
--------------------------------------------------------------------
de-DE 0 9 a A A ? ? aa ab aB Ab ?b ?b ? ss ァ ぁ ア あ 亜
IgnoreCase 0 9 A a A ? ? aa Ab aB ab ?b ?b ss ? ァ ぁ ア あ 亜
--------------------------------------------------------------------
en-US 0 9 a A A ? ? aa ab aB Ab ?b ?b ? ss ァ ぁ ア あ 亜
IgnoreCase 0 9 A a A ? ? aa Ab aB ab ?b ?b ss ? ァ ぁ ア あ 亜
--------------------------------------------------------------------
ja-JP 0 9 a A A ? ? aa ab aB Ab ?b ?b ? ss ァ ぁ ア あ 亜
IgnoreCase 0 9 A a A ? ? aa Ab aB ab ?b ?b ss ? ァ ぁ ア あ 亜
Observations:
- de-DE, ja-JP, and en-US sort the same way
- Invariant only sorts ss and ß differently from the above three cultures
- da-DK sorts quite differently
- the IgnoreCase flag matters for all sampled cultures
The code used to generate the above table:
using System;
using System.Collections.Generic;
using System.Globalization;

var l = new List<string>
{ "0", "9", "A", "Ab", "a", "aB", "aa", "ab", "ss", "ß",
  "Ä", "Äb", "ä", "äb", "あ", "ぁ", "ア", "ァ", "Ａ", "亜" };
foreach (var comparer in new[]
{
    StringComparer.Ordinal,
    StringComparer.OrdinalIgnoreCase,
    StringComparer.InvariantCulture,
    StringComparer.InvariantCultureIgnoreCase,
    StringComparer.Create(new CultureInfo("da-DK"), false),
    StringComparer.Create(new CultureInfo("da-DK"), true),
    StringComparer.Create(new CultureInfo("de-DE"), false),
    StringComparer.Create(new CultureInfo("de-DE"), true),
    StringComparer.Create(new CultureInfo("en-US"), false),
    StringComparer.Create(new CultureInfo("en-US"), true),
    StringComparer.Create(new CultureInfo("ja-JP"), false),
    StringComparer.Create(new CultureInfo("ja-JP"), true),
})
{
    l.Sort(comparer);
    Console.WriteLine(string.Join(" ", l));
}
Answer by Dwedit
Here is an example where string equality comparisons using InvariantCultureIgnoreCase and OrdinalIgnoreCase do not give the same result:
using System;
using System.Text;

string str = "\xC4"; //A with umlaut, Ä
string A = str.Normalize(NormalizationForm.FormC);
//Length is 1: this contains the single A-with-umlaut character (Ä)
string B = str.Normalize(NormalizationForm.FormD);
//Length is 2: this contains an uppercase A followed by a combining umlaut (diaeresis) character
bool equals1 = A.Equals(B, StringComparison.OrdinalIgnoreCase);
bool equals2 = A.Equals(B, StringComparison.InvariantCultureIgnoreCase);
If you run this, equals1 will be false, and equals2 will be true.
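If you want ordinal comparison but still need canonically equivalent strings such as A and B above to match, one common approach is to normalize both sides to the same form first. A minimal sketch; `EqualsNormalized` is a hypothetical helper, and normalization does not make the comparison linguistic (for example, it will not equate ß and ss).

```csharp
using System;
using System.Text;

// Hypothetical helper: bring both strings to Normalization Form C, then compare
// them ordinally (optionally ignoring case).
static bool EqualsNormalized(string a, string b, bool ignoreCase = false) =>
    string.Equals(
        a.Normalize(NormalizationForm.FormC),
        b.Normalize(NormalizationForm.FormC),
        ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal);

string composed = "\u00C4";     // Ä as a single code point
string decomposed = "A\u0308";  // A followed by COMBINING DIAERESIS

Console.WriteLine(string.Equals(composed, decomposed, StringComparison.OrdinalIgnoreCase)); // False
Console.WriteLine(EqualsNormalized(composed, decomposed, ignoreCase: true));                 // True
```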
Answer by KFL
No need to use fancy Unicode character examples to show the difference. Here's one simple example I found out today, which is surprising and consists only of ASCII characters.
According to the ASCII table, 0 (0x30, decimal 48) is smaller than _ (0x5F, decimal 95) when compared ordinally. InvariantCulture would say the opposite (PowerShell code below):
PS> [System.StringComparer]::Ordinal.Compare("_", "0")
47
PS> [System.StringComparer]::InvariantCulture.Compare("_", "0")
-1
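The same check can be done from C#. A minimal sketch; it prints only the sign of each result, because only the sign of `string.CompareOrdinal` and `string.Compare` is meaningful.

```csharp
using System;
using System.Globalization;

Console.WriteLine((int)'0'); // 48 (0x30)
Console.WriteLine((int)'_'); // 95 (0x5F)

int ordinal = string.CompareOrdinal("_", "0");
int invariant = string.Compare("_", "0", CultureInfo.InvariantCulture, CompareOptions.None);

// Ordinal: positive, because 95 > 48.
// InvariantCulture: negative in the PowerShell session shown above, because
// linguistic collation places punctuation before digits.
Console.WriteLine($"Ordinal sign: {Math.Sign(ordinal)}, InvariantCulture sign: {Math.Sign(invariant)}");
```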