C#: Could string comparisons really differ based on culture when the string is guaranteed not to change?
Original source: http://stackoverflow.com/questions/10941375/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Could string comparisons really differ based on culture when the string is guaranteed not to change?
Asked by B. Clay Shannon
I'm reading encrypted credentials/connection strings from a config file. Resharper tells me, "String.IndexOf(string) is culture-specific here" on this line:
if (line.Contains("host=")) {
    _host = line.Substring(line.IndexOf("host=") + "host=".Length, line.Length - "host=".Length);
}
...and so wants to change it to:
if (line.Contains("host=")) {
    _host = line.Substring(line.IndexOf("host=", System.StringComparison.Ordinal) + "host=".Length, line.Length - "host=".Length);
}
The value I'm reading will always be "host=" regardless of where the app may be deployed. Is it really sensible to add this "System.StringComparison.Ordinal" bit?
More importantly, could it hurt anything (to use it)?
Accepted answer by Mark Sowul
Absolutely. Per MSDN (http://msdn.microsoft.com/en-us/library/d93tkzah.aspx),
This method performs a word (case-sensitive and culture-sensitive) search using the current culture.
So you may get different results if you run it under a different culture (via regional and language settings in Control Panel).
In this particular case, you probably won't have a problem, but throw an i in the search string and run it in Turkey and it will probably ruin your day.
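As a rough illustration of that warning, the sketch below runs the same case-insensitive search under two cultures. The class name, the helper `IndexOfUnderCulture`, and the sample line `TITLE=Example` are made up for this example, and it assumes a runtime where the `en-US` and `tr-TR` cultures are available. The culture-sensitive result is expected to vary with the culture, while the ordinal result should not.

```csharp
using System;
using System.Globalization;
using System.Threading;

public static class TurkishIndexOfDemo
{
    // Hypothetical helper: runs IndexOf under the named culture,
    // then restores whatever culture the thread had before.
    public static int IndexOfUnderCulture(string cultureName, string text,
                                          string value, StringComparison comparison)
    {
        CultureInfo previous = Thread.CurrentThread.CurrentCulture;
        try
        {
            Thread.CurrentThread.CurrentCulture = new CultureInfo(cultureName);
            return text.IndexOf(value, comparison);
        }
        finally
        {
            Thread.CurrentThread.CurrentCulture = previous;
        }
    }

    public static void Main()
    {
        const string line = "TITLE=Example"; // sample data, contains an uppercase I

        foreach (string culture in new[] { "en-US", "tr-TR" })
        {
            // Culture-sensitive, case-insensitive search: depends on the thread culture.
            int cultural = IndexOfUnderCulture(culture, line, "title=",
                                               StringComparison.CurrentCultureIgnoreCase);
            // Ordinal, case-insensitive search: independent of the thread culture.
            int ordinal = IndexOfUnderCulture(culture, line, "title=",
                                              StringComparison.OrdinalIgnoreCase);
            Console.WriteLine("{0}: culture-sensitive = {1}, ordinal = {2}",
                              culture, cultural, ordinal);
        }
    }
}
```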
See MSDN: http://msdn.microsoft.com/en-us/library/ms973919.aspx
These new recommendations and APIs exist to alleviate misguided assumptions about the behavior of default string APIs. The canonical example of bugs emerging where non-linguistic string data is interpreted linguistically is the "Turkish-I" problem.
For nearly all Latin alphabets, including U.S. English, the character i (\u0069) is the lowercase version of the character I (\u0049). This casing rule quickly becomes the default for someone programming in such a culture. However, in Turkish ("tr-TR"), there exists a capital "i with a dot," character (\u0130), which is the capital version of i. Similarly, in Turkish, there is a lowercase "i without a dot," or (\u0131), which capitalizes to I. This behavior occurs in the Azeri culture ("az") as well.
Therefore, assumptions normally made about capitalizing i or lowercasing I are not valid among all cultures. If the default overloads for string comparison routines are used, they will be subject to variance between cultures. For non-linguistic data, as in the following example, this can produce undesired results:
// Compare "file" and "FILE", ignoring case, under the en-US culture.
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
Console.WriteLine("Culture = {0}",
    Thread.CurrentThread.CurrentCulture.DisplayName);
Console.WriteLine("(file == FILE) = {0}",
    (String.Compare("file", "FILE", true) == 0));

// Repeat the same comparison under the tr-TR culture.
Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
Console.WriteLine("Culture = {0}",
    Thread.CurrentThread.CurrentCulture.DisplayName);
Console.WriteLine("(file == FILE) = {0}",
    (String.Compare("file", "FILE", true) == 0));
Because of the difference of the comparison of I, results of the comparisons change when the thread culture is changed. This is the output:
Culture = English (United States)
(file == FILE) = True
Culture = Turkish (Turkey)
(file == FILE) = False
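For non-linguistic tokens such as file names, one culture-independent alternative is an ordinal, case-insensitive comparison. The following is a minimal sketch, not part of the quoted MSDN sample: `SameToken` is a hypothetical helper name, and the `tr-TR` line assumes that culture is available on the runtime.

```csharp
using System;
using System.Globalization;
using System.Threading;

public static class OrdinalIgnoreCaseDemo
{
    // Hypothetical helper: case-insensitive equality that ignores the thread culture.
    public static bool SameToken(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // Switch to Turkish to show that the ordinal comparison is unaffected.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
        Console.WriteLine("(file == FILE) = {0}", SameToken("file", "FILE")); // True
    }
}
```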
Here is an example that does not involve case:
var s1 = "\u00E9";  // é as one character (ALT+0233)
var s2 = "e\u0301"; // 'e' plus combining acute accent U+0301 (two characters)
Console.WriteLine(s1.IndexOf(s2, StringComparison.Ordinal));          // -1
Console.WriteLine(s1.IndexOf(s2, StringComparison.InvariantCulture)); // 0
Console.WriteLine(s1.IndexOf(s2, StringComparison.CurrentCulture));   // 0
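If ordinal speed is wanted but the input may mix precomposed and decomposed characters, one possible approach (an addition here, not something the answer proposes) is to normalize both strings to the same Unicode form before searching. `IndexOfNormalized` is a hypothetical helper, and the index it returns refers to the normalized text, not the original. This sketch assumes the runtime supports Unicode normalization for non-ASCII input.

```csharp
using System;
using System.Text;

public static class NormalizedOrdinalDemo
{
    // Hypothetical helper: normalizes both strings to Form C, then searches ordinally.
    // The returned index is a position in the normalized text.
    public static int IndexOfNormalized(string text, string value)
    {
        string normalizedText = text.Normalize(NormalizationForm.FormC);
        string normalizedValue = value.Normalize(NormalizationForm.FormC);
        return normalizedText.IndexOf(normalizedValue, StringComparison.Ordinal);
    }

    public static void Main()
    {
        string precomposed = "\u00E9"; // é as a single code point
        string decomposed = "e\u0301"; // 'e' followed by a combining acute accent

        // Raw ordinal search compares code units, so the two forms do not match.
        Console.WriteLine(precomposed.IndexOf(decomposed, StringComparison.Ordinal));
        // After normalization both strings hold the same code units.
        Console.WriteLine(IndexOfNormalized(precomposed, decomposed));
    }
}
```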
Answer by m-y
CA1309: UseOrdinalStringComparison
It doesn't hurt to not use it, but "by explicitly setting the parameter to either the StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase, your code often gains speed, increases correctness, and becomes more reliable."
What exactly is Ordinal, and why does it matter to your case?
An operation that uses ordinal sort rules performs a comparison based on the numeric value (Unicode code point) of each Char in the string. An ordinal comparison is fast but culture-insensitive. When you use ordinal sort rules to sort strings that start with Unicode characters (U+), the string U+xxxx comes before the string U+yyyy if the value of xxxx is numerically less than yyyy.
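To make the "numeric value" rule concrete, here is a small sketch comparing "B" and "a" ordinally and with a culture-sensitive comparison. `OrdinalSign` is a hypothetical helper added for this example. Only the ordinal result is annotated, since it follows directly from the code point values; the culture-sensitive result depends on the runtime's globalization data.

```csharp
using System;

public static class OrdinalOrderDemo
{
    // Hypothetical helper: reduces an ordinal comparison to -1, 0, or 1.
    public static int OrdinalSign(string a, string b)
    {
        return Math.Sign(string.CompareOrdinal(a, b));
    }

    public static void Main()
    {
        // 'B' is U+0042 and 'a' is U+0061, so ordinally "B" comes before "a".
        Console.WriteLine(OrdinalSign("B", "a")); // -1

        // A word (culture-sensitive) comparison applies linguistic rules instead
        // and typically orders "a" before "B".
        Console.WriteLine(Math.Sign(
            string.Compare("B", "a", StringComparison.InvariantCulture)));
    }
}
```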
And, as you stated... the string value you are reading in is not culture sensitive, so it makes sense to use an Ordinal comparison as opposed to a Word comparison. Just remember, Ordinal means "this isn't culture sensitive".
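Applied to the question's config line, an ordinal lookup might look like the sketch below. `ConfigLineParser` and `ValueAfter` are hypothetical names, and the sample host value is invented. It also assumes the value runs to the end of the line, which differs slightly from the original `Substring` call: that call's length argument only fits when "host=" starts at index 0.

```csharp
using System;

public static class ConfigLineParser
{
    // Hypothetical helper: returns the text that follows `key` in `line`,
    // or null when the key is not present. The search is ordinal, so the
    // result does not depend on the current culture.
    public static string ValueAfter(string line, string key)
    {
        int start = line.IndexOf(key, StringComparison.Ordinal);
        if (start < 0)
        {
            return null;
        }
        // Assumption: the value extends to the end of the line.
        return line.Substring(start + key.Length);
    }

    public static void Main()
    {
        Console.WriteLine(ValueAfter("host=db.example.com", "host=")); // db.example.com
    }
}
```

Doing a single `IndexOf` and checking for a negative result also replaces the separate `Contains` call, which has the same culture question of its own.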
Answer by 500 - Internal Server Error
To answer your specific question: No, but a static analysis tool is not going to be able to realize that your input value will never have locale-specific information in it.

