string: upper case vs. lower case

Question

Asked by Parappa

When doing case-insensitive comparisons, is it more efficient to convert the string to upper case or lower case? Does it even matter?

It is suggested in this SO post that C# is more efficient with ToUpper because "Microsoft optimized it that way." But I've also read this argument that converting ToLower vs. ToUpper depends on what your strings contain more of, and that typically strings contain more lower case characters, which makes ToLower more efficient.

In particular, I would like to know:

Is there a way to optimize ToUpper or ToLower such that one is faster than the other?
Is it faster to do a case-insensitive comparison between upper or lower case strings, and why?
Are there any programming environments (e.g. C, C#, Python, whatever) where one case is clearly better than the other, and why?

Answer 1

Answered by Jon Skeet

Converting to either upper case or lower case in order to do case-insensitive comparisons is incorrect due to "interesting" features of some cultures, particularly Turkey. Instead, use a StringComparer with the appropriate options.

MSDN has some great guidelines on string handling. You might also want to check that your code passes the Turkey test.

EDIT: Note Neil's comment around ordinal case-insensitive comparisons. This whole realm is pretty murky :(
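
As a minimal sketch of the StringComparer approach recommended here, the C# below compares strings through comparer objects instead of converting them first. The class name ComparerDemo, its helper methods, and the sample strings are made up for illustration. Which comparer counts as "appropriate" is an assumption that depends on the data: ordinal for identifiers and file names, culture-aware for text shown to users.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ComparerDemo
{
    // Ordinal, case-insensitive equality: no culture rules are involved.
    public static bool SameIdentifier(string a, string b) =>
        StringComparer.OrdinalIgnoreCase.Equals(a, b);

    // Culture-aware, case-insensitive equality using the given culture's rules.
    public static bool SameText(string a, string b, CultureInfo culture) =>
        StringComparer.Create(culture, ignoreCase: true).Equals(a, b);

    public static void Main()
    {
        Console.WriteLine(SameIdentifier("FILE.TXT", "file.txt")); // True

        // The same comparer can key a dictionary, so lookups ignore case
        // without any ToUpper/ToLower calls in user code.
        var sizes = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
        {
            ["ReadMe.md"] = 120
        };
        Console.WriteLine(sizes.ContainsKey("README.MD")); // True

        Console.WriteLine(SameText("Title", "TITLE", CultureInfo.InvariantCulture)); // True
    }
}
```

Passing the comparer around, rather than a converted copy of each string, keeps the choice of comparison rules in one place.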

Answer 2

Answered by Ian Boyd

From Microsoft on MSDN:

Best Practices for Using Strings in the .NET Framework
Recommendations for String Usage
Use the String.ToUpperInvariant method instead of the String.ToLowerInvariant method when you normalize strings for comparison.

Why? From Microsoft:

Normalize strings to uppercase
There is a small group of characters that when converted to lowercase cannot make a round trip.

What is an example of such a character that cannot make a round trip?

Start: Greek Rho Symbol (U+03f1) ϱ
Uppercase: Capital Greek Rho (U+03a1) Ρ
Lowercase: Small Greek Rho (U+03c1) ρ

ϱ , Ρ, ρ

.NET Fiddle output:

Original: ϱ
ToUpper: Ρ
ToLower: ρ

That is why, if you want to do case-insensitive comparisons, you convert the strings to uppercase, and not lowercase.

So if you have to choose one, choose Uppercase.
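
The round-trip problem can be sketched in C# as follows. This is an illustration rather than Microsoft's own sample: the class RoundTripDemo and its helpers are hypothetical names, and the results assume the runtime applies the standard Unicode simple case mappings for the invariant culture. It normalizes the Greek rho symbol (U+03F1) and the small letter rho (U+03C1) both ways and checks whether they end up equal.

```csharp
using System;

public static class RoundTripDemo
{
    // Equality after normalizing both sides to uppercase (invariant rules).
    public static bool EqualViaUpper(string a, string b) =>
        string.Equals(a.ToUpperInvariant(), b.ToUpperInvariant(), StringComparison.Ordinal);

    // Equality after normalizing both sides to lowercase (invariant rules).
    public static bool EqualViaLower(string a, string b) =>
        string.Equals(a.ToLowerInvariant(), b.ToLowerInvariant(), StringComparison.Ordinal);

    // Formats the first UTF-16 code unit as U+XXXX, independent of console encoding.
    public static string CodePoint(string s) => $"U+{(int)s[0]:X4}";

    public static void Main()
    {
        string rhoSymbol = "\u03F1"; // GREEK RHO SYMBOL
        string smallRho = "\u03C1";  // GREEK SMALL LETTER RHO

        string upper = rhoSymbol.ToUpperInvariant();
        string back = upper.ToLowerInvariant();

        // The symbol goes up to capital rho but comes back down as a different letter.
        Console.WriteLine($"{CodePoint(rhoSymbol)} -> {CodePoint(upper)} -> {CodePoint(back)}");

        // Uppercasing maps both forms to the same capital; lowercasing keeps them apart.
        Console.WriteLine($"Equal via upper: {EqualViaUpper(rhoSymbol, smallRho)}");
        Console.WriteLine($"Equal via lower: {EqualViaLower(rhoSymbol, smallRho)}");
    }
}
```

Under those assumptions the two rho forms compare equal only when both sides are uppercased, which is the behavior the recommendation relies on.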

Answer 3

Answered by Rob Walker

According to MSDN, it is more efficient to pass in the strings and tell the comparison to ignore case:

String.Compare(strA, strB, StringComparison.OrdinalIgnoreCase) is equivalent to (but faster than) calling
String.Compare(ToUpperInvariant(strA), ToUpperInvariant(strB), StringComparison.Ordinal).
These comparisons are still very fast.

Of course, if you are comparing one string over and over again, then this may not hold, since that string can be converted once and the result reused.
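
To make the quoted equivalence concrete, here is a small C# sketch that runs both forms on a few sample pairs and prints the sign of each result. The class CompareDemo, its method names, and the sample words are hypothetical. It only shows that the two forms agree on these ASCII inputs and makes no attempt to measure speed.

```csharp
using System;

public static class CompareDemo
{
    // Direct case-insensitive ordinal comparison; no converted copies are created.
    public static int CompareIgnoreCase(string a, string b) =>
        string.Compare(a, b, StringComparison.OrdinalIgnoreCase);

    // The "convert, then compare" form, which allocates two new strings per call.
    public static int CompareViaUpper(string a, string b) =>
        string.Compare(a.ToUpperInvariant(), b.ToUpperInvariant(), StringComparison.Ordinal);

    public static void Main()
    {
        string[][] pairs =
        {
            new[] { "apple", "APPLE" },
            new[] { "apple", "Banana" },
            new[] { "Zebra", "apple" },
        };

        foreach (string[] p in pairs)
        {
            // Only the sign of a comparison result is meaningful.
            int direct = Math.Sign(CompareIgnoreCase(p[0], p[1]));
            int converted = Math.Sign(CompareViaUpper(p[0], p[1]));
            Console.WriteLine($"{p[0]} vs {p[1]}: {direct} / {converted}");
        }
    }
}
```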

Answer 4

Answered by warren

Based on strings tending to have more lowercase entries, ToLower should theoretically be faster (lots of compares, but few assignments).

In C, or when using individually-accessible elements of each string (such as C strings or the STL's string type in C++), it's actually a byte comparison - so comparing UPPER is no different from lower.

If you were sneaky and loaded your strings into long arrays instead, you'd get a very fast comparison on the whole string because it could compare 4 bytes at a time. However, the load time might make it not worthwhile.

Why do you need to know which is faster? Unless you're doing a metric buttload of comparisons, one running a couple cycles faster is irrelevant to the speed of overall execution, and sounds like premature optimization :)

Answer 5

Answered by Dan Herbert

Microsoft has optimized ToUpperInvariant(), not ToUpper(). The difference is that invariant is more culture friendly. If you need to do case-insensitive comparisons on strings that may vary in culture, use Invariant, otherwise the performance of invariant conversion shouldn't matter.

I can't say whether ToUpper() or ToLower() is faster though. I've never tried it since I've never had a situation where performance mattered that much.
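
The difference between the culture-sensitive and invariant conversions can be seen with a short C# sketch. The class InvariantDemo and its CodePoints helper are hypothetical, and the second call assumes the runtime has Turkish ("tr-TR") culture data available, which is not guaranteed in trimmed or invariant-globalization deployments. Under Turkish rules the uppercase of "i" is the dotted capital İ (U+0130), so the two lines of output are expected to differ.

```csharp
using System;
using System.Globalization;

public static class InvariantDemo
{
    // Formats each UTF-16 code unit as U+XXXX so the output does not
    // depend on the console's encoding.
    public static string CodePoints(string s)
    {
        var parts = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
            parts[i] = $"U+{(int)s[i]:X4}";
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        string word = "title";

        // Culture-independent: the same result on every machine.
        Console.WriteLine(CodePoints(word.ToUpperInvariant()));

        // Culture-sensitive: uses the casing rules of the culture passed in.
        // Assumption: Turkish culture data is installed for this runtime.
        CultureInfo turkish = CultureInfo.GetCultureInfo("tr-TR");
        Console.WriteLine(CodePoints(word.ToUpper(turkish)));
    }
}
```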

Answer 6

Answered by Jon Tackabury

If you are doing string comparison in C# it is significantly faster to use .Equals() with a case-insensitive StringComparison option (for example StringComparison.OrdinalIgnoreCase) instead of converting both strings to upper or lower case. Another big plus for using .Equals() is that no extra memory is allocated for the 2 new upper/lower case strings.

Answer 7

Answered by Adam Rosenfield

It really shouldn't ever matter. With ASCII characters, it definitely doesn't matter - it's just a few comparisons and a bit flip for either direction. Unicode might be a little more complicated, since there are some characters that change case in weird ways, but there really shouldn't be any difference unless your text is full of those special characters.

Answer 8

Answered by Clearer

Done right, there should be a small, insignificant speed advantage if you convert to lower case, but this is, as many have hinted, culture dependent, and it is not inherent in the function but in the strings you convert (lots of lower case letters means few assignments to memory). Converting to upper case is faster if you have a string with lots of upper case letters.
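
The "few assignments" argument can be sketched with a hand-written, ASCII-only lowercase routine in C#. This is a hypothetical illustration (LazyLowerDemo and ToLowerAscii are made-up names), not how the .NET library implements ToLower. It only allocates and writes when it actually finds an uppercase letter, so input that is already lower case costs comparisons and nothing else.

```csharp
using System;

public static class LazyLowerDemo
{
    // ASCII-only sketch: scan for the first uppercase letter, and only
    // copy and modify the string if one is found.
    public static string ToLowerAscii(string s)
    {
        int i = 0;
        while (i < s.Length && !(s[i] >= 'A' && s[i] <= 'Z'))
            i++;

        if (i == s.Length)
            return s; // nothing to change: compares only, no writes

        char[] buffer = s.ToCharArray();
        for (; i < buffer.Length; i++)
        {
            char c = buffer[i];
            if (c >= 'A' && c <= 'Z')
                buffer[i] = (char)(c | 0x20); // one extra write per uppercase letter
        }
        return new string(buffer);
    }

    public static void Main()
    {
        Console.WriteLine(ToLowerAscii("already lower"));  // already lower
        Console.WriteLine(ToLowerAscii("Mixed CASE 123")); // mixed case 123
    }
}
```

A mirror-image ToUpperAscii would have the same shape, which is why the cost depends on the input rather than on the direction of the conversion.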

Answer 9

Answered by Sanjaya R

It depends. As stated above, for plain ASCII it's identical. In .NET, read about and use String.Compare; it's correct for the i18n stuff (languages, cultures, and Unicode). If you know anything about the likelihood of the input, use the more common case.

Remember, if you are doing multiple string compares, length is an excellent first discriminator.
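
A minimal C# sketch of the length-first idea follows. The names LengthFirstDemo, Matches, and FindAll are hypothetical. It assumes ordinal comparison, where ignoring case does not change a string's length. Culture-sensitive comparisons can treat strings of different lengths as equal, so the shortcut is not safe there.

```csharp
using System;
using System.Collections.Generic;

public static class LengthFirstDemo
{
    // Rejects on length before looking at any characters, then falls
    // back to an ordinal, case-insensitive equality check.
    public static bool Matches(string candidate, string target) =>
        candidate.Length == target.Length &&
        string.Equals(candidate, target, StringComparison.OrdinalIgnoreCase);

    // Collects every candidate that matches the target, ignoring case.
    public static List<string> FindAll(IEnumerable<string> candidates, string target)
    {
        var hits = new List<string>();
        foreach (string c in candidates)
        {
            if (Matches(c, target))
                hits.Add(c);
        }
        return hits;
    }

    public static void Main()
    {
        var names = new[] { "alpha", "ALPHA", "alphabet", "beta", "Alpha" };
        Console.WriteLine(string.Join(", ", FindAll(names, "alpha")));
        // prints: alpha, ALPHA, Alpha
    }
}
```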

Answer 10

Answered by Brian Knoblauch

If you're dealing in pure ASCII, it doesn't matter. It's just an OR x,32 vs. an AND x,223. Unicode, I have no idea...
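
The two bit operations can be written out in C# as a quick check. The class and method names are hypothetical, and both helpers assume the input is already an ASCII letter, since other characters such as '@' or '[' would be turned into unrelated characters. Lowercasing sets the bit worth 32 (0x20); uppercasing clears that same bit, which means AND with 0xDF.

```csharp
using System;

public static class AsciiCaseBits
{
    // Lowercase an ASCII letter by setting bit 5 (value 32).
    public static char ToLowerAsciiLetter(char c) => (char)(c | 0x20);

    // Uppercase an ASCII letter by clearing bit 5 (mask 0xDF).
    public static char ToUpperAsciiLetter(char c) => (char)(c & 0xDF);

    public static void Main()
    {
        Console.WriteLine(ToLowerAsciiLetter('A')); // a
        Console.WriteLine(ToUpperAsciiLetter('a')); // A
        Console.WriteLine(ToUpperAsciiLetter('Q')); // Q (already upper case, unchanged)
    }
}
```

Either direction is a single instruction per character, which is why neither case has an edge for pure ASCII.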
